Agenda:
- Eurostats review
- Workflows
- Discussion for next week
# Look at this nice way to load libraries without filling the screen with junk
suppressMessages(library(tidyverse))
Eurostats and other questions
library(eurostat)
data <- get_eurostat(id="cdh_e_diss", time_format = "num")
trying URL 'https://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&file=data%2Fcdh_e_diss.tsv.gz'
Content type 'application/octet-stream;charset=UTF-8' length 6007 bytes
==================================================
downloaded 6007 bytes
Table cdh_e_diss cached at /var/folders/40/g5h19fsj5yl8_tr49ny3jqx0h1wp05/T//RtmpdlsIXZ/eurostat/cdh_e_diss_num_code_TF.rds
data <- label_eurostat(data)
If RStudio won’t make a nice screen version of the table (this happens on Windows computers), use the View() function:
data %>% View()
Here’s our worked example derived from Marc’s homework:
I think the magic here is spread(): we need to have the 2006 and 2009 values on the same row so we can calculate whether the trend is increasing or decreasing. The geom_curve() function draws one line per row of the data, from (x, y) to (xend, yend); the curvature=0 argument makes the curve… straight. You can’t do this with geom_line(), since geom_line() draws one continuous line joining a vector of (x, y) values together.
data %>%
  filter(sex == "Total", y_grad=="Total") %>%
  spread(time, values) %>%
  filter(!is.na(`2009`), !is.na(`2006`)) %>%
  mutate(satisfaction=if_else(`2009`>`2006`, "Increasing", "Decreasing")) %>%
  ggplot(aes(x=reason, y=`2006`, xend=reason, yend=`2009`, colour=satisfaction)) +
    geom_curve(curvature=0, size=2) +
    facet_wrap(~ geo) +
    theme(axis.text.x=element_text(angle=90, hjust=1, size=7)) +
    labs(y="Dissatisfaction level", x="Cause of dissatisfaction") +
    ggtitle("Dissatisfaction of employed PhD students",
            subtitle="Changes from 2006 to 2009")
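To see what the reshaping step does without downloading anything, here is a minimal sketch on a made-up table. The `toy` data, its values and the `trend` column are hypothetical, invented only for illustration. It also uses geom_segment(), which draws a straight line per row directly, as an alternative to geom_curve(curvature=0). Note that tidyr now offers pivot_wider() as the newer replacement for spread(); spread() still works and is kept here to match the example above.

```r
library(tidyr)
library(dplyr)
library(ggplot2)

# Hypothetical toy data in the same "long" shape as the Eurostat table:
# one row per (reason, year) combination
toy <- tibble(
  reason = rep(c("Salary", "Job security"), each = 2),
  time   = rep(c(2006, 2009), times = 2),
  values = c(30, 25, 40, 45)
)

# spread() turns each year into its own column, giving one row per reason,
# so the two years can be compared within a row
wide <- toy %>%
  spread(time, values) %>%
  mutate(trend = if_else(`2009` > `2006`, "Increasing", "Decreasing"))

print(wide)

# geom_segment() draws one straight line per row from (x, y) to (xend, yend)
p <- ggplot(wide, aes(x = reason, y = `2006`, xend = reason, yend = `2009`,
                      colour = trend)) +
  geom_segment()
```

Printing `p` shows one vertical segment per reason, coloured by the direction of change.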
Producing publication quality graphics.
The ggsave()
function makes a graphics file of the last image generated. Important arguments: - filename, path to specify where to put the file (the type of graphic is determined by the filename suffix, which should usually be .png
or .pdf
) - height, width, units (“in”, “mm”, “cm”)
ggsave("MichaelHappyPhD.pdf", width=20, height=10, units="cm")
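ggsave() can also be given a plot explicitly through its `plot` argument instead of relying on the last image generated, which is safer in scripts. The sketch below assumes a stand-in plot of the built-in mtcars data and a hypothetical file name; `dpi` sets the resolution for raster formats such as PNG.

```r
library(ggplot2)

# Hypothetical plot, standing in for the figure produced earlier
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Write into the session's temporary directory so the example leaves no files behind
out_file <- file.path(tempdir(), "example_plot.png")

# The .png suffix selects the PNG device; plot = p avoids depending on last_plot()
ggsave(out_file, plot = p, width = 20, height = 10, units = "cm", dpi = 300)

file.exists(out_file)
```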
Recoding category labels (i.e. “levels” of “factors”) and setting the order of the levels
Recode values:
countries <- c("AU", "AT", "SK", "SI")
recode(countries, AU="Australia", AT="Austria", SK="Slovakia", SI="Slovenia")
[1] "Australia" "Austria" "Slovakia" "Slovenia"
Turn a character vector into a factor with a specified level order:
countries <- c("Australia", "Austria", "Slovakia", "Slovenia")
factor(countries, levels=c("Slovakia", "Austria", "Slovenia", "Australia"))
[1] Australia Austria Slovakia Slovenia
Levels: Slovakia Austria Slovenia Australia
The Tidyverse’s forcats package does all this very nicely too.
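As a rough sketch of the forcats route, the two steps above might look like this. The variable names `named` and `reordered` are made up for the example; fct_recode() and fct_relevel() are the forcats functions for renaming and reordering levels.

```r
library(forcats)

countries <- factor(c("AU", "AT", "SK", "SI"))

# fct_recode() renames levels: new name on the left, existing level on the right
named <- fct_recode(countries,
                    Australia = "AU", Austria = "AT",
                    Slovakia = "SK", Slovenia = "SI")

# fct_relevel() moves the listed levels to the front, in the order given
reordered <- fct_relevel(named, "Slovakia", "Austria", "Slovenia", "Australia")

levels(reordered)
```

The level order is what ggplot2 uses when it lays out a categorical axis, legend or set of facets, so this is the usual way to control their order.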
Workflows
Techniques for collaboration.
- Git is state of the art for plain text-based collaboration
- Suitable for working with code and data (cf. CLDC materials)
http://rogerdudler.github.io/git-guide/
- git
- git via github
- De facto master copy
- Website can have an “Issue tracker” and Wiki
- A brilliant way to coordinate data compilation from lots of people
- github via RStudio
Other advantages
- Gives you versioned backups for free
- Interacts well with archiving/publishing tools
Archiving and publishing research analyses online (using e.g. FigShare).
How to report your analysis in a thesis or paper.
- Best practice in methods description
- Issues are different for visualisation
Other workflows (leaving RStudio for the text editor and command line)
- Notebook versus other kinds of markdown files
- R scripts
- Running R scripts through RStudio
- Writing R scripts with a text editor; running R scripts from the command line
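A minimal sketch of a script meant to be run from the command line is shown below. The file name `summarise_values.R` and what the script computes are hypothetical; Rscript and commandArgs() are the standard R tools for this.

```r
# summarise_values.R (hypothetical file name)
# Run from a terminal with:  Rscript summarise_values.R 1 2 3

# commandArgs(trailingOnly = TRUE) returns only the arguments given after the script name
args <- commandArgs(trailingOnly = TRUE)

# Fall back to example values when no arguments are supplied,
# for instance when the file is sourced inside RStudio
values <- if (length(args) > 0) as.numeric(args) else c(1, 2, 3)

cat("Mean of", length(values), "values:", mean(values), "\n")
```

The same file can be opened and run in RStudio, so a script written this way works in both workflows.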
Dynamic visualisation
Final meeting
- What would you like to discuss?
- Final assignment: Produce a novel and interesting visualisation of data from your own PhD project. In the absence of suitable data you can use some other data related to your project. Present this in a short, compilable report in Markdown format (a Markdown document designed for printing, not a notebook). Due date: To be decided.