Agenda:

- Eurostats review
- Workflows
- Discussion for next week

# Look at this nice way to load libraries without filling the screen with junk
suppressMessages(library(tidyverse))

Eurostats and other questions

library(eurostat)
data <- get_eurostat(id="cdh_e_diss", time_format = "num")
trying URL 'https://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&file=data%2Fcdh_e_diss.tsv.gz'
Content type 'application/octet-stream;charset=UTF-8' length 6007 bytes
==================================================
downloaded 6007 bytes

Table cdh_e_diss cached at /var/folders/40/g5h19fsj5yl8_tr49ny3jqx0h1wp05/T//RtmpdlsIXZ/eurostat/cdh_e_diss_num_code_TF.rds
data <- label_eurostat(data)

If RStudio won’t make a nice screen version of the table (happens on Windows computers), use the View() function:

data %>% View()

Here’s our worked example derived from Marc’s homework:

I think the magic here is spread(): we need the 2006 and 2009 values on the same row so we can calculate whether the trend is increasing or decreasing. The geom_curve() function draws one line per row from (x, y) to (xend, yend); the curvature=0 argument makes the curve… straight. You can’t do this with geom_line(), since geom_line() joins all the (x, y) values in a group into one continuous line.

data %>%
  # keep only the overall totals
  filter(sex == "Total", y_grad=="Total") %>%
  # one column per year, so 2006 and 2009 sit on the same row
  spread(time, values) %>%
  # drop rows that lack a value for either year
  filter(!is.na(`2009`), !is.na(`2006`)) %>%
  # direction of change in the dissatisfaction level between the two years
  mutate(satisfaction=if_else(`2009`>`2006`, "Increasing", "Decreasing")) %>%
  ggplot(aes(x=reason, y=`2006`, xend=reason, yend=`2009`, colour=satisfaction)) +
    # curvature=0 turns each curve into a straight segment
    geom_curve(curvature=0, size=2) +
    facet_wrap(~ geo) +
    theme(axis.text.x=element_text(angle=90, hjust=1, size=7)) +
    labs(y="Dissatisfaction level", x="Cause of dissatisfaction") +
    ggtitle("Dissatisfaction of employed PhD students", 
            subtitle="Changes from 2006 to 2009")
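The same reshape-then-compare idea can be tried on a toy table without downloading anything. The sketch below is an alternative under stated assumptions, not the code used in class: the `toy` data frame and its numbers are invented for illustration, pivot_wider() stands in for spread() (which tidyr now marks as superseded), and geom_segment() stands in for geom_curve(curvature=0).

```r
suppressMessages(library(tidyverse))

# Hypothetical toy data in the same long shape as the Eurostat table
toy <- tibble(
  reason = rep(c("Salary", "Job security"), each = 2),
  time   = rep(c(2006, 2009), times = 2),
  values = c(30, 25, 10, 18)
)

wide <- toy %>%
  # one row per reason, with the two years side by side as columns
  pivot_wider(names_from = time, values_from = values) %>%
  mutate(trend = if_else(`2009` > `2006`, "Increasing", "Decreasing"))

# geom_segment() draws one straight line per row from (x, y) to (xend, yend)
p <- ggplot(wide, aes(x = reason, xend = reason,
                      y = `2006`, yend = `2009`, colour = trend)) +
  geom_segment() +
  labs(y = "Dissatisfaction level", x = "Cause of dissatisfaction")
p
```

Naming the new column `trend` is a choice made here so that the legend reads as the direction of change in the plotted value.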

Producing publication-quality graphics

The ggsave() function saves the last plot displayed to a graphics file. Important arguments:

- filename, path to specify where to put the file (the type of graphic is determined by the filename suffix, which should usually be .png or .pdf)
- height, width, units (“in”, “mm”, “cm”)

ggsave("MichaelHappyPhD.pdf", width=20, height=10, units="cm")
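As a hedged sketch of those arguments in use: the example below builds a small throwaway plot (the data and file names are made up), hands it to ggsave() explicitly through the plot argument instead of relying on the last plot displayed, and writes both a PDF and a PNG to a scratch directory. The dpi argument only matters for raster formats such as PNG.

```r
library(ggplot2)

# Throwaway plot; the data are invented for illustration
p <- ggplot(data.frame(x = 1:3, y = c(2, 4, 3)), aes(x, y)) +
  geom_line()

out_dir <- tempdir()   # assumption: write to a scratch directory

# The filename suffix picks the format: vector PDF or raster PNG
ggsave("example.pdf", plot = p, path = out_dir,
       width = 20, height = 10, units = "cm")
ggsave("example.png", plot = p, path = out_dir,
       width = 20, height = 10, units = "cm", dpi = 300)

file.exists(file.path(out_dir, c("example.pdf", "example.png")))
```

Passing the plot explicitly makes the call independent of whatever happened to be drawn last, which helps once the code moves from a notebook into a script.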

Recoding category labels (i.e. “levels” of “factors”) and setting the order of the levels

- Adapted from https://debruine.github.io/recode.html

Recode values:

countries <- c("AU", "AT", "SK", "SI")
recode(countries, AU="Australia", AT="Austria", SK="Slovakia", SI="Slovenia")
[1] "Australia" "Austria"   "Slovakia"  "Slovenia" 
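When the mapping from codes to names is long, or comes from a file, it can be easier to keep it in a named vector. A minimal sketch, assuming the `lookup` vector below (a name introduced here for illustration) holds the mapping; it shows a base R lookup and the same vector spliced into recode() with `!!!`.

```r
suppressMessages(library(dplyr))

countries <- c("AU", "AT", "SK", "SI")

# Hypothetical lookup table: names are the codes, values are the labels
lookup <- c(AU = "Australia", AT = "Austria", SK = "Slovakia", SI = "Slovenia")

# Base R: index the named vector by code, then drop the names
by_index <- unname(lookup[countries])

# dplyr: splice the named vector into recode() as old = "new" pairs
by_recode <- recode(countries, !!!lookup)

identical(by_index, by_recode)
```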

Turn a character vector into a factor with a chosen level order:

countries <- c("Australia", "Austria", "Slovakia", "Slovenia")
factor(countries, levels=c("Slovakia", "Austria", "Slovenia", "Australia"))
[1] Australia Austria   Slovakia  Slovenia 
Levels: Slovakia Austria Slovenia Australia
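The level order matters because sorting and plotting follow the levels, not the alphabet. A small base R sketch of that effect, reusing the vector above; note that factor() with levels= sets the order but does not by itself create an "ordered" factor in R's sense, which needs ordered=TRUE.

```r
countries <- c("Australia", "Austria", "Slovakia", "Slovenia")
f <- factor(countries, levels = c("Slovakia", "Austria", "Slovenia", "Australia"))

# Sorting follows the level order rather than alphabetical order
sorted <- as.character(sort(f))

# Each value is stored as the integer position of its level
codes <- as.integer(f)

# The levels have a display order, but this is not an "ordered" factor
is.ordered(f)
is.ordered(factor(countries, levels = levels(f), ordered = TRUE))
```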

The tidyverse’s forcats package provides helpers for both steps too, for example fct_recode() and fct_relevel().
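For comparison, a sketch of the same two steps with forcats, assuming the package is installed (it is loaded as part of the tidyverse). fct_recode() takes new = "old" pairs, the opposite way round from recode(), and fct_relevel() moves the named levels to the front in the order given.

```r
library(forcats)

codes <- factor(c("AU", "AT", "SK", "SI"))

# Relabel the levels; note the direction is new_label = "old_label"
named <- fct_recode(codes,
                    Australia = "AU", Austria = "AT",
                    Slovakia = "SK", Slovenia = "SI")

# Put the levels in a chosen order
releveled <- fct_relevel(named, "Slovakia", "Austria", "Slovenia", "Australia")

levels(releveled)
```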

Workflows

- Techniques for collaboration.

  * Git is state of the art for plain text-based collaboration
  * Suitable for working with code and data (cf. CLDC materials)

  http://rogerdudler.github.io/git-guide/

  * git
  * git via github
    - De facto master copy
    - Website can have an “Issue tracker” and Wiki
    - A brilliant way to coordinate data compilation from lots of people
  * github via RStudio

  Other advantages

  * Gives you versioned backups for free
  * Interacts well with archiving/publishing tools

- Archiving and publishing research analyses online (using e.g. FigShare).

  - Replicability
  - Quality control
  - https://figshare.com/collections/Supplementary_material_from_A_Bayesian_phylogenetic_study_of_the_Dravidian_language_family_/4025020
  - https://figshare.com/articles/Dative_Sickness_A_Phylogenetic_Analysis_of_Argument_Structure_Evolution_in_Germanic_supplementary_materials_/4625941
  - Concerns (privacy, getting scooped, etc.)

- How to report your analysis in a thesis or paper.

  - Best practice in methods description
  - Issues are different for visualisation

- Other workflows (leaving RStudio for the text editor and command line)

  - Notebook versus other kinds of markdown files
  - R scripts
    - Running R scripts through RStudio
    - Writing R scripts with a text editor; running R scripts from the commandline

- Dynamic visualisation

  - Shiny Apps (http://shiny.rstudio.com/gallery/)
  - Colour Vision Simulator

- Final meeting

  - What would you like to discuss?
  - Final assignment: Produce a novel and interesting visualisation of data from your own PhD project. In the absence of suitable data you can use some other data related to your project. Present this in a short, compilable report in Markdown format (a Markdown document designed for printing, not a notebook). Due date: To be decided.