Main issues for today
- wide vs long data
- fixing data type: as.numeric, as.character
- readxl package
- rbind
- testthat package
- gather and spread
Some homework revision
library(tidyverse)
[30m── [1mAttaching packages[22m ─────────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──[39m
[30m[32m✔[30m [34mggplot2[30m 2.2.1 [32m✔[30m [34mpurrr [30m 0.2.5
[32m✔[30m [34mtibble [30m 1.4.2 [32m✔[30m [34mdplyr [30m 0.7.5
[32m✔[30m [34mtidyr [30m 0.8.1 [32m✔[30m [34mstringr[30m 1.3.1
[32m✔[30m [34mreadr [30m 1.1.1 [32m✔[30m [34mforcats[30m 0.3.0[39m
[30m── [1mConflicts[22m ────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
[31m✖[30m [34mdplyr[30m::[32mfilter()[30m masks [34mstats[30m::filter()
[31m✖[30m [34mdplyr[30m::[32mlag()[30m masks [34mstats[30m::lag()[39m
library(babynames)
Kelly visualisation:
read_tsv("./Swedish-Kelly_M3_CEFR.tsv") %>%
arrange(desc(`Raw freq`)) %>%
filter(!is.na(`Raw freq`)) %>%
filter(WPM != 1000000) %>%
mutate(Rank=1:length(ID)) %>%
ggplot(aes(x=log(Rank), y=log(`Raw freq`))) + geom_line()
Parsed with column specification:
cols(
ID = col_integer(),
`Raw freq` = col_integer(),
WPM = col_double(),
`CEFR levels` = col_character(),
Source = col_character(),
Grammar = col_character(),
`Swedish items for translation` = col_character(),
`Word classes` = col_character(),
Examples = col_character()
)
Another version of the same thing, less “idiomatic” tidyverse (old-fashioned way)
kelly <- read_tsv("Swedish-Kelly_M3_CEFR.tsv")
Parsed with column specification:
cols(
ID = col_integer(),
`Raw freq` = col_integer(),
WPM = col_double(),
`CEFR levels` = col_character(),
Source = col_character(),
Grammar = col_character(),
`Swedish items for translation` = col_character(),
`Word classes` = col_character(),
Examples = col_character()
)
kelly <- filter(kelly, !(is.na(`Raw freq`) | WPM == 1000000))
kelly$rank <- 1:nrow(kelly)
kelly %>% ggplot(aes(rank, `Raw freq`)) + geom_line()
Now plot this again taking the log values of rank and frequency
read_tsv("Swedish-Kelly_M3_CEFR.tsv") %>%
filter(!(is.na(`Raw freq`) | WPM == 1000000)) %>%
mutate(rank=1:nrow(.)) %>%
ggplot(aes(log10(rank), log10(`Raw freq`))) + geom_line()
Parsed with column specification:
cols(
ID = col_integer(),
`Raw freq` = col_integer(),
WPM = col_double(),
`CEFR levels` = col_character(),
Source = col_character(),
Grammar = col_character(),
`Swedish items for translation` = col_character(),
`Word classes` = col_character(),
Examples = col_character()
)
What’s that “blip”? How can we fix it?
# Sort it by Raw freq, descending
kelly %>% arrange(desc(`Raw freq`))
And then redo everything with the fix incorporated
read_tsv("Swedish-Kelly_M3_CEFR.tsv") %>%
arrange(desc(`Raw freq`)) %>%
filter(!(is.na(`Raw freq`) | WPM == 1000000)) %>%
mutate(rank=1:nrow(.)) %>%
ggplot(aes(log10(rank), log10(`Raw freq`))) + geom_line()
Parsed with column specification:
cols(
ID = col_integer(),
`Raw freq` = col_integer(),
WPM = col_double(),
`CEFR levels` = col_character(),
Source = col_character(),
Grammar = col_character(),
`Swedish items for translation` = col_character(),
`Word classes` = col_character(),
Examples = col_character()
)
Wide and long data
Wide data:
Long data:
head(babynames)
- Long data is good for visual analysis: every row is an observation
- Humans generally prefer to read wide data
- There are times when wide data is important
gather
changes wide to long
spread
changes long to wide
Wide format
data <- tibble(row=c("A", "B"), x=1:2, y=3:4, z=5:6)
data
Long format
data %>% gather("column", "value", c("x", "y", "z"))
Loading data directly from excel format
(this is relatively new, I didn’t know about it earlier)
These excel files are from the Swedish Central Statistics Agency, SCB
Look at the spreadsheet and the read_excel documentation - named sheets (we need to select a particular sheet) - blank lines at beginning (skip
them) - column types (can you see what they are?)
library(readxl)
Warning messages:
1: Unknown or uninitialised column: 'F'.
2: Unknown or uninitialised column: 'M'.
girls <- read_excel("be0001namntab11_2017.xlsx", sheet = "Flickor", skip = 4)
boys <- read_excel("be0001namntab12_2017.xlsx", sheet = "Pojkar", skip = 4)
head(girls)
Check the column titles:
names(girls)
[1] "Namn" "1998" "1999" "2000" "2001" "2002" "2003" "2004" "2005" "2006" "2007" "2008" "2009" "2010" "2011" "2012" "2013"
[18] "2014" "2015" "2016" "2017"
Warning messages:
1: Unknown or uninitialised column: 'F'.
2: Unknown or uninitialised column: 'M'.
names(boys)
[1] "Namn" "1998" "1999" "2000" "2001" "2002" "2003" "2004" "2005" "2006" "2007" "2008" "2009" "2010" "2011" "2012" "2013"
[18] "2014" "2015" "2016" "2017"
We want to check that the column names of girls
and boys
are the same. You can just do one of the following:
names(girls) == names(boys) # expect a long vector of TRUEs
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
Warning messages:
1: Unknown or uninitialised column: 'F'.
2: Unknown or uninitialised column: 'M'.
You can also do:
all(names(girls) == names(boys)) # returns TRUE if all the values in the vector are TRUE
[1] TRUE
But a more elegant way to do it is to incorporate tests. The testthat
package introduces a bunch of expect_
functions that make your script crash (infomatively!) if the expectation is violated.
library(testthat)
Attaching package: ‘testthat’
The following object is masked from ‘package:dplyr’:
matches
The following object is masked from ‘package:purrr’:
is_null
expect_equal(names(girls), names(boys))
The test does nothing if it passes. You can incorporate tests into your scripts to make sure nothing unexpected is happening after e.g. you update data.
a <- 1:5
b <- 1:4
expect_equal(a, b)
Error: `a` not equal to `b`.
Lengths differ: 5 is not 4
Assuming things work as expected you can add a column to specify male or female name, and then bind your tables together into a single table.
library(tidyverse)
girls <- girls %>% mutate(sex = "F")
boys <- boys %>% mutate(sex = "M")
data <- rbind(girls, boys)
Now take a look:
head(data)
Convert swedish babynames from wide to long
Back to Swedish baby names. In order to work with this we need to convert it from wide to long format: there should be a single year
column with year column headers as variables.
Gathering
This is very important!
- gather(.data, new_column_with_collected_headers, new_column_for_values_of_cells, …all_the_columns_to_gather…)
The following function all the year columns into one column with year (the old column header), and one column with the value of the cell.
We use as.character
because this refers to the column headers, which count as text. If we used numerals what would it mean?
long.data <- gather(data, year, count, as.character(1998:2017))
Warning messages:
1: Unknown or uninitialised column: 'F'.
2: Unknown or uninitialised column: 'M'.
head(long.data)
Note that the year and count columns are shown as (character) rather than (a numeric type). You won’t be able to graph these until you fix them. The count data is character rather than numeric; fix this with mutate
long.data <- gather(data, year, count, as.character(1998:2017)) %>%
mutate(year=as.numeric(year), count=as.numeric(count))
NAs introduced by coercionWarning messages:
1: Unknown or uninitialised column: 'F'.
2: Unknown or uninitialised column: 'M'.
head(long.data)
Now ggplot recognises the numbers as numeric rather than as character strings it can plot them:
long.data %>% filter(Namn=="Michael") %>% ggplot(aes(x=year, y=count)) + geom_line()
long.data %>% filter(Namn=="Linnéa") %>% ggplot(aes(x=year, y=count)) + geom_line()
long.data %>% filter(Namn=="Linnéa" | Namn=="Anna" | Namn=="Robert") %>% ggplot(aes(x=year, y=count, linetype=Namn)) + geom_line()
Warning messages:
1: Unknown or uninitialised column: 'F'.
2: Unknown or uninitialised column: 'M'.
Fixing (“coercing”) character types
Look again at head(data)
. All the numbers have been imported at characters. Can you guess why?
This is a problem:
values <- c("1", "7", "8?", "-", "not applicable")
values
[1] "1" "7" "8?" "-" "not applicable"
# You can't do mathematical operations with the character representations of numbers
# values + 1
Use as.numeric
to coerce the type of an object to numeric. Anything that can’t be coerced turns into NA (not available)
numeric.values <- as.numeric(values)
NAs introduced by coercion
numeric.values # note NA for "not available"; the warning message
[1] 1 7 NA NA NA
numeric.values + 1
[1] 2 8 NA NA NA
You can do this the other way around too, with as.character
(like we did in the gather example above)
There are other as.XXX
functions for every other type of object, but you’re less likely to need these.
summarise() and group_by()
Reduces all the rows to one row
babynames %>% summarise(mean_n=mean(n), median_n=median(n))
group_by
reduces all the rows to a smaller number of rows, according to the group_by
term/s; summarise then works on each group (group_by
doesn’t make much sense without a summarise
or similar after it)
# group_by sex
babynames %>% filter(name=="Michael") %>% group_by(sex) %>% summarise(first_seen=min(year), last_seen=max(year))
our_names = c("Anna", "Bror-Magnus", "Lena", "Linnéa", "Maja", "Marc", "Mervi", "Rima", "Robert", "Rune", "Michael")
Warning messages:
1: Unknown or uninitialised column: 'F'.
2: Unknown or uninitialised column: 'M'.
babynames %>% filter(name %in% our_names) %>% group_by(name) %>% summarise(total=sum(n))
Exercise: what are the mean and median number of male and female names in the data?
You can group by multiple things at once to get every combination
babynames %>%
filter(name %in% c("Michael", "Magnus", "Anna", "Maja")) %>%
group_by(name, sex) %>% # every combination of name and sex
summarise(total=sum(n))
Spread example
Here’s a chance to use spread
. Let’s say we want to look at the ratio of male to female version of each of these names:
babynames %>%
filter(name %in% c("Michael", "Magnus", "Anna", "Maja")) %>%
group_by(name, sex) %>% # every combination of name and sex
summarise(total=sum(n)) %>%
spread(sex, total)
We could convert this in a manliness rating for names:
babynames %>%
filter(name %in% c("Michael", "Magnus", "Anna", "Maja")) %>%
group_by(name, sex) %>% # every combination of name and sex
summarise(total=sum(n)) %>%
spread(sex, total) %>%
mutate(manliness=M/(M+F))
Oops, have to change the NAs to 0, because anything + NA is NA
babynames %>%
filter(name %in% c("Michael", "Magnus", "Anna", "Maja")) %>%
group_by(name, sex) %>% # every combination of name and sex
summarise(count=sum(n)) %>%
spread(sex, count) -> data
# writing an intermediate variable is a clunky way to do it, but I'm not too proud
data$F[is.na(data$F)] <- 0
data$M[is.na(data$M)] <- 0
data %>%
mutate(manliness=M/(M+F)) %>%
arrange(desc(manliness))
an aside on indexing
values <- c(3,6,17, NA, NA, 5)
values
[1] 3 6 17 NA NA 5
is.na(values)
[1] FALSE FALSE FALSE TRUE TRUE FALSE
Indexes in square brackets
values[2]
[1] 6
values[2:4]
[1] 6 17 NA
Assign to indexed vectors
values[1:2] <- -99
values
[1] -99 -99 17 NA NA 5
values[is.na(values)] <- 0
values
[1] -99 -99 17 0 0 5
Other kinds of logic also possible
values[values < 0 ] <- NA
values
[1] NA NA 17 0 0 5
Histogram
babynames %>% filter(year==2000) %>% ggplot(aes(x=log10(n))) + geom_histogram()
stringr
Simple character manipulations, see documentation: https://stringr.tidyverse.org/articles/stringr.html
library(stringr) # you might need to load this separately (or it might be part of tidyverse)
# str_sub(x, start, stop)
our_names = c("Anna", "Bror-Magnus", "Lena", "Linnéa", "Maja", "Marc", "Mervi", "Rima", "Robert", "Rune")
str_sub(our_names, 1, 2)
# positive and negative indices
str_sub(our_names, 1, 1)
str_sub(our_names, -3, -1) %>% str_to_upper()
- We can do lots of interesting things with this by adding the output of this to a new column using
mutate
Exercises:
Try one of: - add a column called final_a
with TRUE or FALSE values for whether the name has a final a - add a column for first_letter
- explore this variable (e.g. interaction with sex, change over time)
Trends in a-final names
babynames %>% mutate(final_a=str_sub(name, start=-1)=="a")
We could e.g. graph this, with sex as a variable:
babynames %>%
mutate(final_a=str_sub(name, start=-1)=="a") %>%
group_by(year, sex, final_a) %>%
summarise(count=n()) %>%
ggplot(aes(x=year, y=count, colour=sex, linetype=final_a)) + geom_line()
But this really only makes sense as proportions:
babynames %>%
mutate(final_a=str_sub(name, start=-1)=="a") %>%
group_by(year, sex, final_a) %>%
summarise(count=n()) %>%
spread(final_a, count) %>%
rename(no_final_a=`FALSE`, final_a=`TRUE`) %>%
mutate(prop=final_a/(final_a+no_final_a)) %>%
ggplot(aes(x=year, y=prop, colour=sex)) + geom_line()
Some ideas for further exercises
- What’s happened to “Leslie”? Plot the change in male and female Leslies over time
- Can you order names by what year they peak?
- Can you work out a way to find more names which have changed their typical gender over time?
- Has the proportion of vowel final names (consider girls and boys separately) changed over time?
- What is the male-female bias in initial letters?
- How many distinct names are used for boys and girls each year? What about a better measure (e.g. number of names/number of individuals)
LS0tCnRpdGxlOiAiVmlzdWFsaXNhdGlvbiBhbmQgc3RhdGlzdGljYWwgYW5hbHlzaXMiCmF1dGhvcjogIk1pY2hhZWwgRHVubiwgRGVwdC4gb2YgTGluZ3Vpc3RpY3MgYW5kIFBoaWxvbG9neSwgVXBwc2FsYSBVbml2ZXJzaXR5IgpkYXRlOiAiU2Vzc2lvbiA0LCAyMDE4LTEwLTE4IgpvdXRwdXQ6IGh0bWxfbm90ZWJvb2sKLS0tCgpNYWluIGlzc3VlcyBmb3IgdG9kYXkKCi0gd2lkZSB2cyBsb25nIGRhdGEKLSBmaXhpbmcgZGF0YSB0eXBlOiBhcy5udW1lcmljLCBhcy5jaGFyYWN0ZXIKLSByZWFkeGwgcGFja2FnZQotIHJiaW5kCi0gdGVzdHRoYXQgcGFja2FnZQotIGdhdGhlciBhbmQgc3ByZWFkCgojIFNvbWUgaG9tZXdvcmsgcmV2aXNpb24KCmBgYHtyfQpsaWJyYXJ5KHRpZHl2ZXJzZSkKbGlicmFyeShiYWJ5bmFtZXMpCmBgYAoKS2VsbHkgdmlzdWFsaXNhdGlvbjoKYGBge3J9CnJlYWRfdHN2KCIuL1N3ZWRpc2gtS2VsbHlfTTNfQ0VGUi50c3YiKSAlPiUKICBhcnJhbmdlKGRlc2MoYFJhdyBmcmVxYCkpICU+JQogIGZpbHRlcighaXMubmEoYFJhdyBmcmVxYCkpICU+JQogIGZpbHRlcihXUE0gIT0gMTAwMDAwMCkgJT4lCiAgbXV0YXRlKFJhbms9MTpsZW5ndGgoSUQpKSAlPiUKICBnZ3Bsb3QoYWVzKHg9bG9nKFJhbmspLCB5PWxvZyhgUmF3IGZyZXFgKSkpICsgZ2VvbV9saW5lKCkKYGBgCgpBbm90aGVyIHZlcnNpb24gb2YgdGhlIHNhbWUgdGhpbmcsIGxlc3MgImlkaW9tYXRpYyIgdGlkeXZlcnNlIChvbGQtZmFzaGlvbmVkIHdheSkKYGBge3J9CmtlbGx5IDwtIHJlYWRfdHN2KCJTd2VkaXNoLUtlbGx5X00zX0NFRlIudHN2IikKa2VsbHkgPC0gZmlsdGVyKGtlbGx5LCAhKGlzLm5hKGBSYXcgZnJlcWApIHwgV1BNID09IDEwMDAwMDApKQprZWxseSRyYW5rIDwtIDE6bnJvdyhrZWxseSkKa2VsbHkgJT4lIGdncGxvdChhZXMocmFuaywgYFJhdyBmcmVxYCkpICsgZ2VvbV9saW5lKCkKYGBgCgpOb3cgcGxvdCB0aGlzIGFnYWluIHRha2luZyB0aGUgbG9nIHZhbHVlcyBvZiByYW5rIGFuZCBmcmVxdWVuY3kKYGBge3J9CnJlYWRfdHN2KCJTd2VkaXNoLUtlbGx5X00zX0NFRlIudHN2IikgJT4lIAogIGZpbHRlcighKGlzLm5hKGBSYXcgZnJlcWApIHwgV1BNID09IDEwMDAwMDApKSAlPiUgCiAgbXV0YXRlKHJhbms9MTpucm93KC4pKSAlPiUgCiAgZ2dwbG90KGFlcyhsb2cxMChyYW5rKSwgbG9nMTAoYFJhdyBmcmVxYCkpKSArIGdlb21fbGluZSgpCmBgYAoKV2hhdCdzIHRoYXQgImJsaXAiPyBIb3cgY2FuIHdlIGZpeCBpdD8KYGBge3J9CiMgU29ydCBpdCBieSBSYXcgZnJlcSwgZGVzY2VuZGluZwprZWxseSAlPiUgYXJyYW5nZShkZXNjKGBSYXcgZnJlcWApKQpgYGAKQW5kIHRoZW4gcmVkbyBldmVyeXRoaW5nIHdpdGggdGhlIGZpeCBpbmNvcnBvcmF0ZWQKCmBgYHtyfQpyZWFkX3RzdigiU3dlZGlzaC1LZWxseV9NM19DRUZSLnRzdiIpICU+JQogIGFycmFuZ2UoZGVzYyhgUmF3IGZyZXFgKSkgJT4lCiAgZmlsdGVyKCEoaXMubmEoYFJhdyBmcmVxYCkgfCBXUE0gPT0gMTAwMDAwMCkpICU+JSAKICBtdXRhdGUocmFuaz0xOm5yb3coLikpICU+JSAKICBnZ3Bsb3QoYWVzKGxvZzEwKHJhbmspLCBsb2cxMChgUmF3IGZyZXFgKSkpICsgZ2VvbV9saW5lKCkKYGBgCgoKIyBXaWRlIGFuZCBsb25nIGRhdGEKCldpZGUgZGF0YToKCiFbXSguL3dpZGUtZGF0YS1leGFtcGxlLnBuZykKCkxvbmcgZGF0YToKCmBgYHtyfQpoZWFkKGJhYnluYW1lcykKYGBgCgotIExvbmcgZGF0YSBpcyBnb29kIGZvciB2aXN1YWwgYW5hbHlzaXM6IGV2ZXJ5IHJvdyBpcyBhbiBvYnNlcnZhdGlvbgotIEh1bWFucyBnZW5lcmFsbHkgcHJlZmVyIHRvICpyZWFkKiB3aWRlIGRhdGEKLSBUaGVyZSBhcmUgdGltZXMgd2hlbiB3aWRlIGRhdGEgaXMgaW1wb3J0YW50Ci0gYGdhdGhlcmAgY2hhbmdlcyB3aWRlIHRvIGxvbmcKLSBgc3ByZWFkYCBjaGFuZ2VzIGxvbmcgdG8gd2lkZQoKIVtXaWRlIHZzLiBsb25nXSguL3dpZGUtbG9uZy5wbmcpCgpXaWRlIGZvcm1hdApgYGB7cn0KZGF0YSA8LSB0aWJibGUocm93PWMoIkEiLCAiQiIpLCB4PTE6MiwgeT0zOjQsIHo9NTo2KQpkYXRhCmBgYAoKTG9uZyBmb3JtYXQKYGBge3J9CmRhdGEgJT4lIGdhdGhlcigiY29sdW1uIiwgInZhbHVlIiwgYygieCIsICJ5IiwgInoiKSkKYGBgCgoKIyBMb2FkaW5nIGRhdGEgZGlyZWN0bHkgZnJvbSBleGNlbCBmb3JtYXQKCih0aGlzIGlzIHJlbGF0aXZlbHkgbmV3LCBJIGRpZG4ndCBrbm93IGFib3V0IGl0IGVhcmxpZXIpCgpUaGVzZSBleGNlbCBmaWxlcyBhcmUgZnJvbSB0aGUgU3dlZGlzaCBbQ2VudHJhbCBTdGF0aXN0aWNzIEFnZW5jeSwgU0NCXShodHRwOi8vd3d3LnNjYi5zZS9oaXR0YS1zdGF0aXN0aWsvc3RhdGlzdGlrLWVmdGVyLWFtbmUvYmVmb2xrbmluZy9hbW5lc292ZXJncmlwYW5kZS1zdGF0aXN0aWsvbmFtbnN0YXRpc3Rpay8pCgpMb29rIGF0IHRoZSBzcHJlYWRzaGVldCBhbmQgdGhlIHJlYWRfZXhjZWwgZG9jdW1lbnRhdGlvbgotIG5hbWVkIHNoZWV0cyAod2UgbmVlZCB0byBzZWxlY3QgYSBwYXJ0aWN1bGFyIHNoZWV0KQotIGJsYW5rIGxpbmVzIGF0IGJlZ2lubmluZyAoYHNraXBgIHRoZW0pCi0gY29sdW1uIHR5cGVzIChjYW4geW91IHNlZSB3aGF0IHRoZXkgYXJlPykKCmBgYHtyfQpsaWJyYXJ5KHJlYWR4bCkKZ2lybHMgPC0gcmVhZF9leGNlbCgiYmUwMDAxbmFtbnRhYjExXzIwMTcueGxzeCIsIHNoZWV0ID0gIkZsaWNrb3IiLCBza2lwID0gNCkKYm95cyA8LSByZWFkX2V4Y2VsKCJiZTAwMDFuYW1udGFiMTJfMjAxNy54bHN4Iiwgc2hlZXQgPSAiUG9qa2FyIiwgc2tpcCA9IDQpCmhlYWQoZ2lybHMpCmBgYAoKQ2hlY2sgdGhlIGNvbHVtbiB0aXRsZXM6CmBgYHtyfQpuYW1lcyhnaXJscykKbmFtZXMoYm95cykKYGBgCgpXZSB3YW50IHRvIGNoZWNrIHRoYXQgdGhlIGNvbHVtbiBuYW1lcyBvZiBgZ2lybHNgIGFuZCBgYm95c2AgYXJlIHRoZSBzYW1lLiBZb3UgY2FuIGp1c3QgZG8gb25lIG9mIHRoZSBmb2xsb3dpbmc6CmBgYHtyfQpuYW1lcyhnaXJscykgPT0gbmFtZXMoYm95cykgIyBleHBlY3QgYSBsb25nIHZlY3RvciBvZiBUUlVFcwpgYGAKWW91IGNhbiBhbHNvIGRvOgpgYGB7cn0KYWxsKG5hbWVzKGdpcmxzKSA9PSBuYW1lcyhib3lzKSkgIyByZXR1cm5zIFRSVUUgaWYgYWxsIHRoZSB2YWx1ZXMgaW4gdGhlIHZlY3RvciBhcmUgVFJVRQpgYGAKQnV0IGEgbW9yZSBlbGVnYW50IHdheSB0byBkbyBpdCBpcyB0byBpbmNvcnBvcmF0ZSAqdGVzdHMqLiBUaGUgYHRlc3R0aGF0YCBwYWNrYWdlIGludHJvZHVjZXMgYSBidW5jaCBvZiBgZXhwZWN0X2AgZnVuY3Rpb25zIHRoYXQgbWFrZSB5b3VyIHNjcmlwdCBjcmFzaCAoaW5mb21hdGl2ZWx5ISkgaWYgdGhlIGV4cGVjdGF0aW9uIGlzIHZpb2xhdGVkLgpgYGB7cn0KbGlicmFyeSh0ZXN0dGhhdCkKZXhwZWN0X2VxdWFsKG5hbWVzKGdpcmxzKSwgbmFtZXMoYm95cykpCmBgYApUaGUgdGVzdCBkb2VzICpub3RoaW5nKiBpZiBpdCBwYXNzZXMuIFlvdSBjYW4gaW5jb3Jwb3JhdGUgdGVzdHMgaW50byB5b3VyIHNjcmlwdHMgdG8gbWFrZSBzdXJlIG5vdGhpbmcgdW5leHBlY3RlZCBpcyBoYXBwZW5pbmcgYWZ0ZXIgZS5nLiB5b3UgdXBkYXRlIGRhdGEuCmBgYHtyfQphIDwtIDE6NQpiIDwtIDE6NApleHBlY3RfZXF1YWwoYSwgYikKYGBgCgpBc3N1bWluZyB0aGluZ3Mgd29yayBhcyBleHBlY3RlZCB5b3UgY2FuIGFkZCBhIGNvbHVtbiB0byBzcGVjaWZ5IG1hbGUgb3IgZmVtYWxlIG5hbWUsIGFuZCB0aGVuIGJpbmQgeW91ciB0YWJsZXMgdG9nZXRoZXIgaW50byBhIHNpbmdsZSB0YWJsZS4KYGBge3J9CmxpYnJhcnkodGlkeXZlcnNlKQpnaXJscyA8LSBnaXJscyAlPiUgbXV0YXRlKHNleCA9ICJGIikKYm95cyA8LSBib3lzICU+JSBtdXRhdGUoc2V4ID0gIk0iKQpkYXRhIDwtIHJiaW5kKGdpcmxzLCBib3lzKQpgYGAKTm93IHRha2UgYSBsb29rOgpgYGB7cn0KaGVhZChkYXRhKQpgYGAKCiMgQ29udmVydCBzd2VkaXNoIGJhYnluYW1lcyBmcm9tIHdpZGUgdG8gbG9uZwoKQmFjayB0byBTd2VkaXNoIGJhYnkgbmFtZXMuIEluIG9yZGVyIHRvIHdvcmsgd2l0aCB0aGlzIHdlIG5lZWQgdG8gY29udmVydCBpdCBmcm9tIHdpZGUgdG8gbG9uZyBmb3JtYXQ6IHRoZXJlIHNob3VsZCBiZSBhIHNpbmdsZSBgeWVhcmAgY29sdW1uIHdpdGggeWVhciBjb2x1bW4gaGVhZGVycyBhcyB2YXJpYWJsZXMuIAoKIyBHYXRoZXJpbmcKCioqVGhpcyBpcyB2ZXJ5IGltcG9ydGFudCEqKgoKKiBnYXRoZXIoLmRhdGEsIG5ld19jb2x1bW5fd2l0aF9jb2xsZWN0ZWRfaGVhZGVycywgbmV3X2NvbHVtbl9mb3JfdmFsdWVzX29mX2NlbGxzLCAuLi5hbGxfdGhlX2NvbHVtbnNfdG9fZ2F0aGVyLi4uKQoKVGhlIGZvbGxvd2luZyBmdW5jdGlvbiBhbGwgdGhlIHllYXIgY29sdW1ucyBpbnRvIG9uZSBjb2x1bW4gd2l0aCB5ZWFyICh0aGUgb2xkIGNvbHVtbiBoZWFkZXIpLCBhbmQgb25lIGNvbHVtbiB3aXRoIHRoZSB2YWx1ZSBvZiB0aGUgY2VsbC4KCldlIHVzZSBgYXMuY2hhcmFjdGVyYCBiZWNhdXNlIHRoaXMgcmVmZXJzIHRvIHRoZSBjb2x1bW4gaGVhZGVycywgd2hpY2ggY291bnQgYXMgdGV4dC4gSWYgd2UgdXNlZCBudW1lcmFscyB3aGF0IHdvdWxkIGl0IG1lYW4/CgpgYGB7cn0KbG9uZy5kYXRhIDwtIGdhdGhlcihkYXRhLCB5ZWFyLCBjb3VudCwgYXMuY2hhcmFjdGVyKDE5OTg6MjAxNykpIApoZWFkKGxvbmcuZGF0YSkKYGBgCgpOb3RlIHRoYXQgdGhlIHllYXIgYW5kIGNvdW50IGNvbHVtbnMgYXJlIHNob3duIGFzIDxjaHI+IChjaGFyYWN0ZXIpIHJhdGhlciB0aGFuIDxkYmw+IChhIG51bWVyaWMgdHlwZSkuIFlvdSB3b24ndCBiZSBhYmxlIHRvIGdyYXBoIHRoZXNlIHVudGlsIHlvdSBmaXggdGhlbS4KVGhlIGNvdW50IGRhdGEgaXMgY2hhcmFjdGVyIHJhdGhlciB0aGFuIG51bWVyaWM7IGZpeCB0aGlzIHdpdGggbXV0YXRlCgpgYGB7cn0KbG9uZy5kYXRhIDwtIGdhdGhlcihkYXRhLCB5ZWFyLCBjb3VudCwgYXMuY2hhcmFjdGVyKDE5OTg6MjAxNykpICU+JSAKICBtdXRhdGUoeWVhcj1hcy5udW1lcmljKHllYXIpLCBjb3VudD1hcy5udW1lcmljKGNvdW50KSkKaGVhZChsb25nLmRhdGEpCmBgYAoKTm93IGdncGxvdCByZWNvZ25pc2VzIHRoZSBudW1iZXJzIGFzIG51bWVyaWMgcmF0aGVyIHRoYW4gYXMgY2hhcmFjdGVyIHN0cmluZ3MgaXQgY2FuIHBsb3QgdGhlbToKCmBgYHtyfQpsb25nLmRhdGEgJT4lIGZpbHRlcihOYW1uPT0iTWljaGFlbCIpICU+JSBnZ3Bsb3QoYWVzKHg9eWVhciwgeT1jb3VudCkpICsgZ2VvbV9saW5lKCkKYGBgCgpgYGB7cn0KbG9uZy5kYXRhICU+JSBmaWx0ZXIoTmFtbj09Ikxpbm7DqWEiKSAlPiUgZ2dwbG90KGFlcyh4PXllYXIsIHk9Y291bnQpKSArIGdlb21fbGluZSgpCmBgYAoKYGBge3J9CmxvbmcuZGF0YSAlPiUgZmlsdGVyKE5hbW49PSJMaW5uw6lhIiB8IE5hbW49PSJBbm5hIiB8IE5hbW49PSJSb2JlcnQiKSAlPiUgZ2dwbG90KGFlcyh4PXllYXIsIHk9Y291bnQsIGxpbmV0eXBlPU5hbW4pKSArIGdlb21fbGluZSgpCmBgYAoKIyBGaXhpbmcgKCJjb2VyY2luZyIpIGNoYXJhY3RlciB0eXBlcwoKTG9vayBhZ2FpbiBhdCBgaGVhZChkYXRhKWAuIEFsbCB0aGUgbnVtYmVycyBoYXZlIGJlZW4gaW1wb3J0ZWQgYXQgY2hhcmFjdGVycy4gQ2FuIHlvdSBndWVzcyB3aHk/CgpUaGlzIGlzIGEgcHJvYmxlbToKYGBge3J9CnZhbHVlcyA8LSBjKCIxIiwgIjciLCAiOD8iLCAiLSIsICJub3QgYXBwbGljYWJsZSIpCnZhbHVlcwojIFlvdSBjYW4ndCBkbyBtYXRoZW1hdGljYWwgb3BlcmF0aW9ucyB3aXRoIHRoZSBjaGFyYWN0ZXIgcmVwcmVzZW50YXRpb25zIG9mIG51bWJlcnMKIyB2YWx1ZXMgKyAxCmBgYApVc2UgYGFzLm51bWVyaWNgIHRvIGNvZXJjZSB0aGUgdHlwZSBvZiBhbiBvYmplY3QgdG8gbnVtZXJpYy4gQW55dGhpbmcgdGhhdCBjYW4ndCBiZSBjb2VyY2VkIHR1cm5zIGludG8gTkEgKG5vdCBhdmFpbGFibGUpCmBgYHtyfQpudW1lcmljLnZhbHVlcyA8LSBhcy5udW1lcmljKHZhbHVlcykKbnVtZXJpYy52YWx1ZXMgIyBub3RlIE5BIGZvciAibm90IGF2YWlsYWJsZSI7IHRoZSB3YXJuaW5nIG1lc3NhZ2UKbnVtZXJpYy52YWx1ZXMgKyAxCmBgYAoKWW91IGNhbiBkbyB0aGlzIHRoZSBvdGhlciB3YXkgYXJvdW5kIHRvbywgd2l0aCBgYXMuY2hhcmFjdGVyYCAobGlrZSB3ZSBkaWQgaW4gdGhlIGdhdGhlciBleGFtcGxlIGFib3ZlKQoKVGhlcmUgYXJlIG90aGVyIGBhcy5YWFhgIGZ1bmN0aW9ucyBmb3IgZXZlcnkgb3RoZXIgdHlwZSBvZiBvYmplY3QsIGJ1dCB5b3UncmUgbGVzcyBsaWtlbHkgdG8gbmVlZCB0aGVzZS4KCiMjIyBzdW1tYXJpc2UoKSBhbmQgZ3JvdXBfYnkoKQoKUmVkdWNlcyBhbGwgdGhlIHJvd3MgdG8gKm9uZSogcm93CmBgYHtyfQpiYWJ5bmFtZXMgJT4lIHN1bW1hcmlzZShtZWFuX249bWVhbihuKSwgbWVkaWFuX249bWVkaWFuKG4pKQpgYGAKCmBncm91cF9ieWAgcmVkdWNlcyBhbGwgdGhlIHJvd3MgdG8gYSBzbWFsbGVyIG51bWJlciBvZiByb3dzLCBhY2NvcmRpbmcgdG8gdGhlIGBncm91cF9ieWAgdGVybS9zOyBzdW1tYXJpc2UgdGhlbiB3b3JrcyBvbiBlYWNoICpncm91cCogKGBncm91cF9ieWAgZG9lc24ndCBtYWtlIG11Y2ggc2Vuc2Ugd2l0aG91dCBhIGBzdW1tYXJpc2VgIG9yIHNpbWlsYXIgYWZ0ZXIgaXQpCgpgYGB7cn0KIyBncm91cF9ieSBzZXgKYmFieW5hbWVzICU+JSBmaWx0ZXIobmFtZT09Ik1pY2hhZWwiKSAlPiUgZ3JvdXBfYnkoc2V4KSAlPiUgc3VtbWFyaXNlKGZpcnN0X3NlZW49bWluKHllYXIpLCBsYXN0X3NlZW49bWF4KHllYXIpKQpgYGAKCgpgYGB7cn0Kb3VyX25hbWVzID0gYygiQW5uYSIsICJCcm9yLU1hZ251cyIsICJMZW5hIiwgIkxpbm7DqWEiLCAiTWFqYSIsICJNYXJjIiwgIk1lcnZpIiwgIlJpbWEiLCAiUm9iZXJ0IiwgIlJ1bmUiLCAiTWljaGFlbCIpCmJhYnluYW1lcyAlPiUgZmlsdGVyKG5hbWUgJWluJSBvdXJfbmFtZXMpICU+JSBncm91cF9ieShuYW1lKSAlPiUgc3VtbWFyaXNlKHRvdGFsPXN1bShuKSkKYGBgCgpFeGVyY2lzZTogd2hhdCBhcmUgdGhlIG1lYW4gYW5kIG1lZGlhbiBudW1iZXIgb2YgbWFsZSBhbmQgZmVtYWxlIG5hbWVzIGluIHRoZSBkYXRhPwoKWW91IGNhbiBncm91cCBieSBtdWx0aXBsZSB0aGluZ3MgYXQgb25jZSB0byBnZXQgZXZlcnkgY29tYmluYXRpb24KYGBge3J9CmJhYnluYW1lcyAlPiUgCiAgZmlsdGVyKG5hbWUgJWluJSBjKCJNaWNoYWVsIiwgIk1hZ251cyIsICJBbm5hIiwgIk1hamEiKSkgJT4lIAogIGdyb3VwX2J5KG5hbWUsIHNleCkgJT4lICMgZXZlcnkgY29tYmluYXRpb24gb2YgbmFtZSBhbmQgc2V4CiAgc3VtbWFyaXNlKHRvdGFsPXN1bShuKSkgCmBgYAoKIyMgU3ByZWFkIGV4YW1wbGUKCkhlcmUncyBhIGNoYW5jZSB0byB1c2UgYHNwcmVhZGAuIExldCdzIHNheSB3ZSB3YW50IHRvIGxvb2sgYXQgdGhlICpyYXRpbyogb2YgbWFsZSB0byBmZW1hbGUgdmVyc2lvbiBvZiBlYWNoIG9mIHRoZXNlIG5hbWVzOgoKYGBge3J9CmJhYnluYW1lcyAlPiUgCiAgZmlsdGVyKG5hbWUgJWluJSBjKCJNaWNoYWVsIiwgIk1hZ251cyIsICJBbm5hIiwgIk1hamEiKSkgJT4lIAogIGdyb3VwX2J5KG5hbWUsIHNleCkgJT4lICMgZXZlcnkgY29tYmluYXRpb24gb2YgbmFtZSBhbmQgc2V4CiAgc3VtbWFyaXNlKHRvdGFsPXN1bShuKSkgJT4lIAogIHNwcmVhZChzZXgsIHRvdGFsKQpgYGAKCldlIGNvdWxkIGNvbnZlcnQgdGhpcyBpbiBhIG1hbmxpbmVzcyByYXRpbmcgZm9yIG5hbWVzOgoKYGBge3J9CmJhYnluYW1lcyAlPiUgCiAgZmlsdGVyKG5hbWUgJWluJSBjKCJNaWNoYWVsIiwgIk1hZ251cyIsICJBbm5hIiwgIk1hamEiKSkgJT4lIAogIGdyb3VwX2J5KG5hbWUsIHNleCkgJT4lICMgZXZlcnkgY29tYmluYXRpb24gb2YgbmFtZSBhbmQgc2V4CiAgc3VtbWFyaXNlKHRvdGFsPXN1bShuKSkgJT4lIAogIHNwcmVhZChzZXgsIHRvdGFsKSAlPiUgCiAgbXV0YXRlKG1hbmxpbmVzcz1NLyhNK0YpKQpgYGAKT29wcywgaGF2ZSB0byBjaGFuZ2UgdGhlIE5BcyB0byAwLCBiZWNhdXNlIGFueXRoaW5nICsgTkEgaXMgTkEKCmBgYHtyfQpiYWJ5bmFtZXMgJT4lIAogIGZpbHRlcihuYW1lICVpbiUgYygiTWljaGFlbCIsICJNYWdudXMiLCAiQW5uYSIsICJNYWphIikpICU+JSAKICBncm91cF9ieShuYW1lLCBzZXgpICU+JSAjIGV2ZXJ5IGNvbWJpbmF0aW9uIG9mIG5hbWUgYW5kIHNleAogIHN1bW1hcmlzZShjb3VudD1zdW0obikpICU+JSAgCiAgc3ByZWFkKHNleCwgY291bnQpIC0+IGRhdGEKIyB3cml0aW5nIGFuIGludGVybWVkaWF0ZSB2YXJpYWJsZSBpcyBhIGNsdW5reSB3YXkgdG8gZG8gaXQsIGJ1dCBJJ20gbm90IHRvbyBwcm91ZApkYXRhJEZbaXMubmEoZGF0YSRGKV0gPC0gMApkYXRhJE1baXMubmEoZGF0YSRNKV0gPC0gMApkYXRhICU+JSAKICBtdXRhdGUobWFubGluZXNzPU0vKE0rRikpICU+JSAKICBhcnJhbmdlKGRlc2MobWFubGluZXNzKSkKYGBgCiMjIGFuIGFzaWRlIG9uIGluZGV4aW5nCgpgYGB7cn0KdmFsdWVzIDwtIGMoMyw2LDE3LCBOQSwgTkEsIDUpCnZhbHVlcwpgYGAKYGBge3J9CmlzLm5hKHZhbHVlcykKYGBgCkluZGV4ZXMgaW4gc3F1YXJlIGJyYWNrZXRzCmBgYHtyfQp2YWx1ZXNbMl0KdmFsdWVzWzI6NF0KYGBgCkFzc2lnbiB0byBpbmRleGVkIHZlY3RvcnMKYGBge3J9CnZhbHVlc1sxOjJdIDwtIC05OQp2YWx1ZXMKYGBgCmBgYHtyfQp2YWx1ZXNbaXMubmEodmFsdWVzKV0gPC0gMAp2YWx1ZXMKYGBgCk90aGVyIGtpbmRzIG9mIGxvZ2ljIGFsc28gcG9zc2libGUKYGBge3J9CnZhbHVlc1t2YWx1ZXMgPCAwXSA8LSBOQQp2YWx1ZXMKYGBgCgojIyMgSGlzdG9ncmFtCgpgYGB7cn0KYmFieW5hbWVzICU+JSBmaWx0ZXIoeWVhcj09MjAwMCkgJT4lIGdncGxvdChhZXMoeD1sb2cxMChuKSkpICsgZ2VvbV9oaXN0b2dyYW0oKQpgYGAKCgoKIyMgc3RyaW5nciAKClNpbXBsZSBjaGFyYWN0ZXIgbWFuaXB1bGF0aW9ucywgc2VlIGRvY3VtZW50YXRpb246IGh0dHBzOi8vc3RyaW5nci50aWR5dmVyc2Uub3JnL2FydGljbGVzL3N0cmluZ3IuaHRtbAoKLSBMb29rIGF0IHRoZSBjaGVhdHNoZWV0LCBmcm9tIGh0dHBzOi8vZ2l0aHViLmNvbS9yc3R1ZGlvL2NoZWF0c2hlZXRzLwotIFRoZSAqKkRhdGEgdHJhbnNmb3JtYXRpb24qKiBhbmQgKipSIE1hcmtkb3duKiogY2hlYXRzaGVldHMgYXJlIGdvb2QgdG9vIAoKYGBge3J9CmxpYnJhcnkoc3RyaW5ncikgIyB5b3UgbWlnaHQgbmVlZCB0byBsb2FkIHRoaXMgc2VwYXJhdGVseSAob3IgaXQgbWlnaHQgYmUgcGFydCBvZiB0aWR5dmVyc2UpCiMgc3RyX3N1Yih4LCBzdGFydCwgc3RvcCkKb3VyX25hbWVzID0gYygiQW5uYSIsICJCcm9yLU1hZ251cyIsICJMZW5hIiwgIkxpbm7DqWEiLCAiTWFqYSIsICJNYXJjIiwgIk1lcnZpIiwgIlJpbWEiLCAiUm9iZXJ0IiwgIlJ1bmUiKQpzdHJfc3ViKG91cl9uYW1lcywgMSwgMikKIyBwb3NpdGl2ZSBhbmQgbmVnYXRpdmUgaW5kaWNlcwpzdHJfc3ViKG91cl9uYW1lcywgMSwgMSkKc3RyX3N1YihvdXJfbmFtZXMsIC0zLCAtMSkgJT4lIHN0cl90b191cHBlcigpCmBgYAoKLSBXZSBjYW4gZG8gbG90cyBvZiBpbnRlcmVzdGluZyB0aGluZ3Mgd2l0aCB0aGlzIGJ5IGFkZGluZyB0aGUgb3V0cHV0IG9mIHRoaXMgdG8gYSBuZXcgY29sdW1uIHVzaW5nIGBtdXRhdGVgCgpFeGVyY2lzZXM6CgpUcnkgb25lIG9mOgogIC0gYWRkIGEgY29sdW1uIGNhbGxlZCBgZmluYWxfYWAgd2l0aCBUUlVFIG9yIEZBTFNFIHZhbHVlcyBmb3Igd2hldGhlciB0aGUgbmFtZSBoYXMgYSBmaW5hbCAqYSoKICAtIGFkZCBhIGNvbHVtbiBmb3IgYGZpcnN0X2xldHRlcmAKLSBleHBsb3JlIHRoaXMgdmFyaWFibGUgKGUuZy4gaW50ZXJhY3Rpb24gd2l0aCBzZXgsIGNoYW5nZSBvdmVyIHRpbWUpCgojIFRyZW5kcyBpbiBhLWZpbmFsIG5hbWVzCgpgYGB7cn0KYmFieW5hbWVzICU+JSBtdXRhdGUoZmluYWxfYT1zdHJfc3ViKG5hbWUsIHN0YXJ0PS0xKT09ImEiKQpgYGAKV2UgY291bGQgZS5nLiBncmFwaCB0aGlzLCB3aXRoIHNleCBhcyBhIHZhcmlhYmxlOgpgYGB7cn0KYmFieW5hbWVzICU+JSAKICBtdXRhdGUoZmluYWxfYT1zdHJfc3ViKG5hbWUsIHN0YXJ0PS0xKT09ImEiKSAlPiUgCiAgZ3JvdXBfYnkoeWVhciwgc2V4LCBmaW5hbF9hKSAlPiUgCiAgc3VtbWFyaXNlKGNvdW50PW4oKSkgJT4lIAogIGdncGxvdChhZXMoeD15ZWFyLCB5PWNvdW50LCBjb2xvdXI9c2V4LCBsaW5ldHlwZT1maW5hbF9hKSkgKyBnZW9tX2xpbmUoKQpgYGAKQnV0IHRoaXMgcmVhbGx5IG9ubHkgbWFrZXMgc2Vuc2UgYXMgcHJvcG9ydGlvbnM6CmBgYHtyfQpiYWJ5bmFtZXMgJT4lIAogIG11dGF0ZShmaW5hbF9hPXN0cl9zdWIobmFtZSwgc3RhcnQ9LTEpPT0iYSIpICU+JSAKICBncm91cF9ieSh5ZWFyLCBzZXgsIGZpbmFsX2EpICU+JSAKICBzdW1tYXJpc2UoY291bnQ9bigpKSAlPiUgCiAgc3ByZWFkKGZpbmFsX2EsIGNvdW50KSAlPiUgCiAgcmVuYW1lKG5vX2ZpbmFsX2E9YEZBTFNFYCwgZmluYWxfYT1gVFJVRWApICU+JSAKICBtdXRhdGUocHJvcD1maW5hbF9hLyhmaW5hbF9hK25vX2ZpbmFsX2EpKSAlPiUgCiAgZ2dwbG90KGFlcyh4PXllYXIsIHk9cHJvcCwgY29sb3VyPXNleCkpICsgZ2VvbV9saW5lKCkKYGBgCgpTb21lIGlkZWFzIGZvciBmdXJ0aGVyIGV4ZXJjaXNlcwoKLSBXaGF0J3MgaGFwcGVuZWQgdG8gIkxlc2xpZSI/IFBsb3QgdGhlIGNoYW5nZSBpbiBtYWxlIGFuZCBmZW1hbGUgTGVzbGllcyBvdmVyIHRpbWUKLSBDYW4geW91IG9yZGVyIG5hbWVzIGJ5IHdoYXQgeWVhciB0aGV5IHBlYWs/Ci0gQ2FuIHlvdSB3b3JrIG91dCBhIHdheSB0byBmaW5kIG1vcmUgbmFtZXMgd2hpY2ggaGF2ZSBjaGFuZ2VkIHRoZWlyIHR5cGljYWwgZ2VuZGVyIG92ZXIgdGltZT8KLSBIYXMgdGhlIHByb3BvcnRpb24gb2Ygdm93ZWwgZmluYWwgbmFtZXMgKGNvbnNpZGVyIGdpcmxzIGFuZCBib3lzIHNlcGFyYXRlbHkpIGNoYW5nZWQgb3ZlciB0aW1lPwotIFdoYXQgaXMgdGhlIG1hbGUtZmVtYWxlIGJpYXMgaW4gaW5pdGlhbCBsZXR0ZXJzPwotIEhvdyBtYW55IGRpc3RpbmN0IG5hbWVzIGFyZSB1c2VkIGZvciBib3lzIGFuZCBnaXJscyBlYWNoIHllYXI/IFdoYXQgYWJvdXQgYSBiZXR0ZXIgbWVhc3VyZSAoZS5nLiBudW1iZXIgb2YgbmFtZXMvbnVtYmVyIG9mIGluZGl2aWR1YWxzKQ==