Lecture 1, 2018-09-24
VisStat HT18 #1
RStudio is a supercharged calculator, and also the centre of your quantitative world. This week we will focus on basic skills: running RStudio, getting help, recording your work with markdown notebooks, writing and running scripts, reading and writing data files, and exploring some basic data types.
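As a rough illustration of reading and writing data files and of the basic data types, here is a minimal sketch in base R. The data frame `people` and its columns are invented for the example, and the file goes to a temporary location rather than to any course directory.

```r
# A small data frame mixing the basic vector types (made-up values)
people <- data.frame(
  name   = c("Ada", "Ben", "Cai"),   # character
  age    = c(36L, 41L, 29L),         # integer
  height = c(1.68, 1.80, 1.75),      # double (numeric)
  member = c(TRUE, FALSE, TRUE),     # logical
  stringsAsFactors = FALSE
)

# Write to a temporary CSV file, then read it back in
path <- tempfile(fileext = ".csv")
write.csv(people, path, row.names = FALSE)
people2 <- read.csv(path, stringsAsFactors = FALSE)

# Inspect the type of each column after the round trip
col_classes <- sapply(people2, class)
print(col_classes)
str(people2)
```

Typing `?read.csv` in the console opens the help page for the function, which is usually the quickest way to find out what the arguments do.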
The software carpentry analogy: we are like domestic carpenters, not professional cabinetmakers. We can build a functional birdhouse, or a bookcase, but we do not aspire to make a fancy inlaid dresser. This week we will learn how to develop a reproducible, self-documenting workflow. We will also introduce git (used via RStudio) and GitHub for managing data and analysis.
Basic R programming with data, and an introduction to the tidyverse, a family of R tools that has revolutionised how R is used over the last few years.
A lot of our analytic work is what I call ‘data wrangling’: taking raw data and turning it into something that can be analysed. Often, once the data is in the right form, the analysis itself becomes easy. We will learn how to subset data and carry out other data frame manipulations using dplyr.
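To give a flavour of data wrangling with dplyr, here is a small sketch using `mtcars`, a dataset that ships with R. The choice of dataset, the threshold of 100 horsepower, and the derived column `wt_kg` are illustrative assumptions, not part of the course material.

```r
library(dplyr)

# mtcars ships with R: fuel consumption and other specs for 32 cars
summary_by_cyl <- mtcars %>%
  filter(hp > 100) %>%                     # subset rows
  select(mpg, cyl, wt) %>%                 # keep only some columns
  mutate(wt_kg = wt * 1000 * 0.4536) %>%   # wt is recorded in units of 1000 lb
  group_by(cyl) %>%                        # one group per cylinder count
  summarise(n = n(), mean_mpg = mean(mpg))

print(summary_by_cyl)
```

Each step takes a data frame and returns a data frame, so the pipeline reads top to bottom as a recipe: subset, select, transform, group, summarise.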
Introduction to basic plot types, and the joys of faceted data.
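A faceted plot splits one plot into a panel per group. The following is a minimal sketch assuming the ggplot2 package (part of the tidyverse) and, again, the built-in `mtcars` data; the variables plotted are chosen only for illustration.

```r
library(ggplot2)

# One scatterplot panel per number of cylinders
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_wrap(~ cyl) +
  labs(x = "Weight (1000 lb)", y = "Miles per gallon")

print(p)
```

Because all panels share the same axes by default, differences between the groups are easy to compare by eye.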
At this point in the course we should be comfortable with working with data: reading, writing, transforming, and visualising. Now we will look at some statistical tests for telling whether values and distributions of values are the same or different. After going through some basic tests, we will focus on the process of discovering for yourself what you need to know in more complex cases.
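As an example of the kind of basic test meant here, the sketch below compares fuel consumption between automatic and manual cars in the built-in `mtcars` data. The dataset and the two tests shown are assumptions chosen for illustration; the course may well use others.

```r
# Do automatic (am = 0) and manual (am = 1) cars differ in mean mpg?
# t.test() performs a Welch two-sample t-test by default.
result <- t.test(mpg ~ am, data = mtcars)
print(result)

# A rank-based alternative that does not assume normality.
# exact = FALSE requests the normal approximation, since the data contain ties.
w <- wilcox.test(mpg ~ am, data = mtcars, exact = FALSE)
print(w)

# The p-values can be pulled out of the result objects for reporting
result$p.value
w$p.value
```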
Similarity (the flip-side of difference) is an important concept in humanities computing. We will learn how to produce useful similarity measures, and how to analyse and visualise them using techniques such as Principal Components Analysis and Multidimensional Scaling.
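Both techniques are available in base R. The sketch below uses the built-in `iris` measurements and Euclidean distance as the (dis)similarity measure; both are assumptions made for the example, and humanities data will often call for a different measure.

```r
# Four numeric measurements for 150 iris flowers (the dataset ships with R)
x <- scale(iris[, 1:4])   # standardise so each variable counts equally

# Pairwise Euclidean distances: a small distance means high similarity
d <- dist(x)

# Principal Components Analysis
pca <- prcomp(x)
summary(pca)              # proportion of variance per component

# Classical (metric) Multidimensional Scaling into two dimensions
mds <- cmdscale(d, k = 2)

# Plot the MDS configuration, one colour per species
plot(mds, col = as.integer(iris$Species),
     xlab = "Dimension 1", ylab = "Dimension 2")
```

With Euclidean distances, classical MDS recovers the same configuration as PCA (up to a flip of the axes), which makes this a convenient pair of methods to compare on the same data.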
Data about humans and human behaviour often has a geographic aspect. We will learn how to plot data on geographic maps and learn some commonly used geostatistical tests.
An introduction to inferring the underlying order in your data through statistical classification.
Techniques for collaboration. Producing publication-quality graphics. Archiving and publishing research analyses online (using e.g. FigShare). How to report your analysis in a thesis or paper. Other workflows (leaving RStudio for the text editor and command line).
http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005510
… first accessible skills and perspectives—the “good enough” practices—for scientific computing: a minimum set of tools and techniques that we believe every researcher can and should consider adopting.’
HOMEWORK: read this
↓ This is what I type in the R console
2+2
R> [1] 4
↑ This is what is displayed after I press return.
But I will usually use R notebooks for anything complicated so you can follow my working more conveniently.
See 1.notebook.Rmd, 1.notebook.html
Course website: http://evoling.net/VisStat-HT18/