Data science with R and the tidyverse

April 27, 2017

3-4pm

Halligan 102

Host: Remco Chang

Abstract

To do data science you need to be able to solve six main types of problems:

1. __Importing__ your data into your analysis environment of choice.

2. __Tidying__ your data into a consistent form.

3. __Transforming__ it to add new variables or create summaries.

4. __Visualising__ it to help refine your questions and to reveal both the mundane and the surprising.

5. __Modelling__ to scale to larger data volumes and handle uncertainty in a principled way.

6. __Communicating__ your results to others.

In this talk, I'll discuss these challenges in the context of the tidyverse, a set of R packages designed to facilitate interactive data analysis.

Bio: Hadley is Chief Scientist at RStudio and a member of the R Foundation. He builds tools (both computational and cognitive) that make data science easier, faster, and more fun. His work includes packages for data science (the tidyverse: ggplot2, dplyr, tidyr, purrr, readr, ...), and principled software development (roxygen2, testthat, devtools). He is also a writer, educator, and frequent speaker promoting the use of R for data science. Learn more on his website, http://hadley.nz.