Before We Start


  • Navigate around RStudio and create an RStudio project (.Rproj) file.
  • Use RStudio to write and run R programs.
  • Install packages using the Packages tab or the install.packages() command.

Introduction to R


  • Assign values to objects using the assignment operator <-. Remove existing objects using the rm() function.
  • Add comments to R scripts using the # character.
  • Define and use R functions and arguments.
  • Get help using the ? and ?? operators and the help() function.
  • Define the following terms as they relate to R: object, vector, assign, call, function.
  • Create a vector, or add new elements to an existing one, using the c() function. Subset vectors using [].
  • Deal with missing data in vectors using the is.na(), na.omit(), and complete.cases() functions.
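
The points above can be sketched in a few lines of base R. This is only an illustration: the object names (weight_kg, heights, and so on) and all values are made up for the example and do not come from the lesson data.

```r
# Assign values to objects with <- (everything after # is a comment)
weight_kg <- 55
weight_lb <- 2.2 * weight_kg

# Remove an object that is no longer needed
rm(weight_lb)

# Create a vector with c(), then add a new element to it
heights <- c(150, 162, NA, 171)
heights <- c(heights, 180)

# Subset by position and by logical condition
first_two <- heights[1:2]
tall <- heights[!is.na(heights) & heights > 160]

# Deal with missing values
n_missing <- sum(is.na(heights))                      # count the NAs
heights_clean <- na.omit(heights)                     # drop NAs
heights_complete <- heights[complete.cases(heights)]  # same result via complete.cases()
mean_height <- mean(heights, na.rm = TRUE)            # ignore NAs inside a function call
```

Typing ?mean or help("mean") at the console opens the help page for a function such as mean(), while ??missing searches across all installed help pages.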

Starting with Data


  • Use getwd() and setwd() to navigate between directories.
  • Use read_csv() from the readr package (part of the tidyverse) to read tabular data into R.
  • Data frames are made up of vectors of equal length, with each vector forming one column of the data frame.
  • Summarise the dimensions, contents and variables of a data frame.
  • Use square brackets [] and logical operators to subset data frames.
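
As a rough sketch of these steps, the example below writes a tiny made-up CSV to a temporary file, reads it back, summarises it, and subsets it. The file contents and the column names (id, village, members) are hypothetical. The fallback to base R's read.csv() is an assumption added only so the example still runs where readr is not installed.

```r
# Show the current working directory; setwd("some/path") would change it
getwd()

# Write a small, made-up CSV to a temporary file so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "id,village,members",
  "1,A,3",
  "2,B,7",
  "3,A,4",
  "4,C,9"
), csv_path)

# Read the file with readr::read_csv() when readr is available
if (requireNamespace("readr", quietly = TRUE)) {
  households <- suppressMessages(readr::read_csv(csv_path))
} else {
  households <- read.csv(csv_path)  # base R fallback (assumption, see above)
}

# Summarise dimensions, contents and variables
dim(households)      # number of rows and columns
names(households)    # column names
str(households)      # type and first values of each column
summary(households)  # per-column summary statistics

# Subset with [rows, columns] and logical operators
large <- households[households$members > 3, ]
a_large <- households[households$village == "A" & households$members > 3, ]
first_rows <- households[1:2, c("village", "members")]
```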

Data Cleaning & Transformation with dplyr


  • Use the dplyr package to manipulate data frames.
  • Subset data frames using select() and filter().
  • Rename variables in a data frame using rename().
  • Recode values in a data frame using recode().
  • Use mutate() to create new variables.
  • Sort data using arrange().
  • Use group_by() and summarize() to work with subsets of data.
  • Use the pipe (%>%) to chain multiple commands together.
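
One way the verbs above might be chained together is sketched below. It assumes the dplyr package is installed. The households data frame, its columns, and its values are invented for illustration and are not the lesson's data set.

```r
library(dplyr)

# Made-up example data (hypothetical columns and values)
households <- data.frame(
  village = c("A", "B", "A", "C", "B"),
  members = c(3, 7, 4, 9, 2),
  wall    = c("brick", "mud", "mud", "brick", "mud"),
  stringsAsFactors = FALSE
)

village_summary <- households %>%
  rename(household_size = members) %>%              # rename a variable
  filter(household_size > 2) %>%                    # keep rows matching a condition
  mutate(wall  = recode(wall, mud = "muddaub"),     # recode values
         large = household_size >= 5) %>%           # create a new variable
  select(village, household_size, wall, large) %>%  # keep only these columns
  group_by(village) %>%                             # split into groups
  summarize(mean_size    = mean(household_size),    # one summary row per group
            n_households = n()) %>%
  arrange(desc(mean_size))                          # sort, largest mean first

village_summary
```

Each step receives the output of the previous one as its first argument, so the pipeline reads from top to bottom without intermediate objects.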

Data Visualisation with ggplot2


  • ggplot2 is a flexible and useful tool for creating plots in R.
  • The data set and coordinate system can be defined using the ggplot() function.
  • Additional layers, including geoms, are added using the + operator.
  • Boxplots are useful for visualizing the distribution of a continuous variable.
  • Bar plots are useful for visualizing categorical data.
  • Faceting allows you to generate multiple plots based on a categorical variable.
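
The layering idea can be sketched as follows, assuming the ggplot2 package is installed. The households data frame is again made up for the example; any data frame with one continuous and two categorical columns would do.

```r
library(ggplot2)

# Made-up example data (hypothetical columns and values)
households <- data.frame(
  village = c("A", "B", "A", "C", "B", "C"),
  members = c(3, 7, 4, 9, 2, 6),
  wall    = c("brick", "mud", "mud", "brick", "mud", "mud")
)

# ggplot() sets the data and aesthetic mappings; geoms are added with +
box_plot <- ggplot(data = households, aes(x = wall, y = members)) +
  geom_boxplot()           # distribution of a continuous variable per category

bar_plot <- ggplot(data = households, aes(x = wall)) +
  geom_bar() +             # counts of a categorical variable
  facet_wrap(~ village)    # one panel per village

# Printing a plot object draws it, e.g. print(box_plot)
```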