R Tidyverse Cheat Sheet

R Data Manipulation Cheat Sheet
Tidyr Cheat Sheet

Data visualization with ggplot2: ggplot2 is based on the grammar of graphics, the idea that you can build every graph from the same components: a data set, a coordinate system, and geoms (visual marks that represent data points).

Reading data with readr: The goal of readr is to provide a fast and friendly way to read rectangular data (like csv, tsv, and fwf). It is designed to flexibly parse many types of data found in the wild, while still cleanly failing when data unexpectedly changes.

Working with strings with stringr: To detect matches, str_detect(string, pattern) detects the presence of a pattern match in a string. Learn more at stringr.tidyverse.org.
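As a minimal sketch of match detection with stringr, the example below calls str_detect() on a small character vector. The vector `fruit` and the pattern "an" are made up for illustration.

```r
library(stringr)

# Hypothetical example data
fruit <- c("apple", "banana", "pear")

# One logical value per string: TRUE where the pattern "an" occurs
has_an <- str_detect(fruit, "an")
has_an

# The logical vector can be used directly for subsetting
fruit[has_an]
```

The result has the same length as the input, so it can be used anywhere a logical index is expected.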

Getting started

tidyr functions fall into five main categories:

“Pivoting”, which converts between long and wide forms. tidyr 1.0.0 introduces pivot_longer() and pivot_wider(), replacing the older spread() and gather() functions. See vignette('pivot') for more details.
“Rectangling”, which turns deeply nested lists (as from JSON) into tidy tibbles. See unnest_longer(), unnest_wider(), hoist(), and vignette('rectangle') for more details.
Nesting converts grouped data to a form where each group becomes a single row containing a nested data frame, and unnesting does the opposite. See nest(), unnest(), and vignette('nest') for more details.
Splitting and combining character columns. Use separate() and extract() to pull a single character column into multiple columns; use unite() to combine multiple columns into a single character column.
Make implicit missing values explicit with complete(); make explicit missing values implicit with drop_na(); replace missing values with next/previous value with fill(), or a known value with replace_na().
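The pivoting and missing-value categories above can be sketched on a small table. The tibble `wide` and its column names are hypothetical and chosen only for illustration.

```r
library(tidyr)
library(tibble)

# Hypothetical wide data: one row per site, one column per year
wide <- tibble(
  site = c("A", "B"),
  `2019` = c(10, 20),
  `2020` = c(12, NA)
)

# Wide to long: the year columns become name/value pairs
long <- pivot_longer(
  wide,
  cols = c(`2019`, `2020`),
  names_to = "year",
  values_to = "count"
)

# Long back to wide: the inverse operation
wide_again <- pivot_wider(long, names_from = year, values_from = count)

# Replace the explicit missing value with a known value
filled <- replace_na(long, list(count = 0))

# Alternatively, drop rows that contain missing values
dropped <- drop_na(long)
```

Here pivot_longer() and pivot_wider() undo each other, while replace_na() and drop_na() show two different ways of handling the same missing entry.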

Subsetting using the tidyverse

You can also subset tibbles using functions from the dplyr package. dplyr verbs are inspired by SQL vocabulary and are designed to be intuitive to read and write.

The first argument of each of the main dplyr functions is a tibble (or data.frame).

Filtering rows with `filter()`

filter() allows us to subset observations (rows) based on their values. The first argument is the name of the data frame. The second and subsequent arguments are the expressions that filter the data frame.

dplyr executes the filtering operation by generating a logical vector, and returns a new tibble containing the rows that match the filtering conditions. You can therefore use any of the logical operators we learnt when subsetting with [.
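A minimal sketch of filter(), assuming a small made-up tibble `trees_tbl` with columns Girth, Height and Volume (the data values are invented for illustration):

```r
library(dplyr)
library(tibble)

# Hypothetical example data
trees_tbl <- tibble(
  Girth  = c(8.3, 10.5, 13.7, 17.9),
  Height = c(70, 72, 71, 80),
  Volume = c(10.3, 16.4, 25.7, 58.3)
)

# Multiple conditions separated by commas are combined with AND
tall_small <- filter(trees_tbl, Height > 70, Volume < 30)

# The same subset written with [ and an explicit logical vector
same_with_bracket <- trees_tbl[trees_tbl$Height > 70 & trees_tbl$Volume < 30, ]

# Logical OR keeps rows that match either condition
either <- filter(trees_tbl, Height > 75 | Girth < 9)
```

The original tibble is left unchanged; each call returns a new tibble.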

Slicing rows with `slice()`

slice() selects rows by position, much like subsetting with element indices: we provide the indices of the rows we want to keep.
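A short sketch of slice(), again using a made-up tibble `trees_tbl` (a hypothetical name and data set used only for illustration):

```r
library(dplyr)
library(tibble)

# Hypothetical example data
trees_tbl <- tibble(
  Girth  = c(8.3, 10.5, 13.7, 17.9),
  Height = c(70, 72, 71, 80),
  Volume = c(10.3, 16.4, 25.7, 58.3)
)

# Rows 1 and 2, given as a range of indices
first_two <- slice(trees_tbl, 1:2)

# Rows 1 and 4, given as a vector of indices
first_and_last <- slice(trees_tbl, c(1, 4))

# Negative indices drop rows: everything except row 1
without_first <- slice(trees_tbl, -1)
```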

Selecting columns with `select()`

select() allows us to subset columns in tibbles using operations based on the names of the variables.

In dplyr we use unquoted column names (i.e. Volume rather than 'Volume').

Behind the scenes, select() matches any variable arguments to column names, creating a vector of column indices, which is then used to subset the tibble. As such, we can create ranges of variables using their names and the : operator.
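A minimal sketch of select() with unquoted names and a name-based range, assuming a made-up tibble `trees_tbl` with columns Girth, Height and Volume:

```r
library(dplyr)
library(tibble)

# Hypothetical example data
trees_tbl <- tibble(
  Girth  = c(8.3, 10.5, 13.7, 17.9),
  Height = c(70, 72, 71, 80),
  Volume = c(10.3, 16.4, 25.7, 58.3)
)

# A range of adjacent columns, from Girth through Height
girth_to_height <- select(trees_tbl, Girth:Height)

# Individual columns, returned in the order requested
volume_then_girth <- select(trees_tbl, Volume, Girth)

# A minus sign drops a column instead of keeping it
no_height <- select(trees_tbl, -Height)
```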

There are also a number of helper functions to make selections easier. For example, we can use one_of() to provide a character vector of column names to select. In current versions of dplyr and tidyselect, one_of() is superseded by all_of() and any_of().
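The sketch below shows selection from a character vector of names. It uses all_of() and any_of(), which supersede one_of() in recent tidyselect releases, plus starts_with() as one more helper. The tibble `trees_tbl` and the vector `wanted` are hypothetical names used only for illustration.

```r
library(dplyr)
library(tibble)

# Hypothetical example data
trees_tbl <- tibble(
  Girth  = c(8.3, 10.5, 13.7, 17.9),
  Height = c(70, 72, 71, 80),
  Volume = c(10.3, 16.4, 25.7, 58.3)
)

# Column names held in a character vector
wanted <- c("Girth", "Volume")

# all_of() is strict: every name must exist in the tibble
strict <- select(trees_tbl, all_of(wanted))

# any_of() is lenient: names that do not exist are silently skipped
lenient <- select(trees_tbl, any_of(c("Girth", "NotAColumn")))

# starts_with() matches column names by prefix
h_columns <- select(trees_tbl, starts_with("H"))
```

all_of() is the safer default when a missing column should be treated as an error; any_of() suits cases where some columns are optional.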