Data Visualization with ggplot2 cheat sheet: ggplot2 is based on the grammar of graphics, the idea that you can build every graph from the same components: a data set, a coordinate system, and geoms, the visual marks that represent data points.
readr: The goal of readr is to provide a fast and friendly way to read rectangular data (like csv, tsv, and fwf). It is designed to flexibly parse many types of data found in the wild, while still cleanly failing when data unexpectedly changes.
Work with strings with stringr cheat sheet: Detect Matches. str_detect(string, pattern) detects the presence of a pattern match in a string. Learn more at stringr.tidyverse.org.
Getting started
tidyr functions fall into five main categories:
“Pivoting”, which converts between long and wide forms. tidyr 1.0.0 introduces pivot_longer() and pivot_wider(), replacing the older spread() and gather() functions. See vignette('pivot') for more details.
“Rectangling”, which turns deeply nested lists (as from JSON) into tidy tibbles. See unnest_longer(), unnest_wider(), hoist(), and vignette('rectangle') for more details.
Nesting converts grouped data to a form where each group becomes a single row containing a nested data frame, and unnesting does the opposite. See nest(), unnest(), and vignette('nest') for more details.
Splitting and combining character columns. Use separate() and extract() to pull a single character column into multiple columns; use unite() to combine multiple columns into a single character column.
Make implicit missing values explicit with complete(); make explicit missing values implicit with drop_na(); replace missing values with next/previous value with fill(), or a known value with replace_na().
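A few of these categories can be sketched on toy data. The tibbles `wide` and `codes` below are hypothetical and made up for illustration; only the tidyr functions themselves come from the list above.

```r
library(tidyr)
library(tibble)

# Hypothetical wide table: one row per site, one column per year.
wide <- tibble(
  site   = c("a", "b"),
  `2019` = c(10, 20),
  `2020` = c(11, NA)
)

# Pivoting: wide -> long, one row per site/year combination.
long <- pivot_longer(
  wide,
  cols      = c(`2019`, `2020`),
  names_to  = "year",
  values_to = "count"
)

# ... and back to the wide form.
wide_again <- pivot_wider(long, names_from = year, values_from = count)

# Missing values: replace the explicit NA with a known value.
filled <- replace_na(long, list(count = 0))

# Splitting and combining character columns (hypothetical codes).
codes  <- tibble(code = c("a-1", "b-2"))
parts  <- separate(codes, code, into = c("letter", "number"), sep = "-")
joined <- unite(parts, "code", letter, number, sep = "-")
```

Note that pivot_longer() stores the former column names as character values, so `year` is a character column here unless it is converted afterwards.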
Subsetting using the tidyverse
You can also subset tibbles using tidyverse functions from the dplyr package. dplyr verbs are inspired by SQL vocabulary and designed to be more intuitive.
The first argument of the main dplyr functions is a tibble (or data.frame).
Filtering rows with filter()
filter() allows us to subset observations (rows) based on their values. The first argument is the data frame. The second and subsequent arguments are the expressions that filter the data frame.
dplyr executes the filtering operation by generating a logical vector and returns a new tibble of the rows that match the filtering conditions. You can therefore use any of the logical operators we learnt when subsetting with [.
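As a minimal sketch, assuming the built-in `trees` data set (columns Girth, Height and Volume) stands in for the data used here, filtering might look like this. The threshold values are arbitrary.

```r
library(dplyr)
library(tibble)

# Assumption: the built-in `trees` data set stands in for the tutorial's data.
trees_tbl <- as_tibble(trees)

# Keep rows whose Volume exceeds an (arbitrary) threshold.
big <- filter(trees_tbl, Volume > 50)

# Several conditions separated by commas are combined with a logical AND.
tall_and_big <- filter(trees_tbl, Volume > 50, Height >= 80)

# The equivalent subset written with `[` and a logical vector.
big_base <- trees_tbl[trees_tbl$Volume > 50, ]
```

One difference worth knowing: filter() drops rows where the condition evaluates to NA, whereas `[` with a logical vector returns a row of NAs for them.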
Slicing rows with slice()
slice() works like subsetting by element index: we provide the positions of the rows we want to select.
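A short sketch of slice(), again assuming the built-in `trees` data set as a stand-in for the data used here:

```r
library(dplyr)
library(tibble)

# Assumption: the built-in `trees` data set stands in for the tutorial's data.
trees_tbl <- as_tibble(trees)

# Select the first three rows by position.
first_three <- slice(trees_tbl, 1:3)

# Negative positions drop rows, as they do with `[`.
without_first <- slice(trees_tbl, -1)

# The same selection written with element indices.
first_three_base <- trees_tbl[1:3, ]
```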
Selecting columns with select()
select() allows us to subset columns in tibbles using operations based on the names of the variables.
In dplyr we use unquoted column names (i.e. Volume rather than 'Volume').
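For instance, assuming the built-in `trees` data set (which has a Volume column) as a stand-in, columns can be selected by their bare names:

```r
library(dplyr)
library(tibble)

# Assumption: the built-in `trees` data set stands in for the tutorial's data.
trees_tbl <- as_tibble(trees)

# Unquoted column names, in the order we want them returned.
vol_height <- select(trees_tbl, Volume, Height)

# A minus sign drops a column instead of keeping it.
no_girth <- select(trees_tbl, -Girth)
```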
Behind the scenes, select() matches its variable arguments to column names, creating a vector of column indices, which is then used to subset the tibble. As such, we can create ranges of variables using their names and the : operator.
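Because names are translated to positions, a range of names behaves like a range of indices. A small sketch, assuming the built-in `trees` data set as a stand-in:

```r
library(dplyr)
library(tibble)

# Assumption: the built-in `trees` data set stands in for the tutorial's data.
trees_tbl <- as_tibble(trees)

# A range of columns given by name: Girth and Height.
by_name <- select(trees_tbl, Girth:Height)

# The same range given by position.
by_position <- select(trees_tbl, 1:2)
```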
There are also a number of helper functions to make selections easier. For example, we can use one_of() to provide a character vector of column names to select. In recent versions of dplyr, one_of() is superseded by all_of() and any_of().
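A brief sketch of selection helpers, assuming the built-in `trees` data set as a stand-in. The vector `wanted` is a hypothetical name introduced for illustration.

```r
library(dplyr)
library(tibble)

# Assumption: the built-in `trees` data set stands in for the tutorial's data.
trees_tbl <- as_tibble(trees)

# Hypothetical character vector of column names to keep.
wanted <- c("Height", "Volume")

# one_of() accepts a character vector of names.
with_one_of <- select(trees_tbl, one_of(wanted))

# all_of() does the same, but errors if a name is missing from the data.
with_all_of <- select(trees_tbl, all_of(wanted))

# Other helpers match on patterns in the names.
starts_h <- select(trees_tbl, starts_with("H"))
```

all_of() is the stricter choice when every listed column is expected to exist; any_of() silently skips names that are absent.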