# 7 Learning ggplot2

## 7.1 **Getting started**

Make sure you have installed the **tidyverse** collection of packages with:

`install.packages("tidyverse")`

To use **ggplot2**, either load it together with the other core **tidyverse** packages:

`library(tidyverse)`

or load the **ggplot2** package only:

`library(ggplot2)`

## 7.2 Grammar of Graphics

Unlike many graphics software packages, **ggplot2** has an underlying grammar that lets you create graphs by combining basic components, or building blocks. You are therefore not limited to a list of premade charts but can design your own graphics to suit your data and research goals.

The underlying grammar is called the Grammar of Graphics, after the book of the same title by Leland Wilkinson. (That is what the “gg” in **ggplot2** stands for.)

As implemented in **ggplot2** the five basic components of graphs are 1) layers, 2) scales, 3) coordinate system, 4) faceting system, and 5) theme.

The layers contain the data; everything else in a sense helps us to view and interpret the data. Plots contain one or more layers.
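As a rough illustration of how the five components combine, the sketch below builds one plot from the built-in `iris` data and adds one piece per component. The particular choices (axis name, faceting variable, `theme_minimal()`) are arbitrary and only meant to show where each component goes.

```r
library(ggplot2)

p <- ggplot(data = iris) +
  # 1) layer: data + geom + aesthetic mappings
  geom_point(aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  # 2) scale: how Sepal.Length values map to the x position
  scale_x_continuous(name = "Sepal length") +
  # 3) coordinate system: ordinary Cartesian axes (the default)
  coord_cartesian() +
  # 4) faceting: one panel per species
  facet_wrap(~Species) +
  # 5) theme: non-data appearance such as background and fonts
  theme_minimal()

p
```

Removing any of lines 2) to 5) still gives a valid plot, because **ggplot2** falls back to defaults for every component except the layer.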

## 7.3 Data layers

`knitr::include_graphics("images/layers.png")`

Data layers are made up of: 1) data, 2) a geom, 3) aesthetic mappings, 4) stat, and 5) position. The first three are required; the last two are optional and will rarely need to be changed from the default settings. So, let’s focus on data, geom and aesthetic mappings. Data refers simply to the data frame you are working with. Note that **ggplot2** requires a `data.frame` or `tibble`. You cannot plot directly from other data structures such as vectors, matrices, or lists; convert them to a data frame first.
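If your data starts out as a matrix, one way to prepare it is to convert it with base R's `as.data.frame()`. The sketch below assumes a small made-up matrix with columns named `x` and `y`; the values and column names are illustrative only.

```r
library(ggplot2)

# A small numeric matrix; the values are made up for illustration.
m <- matrix(c(1, 2, 3, 4,
              2, 4, 6, 8),
            ncol = 2,
            dimnames = list(NULL, c("x", "y")))

# Convert the matrix to a data frame so ggplot2 can use it.
df <- as.data.frame(m)

p <- ggplot(data = df) +
  geom_point(aes(x = x, y = y))
p
```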

Geom stands for geometric object, which you can think of as the shape in which the data will appear in your graph. Common geoms are point, bar, boxplot, line, histogram, and density. Each geom requires certain pieces of information. For example, to draw a point, you need two pieces of information, an x and a y. These pieces of information are supplied through aesthetic mappings. Let’s say we want to create a scatterplot. We start by recognizing that our graph will contain points, so the geom we need is `geom_point()`. Next we have to map variables from our data frame to the x and y aesthetics with `aes()`:

```
library(ggplot2)
ggplot(data = iris) +                                   # Data part
  geom_point(aes(x = Sepal.Length, y = Sepal.Width))    # Mapping part
```

The most important part of any plot is the data, which holds the information you want to visualize. The next step is to decide on the mapping, which determines how the variables in the data are mapped to aesthetic attributes of the graphic. Since the data is independent of the other elements, you can add several layers to the same ggplot while keeping the other components the same.

```
ggplot(data = iris) +                                          # Data part
  geom_point(aes(Petal.Length, Petal.Width)) +                 # layer 1 with mapping
  geom_point(aes(Sepal.Length, Sepal.Width), color = 'red')    # layer 2 with a different mapping
```
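Layers can also draw from different data frames, because each geom accepts its own `data` argument. The following sketch overlays per-species means on the raw `iris` points. The summary table `species_means` is a helper introduced here for illustration and is computed with base R's `aggregate()`.

```r
library(ggplot2)

# Per-species means, computed with base R (one row per species).
species_means <- aggregate(cbind(Sepal.Length, Sepal.Width) ~ Species,
                           data = iris, FUN = mean)

p <- ggplot(data = iris, mapping = aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(color = "grey60") +                               # layer 1: raw data
  geom_point(data = species_means, color = "red", size = 4)    # layer 2: its own data
p
```

Both layers inherit the x and y mappings from `ggplot()`, so the summary table only needs columns with the same names.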

## 7.4 **Customized parts**

The following picture shows the order of ggplot functions:

For more suggestions on function order, and for auto-correction when writing your own **ggplot2** code, please refer to the ggformat addin created by Joyce.

### 7.4.1 **Geometric object, statistical transformation and position adjustment**

Geometric object, statistical transformation and position adjustment are components that can be customized in each layer.

Geometric objects, called `geoms`, control the graphical elements that represent the data (think shapes). Different types of plot have different aesthetic features. For example, a point `geom` has position, color, shape, and size aesthetics. First decide which kind of plot best explains the data, then choose the `geoms`, and use the `help` function to check which aesthetics can be modified to achieve your desired effect.
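As a small example of customizing a point geom, the sketch below maps color to a variable inside `aes()` and sets shape and size to fixed values outside it. The specific values (`shape = 17`, `size = 2`) are arbitrary choices for illustration.

```r
library(ggplot2)

p <- ggplot(data = iris) +
  geom_point(
    aes(x = Sepal.Length, y = Sepal.Width, color = Species),  # mapped: varies with the data
    shape = 17,  # set: the same shape for every point
    size = 2     # set: the same size for every point
  )
p
```

Running `?geom_point` in the console opens the help page, which lists the aesthetics this geom understands.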

A statistical transformation, or `stat`, transforms the data before it is drawn; for example, `stat = 'bin'` counts the observations that fall in each bin. A position adjustment shifts elements that would otherwise overlap, which matters for dense data where points or bars might obscure one another.

```
ggplot(data = iris) +
  geom_histogram(mapping = aes(x = Petal.Length, fill = Species),
                 stat = 'bin', position = 'stack')
```
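To see what a stat actually computes, you can inspect a layer with `layer_data()`. The sketch below is one possible illustration: it counts rows per species with `stat = "count"`, then draws the histogram again with `position = "dodge"` in place of stacking. The `binwidth` of 0.5 is an arbitrary choice.

```r
library(ggplot2)

# stat = "count" tallies the rows in each category before drawing bars.
p_count <- ggplot(data = iris) +
  geom_bar(aes(x = Species), stat = "count")

# layer_data() returns the values the stat computed for a layer.
computed <- layer_data(p_count, 1)
computed$count

# The same kind of histogram as above, with bars placed side by side
# within each bin instead of stacked.
p_dodge <- ggplot(data = iris) +
  geom_histogram(aes(x = Petal.Length, fill = Species),
                 binwidth = 0.5, position = "dodge")
p_dodge
```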

### 7.4.2 **Scale**

A scale controls how data is mapped to an aesthetic attribute, so a plot needs one scale for each aesthetic it uses.

```
ggplot(data = iris) +
  geom_histogram(mapping = aes(x = Petal.Length, fill = Species),
                 stat = 'bin', position = 'stack') +
  scale_x_continuous(limits = c(0, 10)) +
  scale_y_continuous(limits = c(0, 50))
```
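Scales are not limited to the axes. As a sketch, the example below adds a scale for the `fill` aesthetic with `scale_fill_manual()`, alongside an x scale with custom breaks. The color names, breaks, and `binwidth` are arbitrary illustrative choices.

```r
library(ggplot2)

p <- ggplot(data = iris) +
  geom_histogram(aes(x = Petal.Length, fill = Species), binwidth = 0.25) +
  # x scale: axis title and tick positions
  scale_x_continuous(name = "Petal length", breaks = 1:7) +
  # fill scale: which color each species receives
  scale_fill_manual(values = c(setosa = "steelblue",
                               versicolor = "darkorange",
                               virginica = "forestgreen"))
p
```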

### 7.4.3 **Coordinate system**

A coordinate system, or `coord`, maps the position of objects onto the plane of the plot and controls how the axes and grid lines are drawn. A ggplot can have only one `coord`.

```
ggplot(data = iris) +
geom_histogram(mapping = aes(x = Petal.Length, fill = Species),
stat = 'bin', position = 'stack') +
coord_polar()
```
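The one-coord rule can be checked directly. In the sketch below, a second coordinate system is added after the first; **ggplot2** keeps only the last one and reports the replacement with a message. Inspecting `p$coordinates` relies on the internal structure of the plot object, so treat that line as a debugging aid, not a stable interface.

```r
library(ggplot2)

base <- ggplot(data = iris) +
  geom_histogram(aes(x = Petal.Length, fill = Species), binwidth = 0.25)

# Adding a second coord replaces the first; ggplot2 emits a message about it.
p <- base + coord_flip() + coord_polar()

# Class of the coordinate system that the plot ends up with.
class(p$coordinates)[1]
```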

### 7.4.4 **Faceting**

Faceting splits the data into subsets and draws each subset in its own panel.

```
ggplot(data = iris) +
  geom_histogram(mapping = aes(x = Petal.Length), stat = 'bin') +
  facet_wrap(~Species)
```
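`facet_wrap()` also takes arguments that control the panel layout. The sketch below stacks the panels in a single column and lets each panel have its own y axis. Whether free scales are appropriate depends on the comparison you want readers to make, so this is only one possible arrangement.

```r
library(ggplot2)

p <- ggplot(data = iris) +
  geom_histogram(aes(x = Petal.Length), binwidth = 0.25) +
  facet_wrap(~Species,
             ncol = 1,            # stack the panels vertically
             scales = "free_y")   # each panel gets its own y-axis range
p
```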

### 7.4.5 **Labels**

Labels include titles, axis labels, and annotations. Good graphics use labels to give readers the background they need to understand the data.

```
ggplot(data = iris) +
  geom_histogram(mapping = aes(x = Petal.Length, fill = Species),
                 stat = 'bin', position = 'stack') +
  ggtitle('Stacked petal length of different species') +
  xlab('Length of Petal')
```
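Several labels can also be set in one call with `labs()`, and `annotate()` adds a text annotation as an extra layer. In the sketch below the y-axis wording and the annotation's position and text are hypothetical choices made for illustration.

```r
library(ggplot2)

p <- ggplot(data = iris) +
  geom_histogram(aes(x = Petal.Length, fill = Species), binwidth = 0.25) +
  labs(title = "Stacked petal length of different species",
       x = "Length of petal",
       y = "Number of flowers",
       fill = "Species") +
  # Hypothetical placement: a note above the left-hand cluster of bars.
  annotate("text", x = 1.5, y = 45, label = "setosa")
p
```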

## 7.5 **Resources for ggplot2**

- For more implementations and examples, the easiest way is to refer to the ggplot2 cheat sheets provided by RStudio. Follow the steps shown below to find the cheat sheets in RStudio.

The cheat sheets clearly list the basic components of a ggplot, so you can customize your plot by choosing different functions.

- If you are looking for more detailed explanations and examples with real datasets, here are some useful links:

## 7.6 Required aesthetic mappings

| GEOM | REQUIRED MAPPINGS |
|---|---|
| geom_abline | NA |
| geom_area | x and y |
| geom_bar | x or y |
| geom_bin_2d | x and y |
| geom_bin2d | x and y |
| geom_blank | NA |
| geom_boxplot | x or y |
| geom_col | x and y |
| geom_contour | x, y, and z |
| geom_contour_filled | x, y, and z |
| geom_count | x and y |
| geom_crossbar | x, y, ymin, and ymax or x, y, xmin, and xmax |
| geom_curve | x, y, and xend or x, y, and yend |
| geom_density | x or y |
| geom_density_2d | x and y |
| geom_density_2d_filled | x and y |
| geom_density2d | x and y |
| geom_density2d_filled | x and y |
| geom_dotplot | x |
| geom_errorbar | x, ymin, and ymax or y, xmin, and xmax |
| geom_errorbarh | xmin, xmax, and y |
| geom_freqpoly | x or y |
| geom_function | NA |
| geom_hex | x and y |
| geom_histogram | x or y |
| geom_hline | yintercept |
| geom_jitter | x and y |
| geom_label | x, y, and label |
| geom_line | x and y |
| geom_linerange | x, ymin, and ymax or y, xmin, and xmax |
| geom_map | NA |
| geom_path | x and y |
| geom_point | x and y |
| geom_pointrange | x, y, ymin, and ymax or x, y, xmin, and xmax |
| geom_polygon | x and y |
| geom_qq | sample |
| geom_qq_line | sample |
| geom_quantile | x and y |
| geom_raster | x and y |
| geom_rect | xmin, xmax, ymin, and ymax |
| geom_ribbon | x, ymin, and ymax or y, xmin, and xmax |
| geom_rug | NA |
| geom_segment | x, y, and xend or x, y, and yend |
| geom_sf | geometry |
| geom_sf_label | geometry |
| geom_sf_text | geometry |
| geom_smooth | x and y |
| geom_spoke | x, y, angle, and radius |
| geom_step | x and y |
| geom_text | x, y, and label |
| geom_tile | x and y |
| geom_violin | x and y |
| geom_vline | xintercept |
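The required mappings can also be looked up from R. Each geom is backed by an exported ggproto object (for example `GeomPoint` for `geom_point()`), and its `required_aes` field holds the required aesthetics. The sketch below checks three geoms from the table; it relies on these objects' fields, so the exact output may differ between **ggplot2** versions.

```r
library(ggplot2)

# Required aesthetics of the ggproto objects behind three geoms.
GeomPoint$required_aes   # backs geom_point()
GeomText$required_aes    # backs geom_text()
GeomRect$required_aes    # backs geom_rect()
```

The help page for each geom (for example `?geom_point`) lists the same information in its Aesthetics section, with required aesthetics shown in bold.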