library(dplyr)
library(tidyr)
library(tibble)
library(ggplot2)
library(choroplethr)
# data frame must contain "region" and "value" columns
<- state.x77 |> as.data.frame() |>
df_illiteracy rownames_to_column("state") |>
transmute(region = tolower(`state`), value = Illiteracy)
state_choropleth(df_illiteracy,
title = "State Illiteracy Rates, 1977",
legend = "Percent Illiterate")
17 Spatial data
The page is currently being updated, check back later.
17.1 Introduction
In this chapter, we will take a glimpse into spatial analysis using R. There are an overwhelming number of R packages for analyzing and visualizing spatial data. In broad terms, spatial visualizations require a merging of non-spatial and spatial information. For example, if you wish to create a choropleth map of the murder rate by county in New York State, you need county level data on murder rates, and you also need geographic data for drawing the county boundaries, stored in what are called shape files.
A rough divide exists between packages that don’t require you deal with shape files and those that do. The former work by taking care of the geographic data under the hood: you supply the data with a column for the location and the package takes care of figuring out how to draw those locations.
17.2 Packages
17.2.1 choroplethr
Choropleth maps use color to indicate the value of a variable within a defined region, generally political boundaries. choroplethr
is capable of drawing state and county level maps without using shape files.
Consider the following example using state.x77
showing the percentage of illiterate in each states. Note the process of data transformation, you need to exactly have a column named ‘region’ with the state names and a column named ‘value’.
Plotting at county level is also possible. In the following example we show the population of counties in New York state in 2012.
library(choroplethrMaps)
data(county.regions)
data(df_pop_county)
= county.regions |>
ny_county_fips filter(state.name == "new york") |>
select(region)
county_choropleth(df_pop_county,
title = "Population of Counties in New York State in 2012",
legend = "Population",
county_zoom = ny_county_fips$region)
We use county_choropleth
to create a U.S map at county level and zoom into New York state. Note that county_zoom
takes in fips codes of counties. For more reference, visit the R documentation page.
17.2.2 ggmap
ggmap
is a great package to generate real-world maps. Moreover, it is compatible with ggplot2
allowing you to easily add layers on top of the base map. There are two options in generating maps and we demonstrate one using stamenmap. (This now requires an API key… to be updated.)
# eval=FALSE for now...
library(ggmap)
<- get_stadiamap(bbox = c(left = -74.2591, bottom = 40.4774,
ny_map right = -73.7002, top = 40.9162),
zoom = 10, maptype = "stamen_toner_lite")
ggmap(ny_map) +
geom_point(aes(x=-73.9857,y=40.7484),size=0.8,color='blue')
A map of New York City is created, and you will see a blue point representing Empire State Building. You can very simply add points using geom_point
as long as you have the coordinates for the points. However, one clear drawback using stamenmap is that the map resolution is not satisfying and you will eventually find that the smaller zoom value creates blurry maps.
For this reason, we encourage readers to try the option using Google maps. You will need to set up a Google Cloud account (requires credit card). After enabling the Google map APIs, register you API keys with ggmap
using register_google(key = "[your key]")
and you are all set. Now take a look at the an example using Google map API
ggmap(get_googlemap(center = c(lon = -74.006, lat = 40.7128),zoom = 14))
You will see the map is at great resolution regardless of how you zoom in. For demonstration purpose, we provided a image of the map derived by running the code. For more details regarding the package usage, you can refer at ggmap GitHub page.
17.3 Shape file
There are cases such that no existing packages can directly create your desired map. For example, what if we want to create a map of highways in New York State. In such case, we will resort to shape files. In the following example, we use the shape file of police precinct in New York City.
17.3.1 sf
We use sf
library to read in shape files. One of the first thing to remember is that one shape file contains a set of files and all of them required for the shape file to run properly. However, when reading the file, we only need to use the one ending with .shp
. For example
library(sf)
= st_read('nyc_police/nypp.shp',quiet=TRUE) |>
ny_police mutate(Precinct = as.character(Precinct))
head(ny_police)
Simple feature collection with 6 features and 3 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 971013.5 ymin: 188082.3 xmax: 992119.1 ymax: 217541.3
Projected CRS: NAD83 / New York Long Island (ftUS)
Precinct Shape_Leng Shape_Area geometry
1 1 81117.86 47300568 MULTIPOLYGON (((972081.8 19...
2 5 18807.12 18094527 MULTIPOLYGON (((987399.2 20...
3 6 26413.24 22103327 MULTIPOLYGON (((984337.5 20...
4 7 17288.06 18365996 MULTIPOLYGON (((991608.6 20...
5 9 19773.00 21395839 MULTIPOLYGON (((992119.1 20...
6 10 40281.97 27266581 MULTIPOLYGON (((983866 2172...
You can see that the object looks like a data set, except with a long column called geometry. The geometry column contains spatial information and is the key to map generation.
To simply see the shape, we can use st_geometry()
to extract spatial information and feed into plot()
plot(st_geometry(ny_police))
We get a map of New York City divided by police precinct. Note that if you directly feed in ny_police
into plot()
, you will get mulitiply maps drawn from each column. It might not create you desired maps and when you are dealing with large shape files, it will be extremely slow.
17.3.2 tmap
Only generating the map is not particularly meaningful. We want to combine data with the map to convey some meaningful findings. Since we are using police precinct, it is only natural to use crime data.
library(readxl)
library(tmap)
<- read_xls('data/felony.xls', skip = 1) |>
ny_crime rename(Precinct = PCT, Crime = CRIME) |>
fill(Precinct) |>
filter(Crime == "TOTAL SEVEN MAJOR FELONY OFFENSES")
<- left_join(ny_police, ny_crime) ny_crime
We read in the felony data for nyc in 2021 and joined it with the sf
object. Then, we create map using tmap
package. Notice that in tm_polygons()
, we specify the column to be 2021
.
|>
ny_crime tm_shape() +
tm_polygons("2021", palette = "Blues", title="") +
tm_text("Precinct", size = .65) +
tm_layout("Major Felony Offenses in 2021 by\nNYC Police Precinct",
title.size = .95, frame = FALSE)
For more usage, you can refer at tmap:get started!