Chapter 2 Getting started

Welcome to the world of EDAV! As you already know, we will mainly use R throughout the course. In an effort to get everyone on the same page, here is a checklist of essentials so you can get up and running. The best resources are scattered in different places online, so bear with links to various sites depending on the topic.

2.1 Top 10 essentials checklist

(r4ds = R for Data Science by Garrett Grolemund and Hadley Wickham, free online)

  1. Install R (r4ds) – You need to have this installed but you won’t open the application since you’ll be working in RStudio. If you already installed R, make sure you’re current! The latest version of R (as of 2024-04-20) is R 4.3.3 “Angel Food Cake” released on 2024/02/29. (Use > R.version to check what you have.)

  2. Install RStudio (r4ds) – Download the free, Desktop version for your OS. Working in this IDE will make working in R much more enjoyable. As with R, stay current. RStudio is constantly adding new features. The latest version (as of 2023-09-17) is 2023.06.2+561. (Click the RStudio menu, then “About RStudio” to see what version you have.)

  3. Get comfortable with RStudio – In this chapter of Bruno Rodrigues’s Modern R with the Tidyverse, you’ll learn about panes, options, getting help, keyboard shortcuts, projects, add-ins, and packages. Be sure to try out:

    • Do some math in the console
    • Create a Quarto file (.qmd) and render it to .html
    • Install some packages like tidyverse or MASS

    Another great option for learning the IDE: Watch Writing Code in RStudio (RStudio webinar)

  4. Learn “R Nuts and Bolts” – Roger Peng’s chapter in R Programming will give you a solid foundation in the basic building blocks of R. It’s worth investing in understanding how R objects work now so they don’t cause you problems later. Focus on vectors and especially data frames; matrices and lists don’t come up often in data visualization. Get familiar with R classes: integer, numeric, character, and logical. Understand how factors work; they are very important for graphing.

  5. Tidy up (r4ds) – Install the tidyverse, and get familiar with what it is. We will discuss differences between base R and the tidyverse in class.

  6. Learn ggplot2 basics (r4ds) – In class we will study the grammar of graphics on which ggplot2 is based, but it will help to familiarize yourself with the syntax in advance. Avail yourself of the “Data Visualization with ggplot2” cheatsheet by clicking Help > Cheatsheets… within RStudio.

  7. Learn some R Markdown – For this class you will write assignments in R Markdown (stored as .Rmd files) and then render them into PDFs for submission. You can jump right in and open a new R Markdown file (File > New File > R Markdown…), and leave the Default Output Format as HTML. You will get an R Markdown template you can tinker with. Click the “Knit” button and see what happens. For more detail, watch the RStudio webinar Getting Started with R Markdown.

Update: RStudio has introduced Quarto, which they call “a multi-language, next generation version of R Markdown.” It is a separate application which must be installed independently. The syntax of Quarto is very similar to that of RMarkdown. Early adopters are welcome to try it out. Follow the installation instructions and tutorials on the get started page. A “Welcome to Quarto Workshop!” is available on YouTube, but it is long.

  8. Use RStudio projects (r4ds) – If you haven’t already, drink the Kool-Aid. Make each problem set a separate project. You will never have to worry about getwd() or setwd() again because everything will just be in the right places. Or watch the webinar: “Projects in RStudio”. If you find that you must change the file paths used to read files depending on whether you are running the code in the Console or knitting the document, the likely cause is that .Rmd files are stored in subfolders of the project. The here package eliminates the need for these repeated changes by building file paths relative to the project root that just work. This is a small but powerful tool; once you start using it there’s no going back.

  9. Learn the basic dplyr verbs for data manipulation (r4ds) – Concentrate on the main verbs: filter() (rows), select() (columns), mutate(), arrange() (rows), group_by(), and summarize(). Learn the native R pipe operator |>. It is very similar to the magrittr pipe %>%, which you can see in action in the post “How dplyr replaced my most common R idioms”, which provides a detailed comparison of base R vs. dplyr data transformation.

  10. Know how to tidy your data – The pivot_longer() function from the tidyr package – successor to gather() – will help you get your data in the right form for plotting. More on this in class. Check out these super cool animations, which follow a data frame as it is transformed by tidyr functions.
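
To make the last two checklist items concrete, here is a minimal sketch that chains the main dplyr verbs with the native pipe and then reshapes a small table with pivot_longer(). It assumes R 4.1 or later (for |>) and that dplyr and tidyr are installed. The scores_wide data frame and its column names are made up for illustration; mtcars ships with R.

```r
library(dplyr)
library(tidyr)

# Main dplyr verbs chained with the native pipe, using the built-in mtcars data
mpg_by_cyl <- mtcars |>
  filter(hp > 100) |>                          # keep rows that meet a condition
  select(mpg, cyl, hp) |>                      # keep only these columns
  mutate(hp_per_cyl = hp / cyl) |>             # add a derived column
  group_by(cyl) |>                             # one group per cylinder count
  summarize(mean_mpg = mean(mpg), n = n()) |>  # collapse each group to one row
  arrange(desc(mean_mpg))                      # sort rows, highest mean first

print(mpg_by_cyl)

# Hypothetical wide data: one column per test
scores_wide <- data.frame(
  student = c("A", "B"),
  test1 = c(90, 75),
  test2 = c(85, 80)
)

# pivot_longer(): one row per (student, test) pair, ready for plotting
scores_long <- scores_wide |>
  pivot_longer(cols = c(test1, test2), names_to = "test", values_to = "score")

print(scores_long)
```

The long form is usually what ggplot2 expects: a single score column to map to an axis and a test column to map to color or facets.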


2.2 Troubleshooting

2.2.1 Document doesn’t knit

Normally an error message will appear in the R Markdown tab, pointing to the offending lines and giving a specific reason for the failure. Try Googling the error message first; if you cannot find a solution, post on Ed Discussion.

2.2.2 Functions stop working

Strange behavior from functions that previously worked is often caused by function conflicts. This can happen if you have loaded two packages that export functions with the same name. To indicate which package you mean, prefix the function with its namespace (package::function). Conflicts commonly occur with select, filter, and map. If you intend the tidyverse versions, use:

dplyr::select, dplyr::filter and purrr::map.

Some other culprits:

dplyr::summarise() and vcdExtra::summarise()

ggmosaic::mosaic() and vcd::mosaic()

leaflet::addLegend() and xts::addLegend()

dplyr::select and MASS::select
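
As a rough illustration of the last pair, the sketch below loads MASS after dplyr in a fresh R session, checks which package a bare select() resolves to, and then makes the call unambiguous with the namespace prefix. It assumes both packages are installed; the variable names are only for illustration.

```r
library(dplyr)
library(MASS)  # attached after dplyr, so MASS::select() masks dplyr::select()

# Which package does a bare call to select() resolve to right now?
select_owner <- environmentName(environment(select))
print(select_owner)

# The namespaced call is unambiguous regardless of the order packages were loaded
cars_small <- dplyr::select(mtcars, mpg, cyl)
print(head(cars_small))
```

Base R's conflicts(detail = TRUE) lists every masked name on the search path, which can help track down other culprits.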


2.3 Tips & tricks

2.3.1 knitr

Upon creating a new R Markdown file, you will see a setup chunk like this:

{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)

Chunk options go in the chunk header (the first line). R is case-sensitive, so write TRUE and FALSE in capitals. You can add options such as the following:

{r setup, include=FALSE, warning=FALSE, message=FALSE, cache=TRUE}
knitr::opts_chunk$set(echo = TRUE)

warning=FALSE – Suppress warnings

message=FALSE – Suppress messages, especially useful when loading packages

cache=TRUE – Only chunks whose code has changed are re-evaluated. Be careful, though: changes in dependencies (such as an earlier chunk or an external data file) will not be detected.
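
Options written in a chunk header apply to that chunk only. To make an option the default for every chunk in the document, one common approach is to pass it to knitr::opts_chunk$set(), typically inside the setup chunk. The sketch below assumes knitr is installed; the particular option values are just an example, not a course requirement.

```r
# Typically placed inside the setup chunk of the .Rmd file
knitr::opts_chunk$set(
  echo = TRUE,      # show code in the rendered document
  warning = FALSE,  # hide warnings from every chunk
  message = FALSE   # hide messages (e.g., package startup notes) from every chunk
)

# Inspect the defaults that are now in effect
current <- knitr::opts_chunk$get(c("echo", "warning", "message"))
print(current)
```

Individual chunks can still override these defaults in their own headers.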

2.3.2 Sizing figures

Always use chunk options to size figures. You can set a default size in the YAML at the beginning of the .Rmd file like so:

output:
  pdf_document:
    fig_height: 3
    fig_width: 5

Another method is to click the gear ⚙️ next to the Knit button, then Output Options…, and finally the Figures tab.

Then as needed override one or more defaults in particular chunks:

{r, fig.width=4, fig.height=2}

Figure-related chunk options include fig.width, fig.height, fig.asp, and fig.align; there are many more.
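
Of these, fig.asp is the least self-explanatory: it is the height-to-width ratio, and when it is set knitr computes the figure height as fig.width * fig.asp. The sketch below sets document-wide figure defaults from R and works out the implied height. The specific numbers are arbitrary choices for illustration.

```r
fig_width <- 5      # inches (example value)
fig_asp   <- 0.618  # height / width (example value)

# Document-wide figure defaults, usually set in the setup chunk
knitr::opts_chunk$set(
  fig.width = fig_width,
  fig.asp   = fig_asp,
  fig.align = "center"
)

# Height that knitr derives from the width and aspect ratio
implied_height <- fig_width * fig_asp
print(implied_height)
```

Setting a width and an aspect ratio keeps figures proportional when you later change only the width.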

2.3.3 RStudio keyboard shortcuts

Insert R chunk - option-command-i (Mac) - ctrl+alt+I (Windows)

```{r}
```

Insert %>% (“the pipe”):

  • shift-command(ctrl)-M Mac/Windows

Comment/Uncomment lines

  • shift-command(ctrl)-C Mac/Windows

Knit Document

  • shift-command(ctrl)-K Mac/Windows

For more shortcuts, refer here

2.3.4 Viewing plots in plot window

Would you like your plots to appear in the plot window instead of below each chunk in the .Rmd file? Click ⚙️ and then Chunk Output in Console.