2  Getting started

Welcome to the world of EDAV! As you already know, we will mainly use R throughout the course. In an effort to get everyone on the same page, here is a checklist of essentials so you can get up and running. The best resources are scattered in different places online, so bear with links to various sites depending on the topic.

2.1 Top 10 essentials checklist

(r4ds = R for Data Science by Garrett Grolemund and Hadley Wickham, free online)

  1. Install R (r4ds) – You need to have this installed but you won’t open the application since you’ll be working in RStudio. If you installed R once upon a time, make sure you’re current! The latest version of R (as of 2024-10-12) is R 4.4.1 “Race for Your Life” released on 2024/06/14. Use > R.version to check what you have.

  2. Install RStudio (r4ds) – Download the free, Desktop version for your OS. Working in this IDE will make working in R much more enjoyable. As with R, stay current. The latest version (as of 2024-10-12) is RStudio-2024.09.0-375. Click the RStudio menu, then “About RStudio” to see what version you have. (Note: RStudio, the company, is now Posit. RStudio, the product, is still RStudio.)

  3. Get comfortable with RStudio – In this chapter of Bruno Rodrigues’s Modern R with the Tidyverse, you’ll learn about panes, options, getting help, keyboard shortcuts, projects, add-ins, and packages. Try to:

    • Do some math in the console
    • Create a Quarto file (.qmd) and render it to .html
    • Install some packages like tidyverse or MASS

    Another great option for learning the IDE: Watch Writing Code in RStudio (RStudio webinar)

  4. Learn “R Nuts and Bolts” – Roger Peng’s chapter in R Programming will give you a solid foundation in the basic building blocks of R. It’s worth investing in understanding how R objects work now so they don’t cause you problems later. Focus on vectors and especially data frames; matrices and lists don’t come up often in data visualization. Get familiar with R classes: integer, numeric, character, and logical. Understand how factors work; they are very important for graphing.

  5. Tidy up (r4ds) – Install the tidyverse, and get familiar with what it is. We will discuss differences between base R and the tidyverse in class.

  6. Learn ggplot2 basics (r4ds) – In class we will study the grammar of graphics on which ggplot2 is based, but it will help to familiarize yourself with the syntax in advance. Avail yourself of the “Data Visualization with ggplot2” cheatsheet by clicking “Help” > “Cheatsheets…” within RStudio.

  7. Learn some Quarto, a new tool from Posit (formerly RStudio) that they describe as “a multi-language, next generation version of R Markdown.” The syntax of Quarto is very similar to that of R Markdown, but Quarto is a separate application, not an R package. Current versions of RStudio ship with Quarto, so you don’t need to install it. An in-depth (read: long) “Welcome to Quarto Workshop!” is available on YouTube.

  8. Use RStudio projects (r4ds) – If you haven’t already, drink the Kool-Aid. Make each problem set a separate project. You will never have to worry about getwd() or setwd() again because everything will just be in the right places. Or watch the webinar: “Projects in RStudio”. If you find that you must change the file paths used to read files depending on whether you are running the code in the Console or knitting the document, it is likely because .Rmd files are stored in subfolders of the project. The here package eliminates the need to make these changes repeatedly by creating relative paths from the project root that just work. This is a small but powerful tool; once you start using it there’s no going back.

  9. Learn the basic dplyr verbs for data manipulation (r4ds) – Concentrate on the main verbs: filter() (rows), select() (columns), mutate(), arrange() (rows), group_by(), and summarize(). Learn the native R pipe operator |>. It is very similar to the magrittr pipe %>%, which you can see in action in the post “How dplyr replaced my most common R idioms”, which provides a detailed comparison of base R vs. dplyr data transformation.

  10. Know how to tidy your data – The pivot_longer() function from the tidyr package – successor to gather() – will help you get your data in the right form for plotting. More on this in class. Check out these super cool animations, which follow a data frame as it is transformed by tidyr functions.
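
As a small illustration of item 4, the sketch below uses a made-up data frame (the names `sizes`, `size`, and `count` are hypothetical, not from the course materials) to show the basic classes and why factor level order matters for graphing: categories are generally drawn in level order, which defaults to alphabetical.

```r
# Hypothetical data for illustration only
sizes <- data.frame(
  size  = c("small", "large", "medium", "small"),
  count = c(3L, 5L, 2L, 4L)
)

# Basic classes
class(sizes$size)    # character
class(sizes$count)   # integer
class(1.5)           # numeric
class(TRUE)          # logical

# By default, factor levels are sorted alphabetically
default_levels <- levels(factor(sizes$size))

# Set the level order explicitly so categories appear in a meaningful order
sizes$size <- factor(sizes$size, levels = c("small", "medium", "large"))
ordered_levels <- levels(sizes$size)

# Factors are stored as integer codes that index into the levels
size_codes <- as.integer(sizes$size)
```

Comparing `default_levels` with `ordered_levels` shows the difference between the alphabetical default and the order you choose.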

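For item 8, here is a minimal sketch of how the here package builds paths, assuming a hypothetical project that contains a `data/survey.csv` file (the folder and file names are made up for illustration). `here()` builds the path from the project root, regardless of which subfolder the document being rendered lives in.

```r
# install.packages("here")  # run once if the package is not installed
library(here)

# Build a path relative to the project root.
# "data" and "survey.csv" are hypothetical names.
csv_path <- here("data", "survey.csv")

# The same call works in the Console and when rendering a document
# stored in a subfolder, so no setwd() is needed.
# survey <- read.csv(csv_path)  # uncomment once the file exists
```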

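To make items 9 and 10 concrete, here is a rough sketch using a made-up data frame (the names `scores`, `student`, `math`, and `art` are hypothetical). It reshapes wide data with `pivot_longer()` and then applies the main dplyr verbs with the native pipe, which requires R 4.1 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: one column per subject
scores <- data.frame(
  student = c("Ana", "Ben", "Cy"),
  math    = c(90, 70, 80),
  art     = c(60, 85, 95)
)

# Tidy: one row per student-subject pair
scores_long <- scores |>
  pivot_longer(cols = c(math, art), names_to = "subject", values_to = "score")

# Transform: keep some rows, add a column, then summarize by group
subject_summary <- scores_long |>
  filter(student != "Cy") |>
  mutate(passed = score >= 65) |>
  group_by(subject) |>
  summarize(mean_score = mean(score), n_passed = sum(passed)) |>
  arrange(desc(mean_score))
```

The long form, with one `subject` column and one `score` column, is usually the shape ggplot2 expects when mapping a variable to color or facets.
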
2.2 Troubleshooting

2.2.1 Functions stop working

Strange behavior from functions that previously worked is often caused by function conflicts. This can happen when two loaded packages export functions with the same name. To indicate the proper package, namespace the call. Conflicts commonly occur with select, filter, and map. If you intend the tidyverse ones, use:

dplyr::select, dplyr::filter and purrr::map.

Some other culprits:

dplyr::summarise() and vcdExtra::summarise()

ggmosaic::mosaic() and vcd::mosaic()

leaflet::addLegend() and xts::addLegend()

dplyr::select and MASS::select
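
A short sketch of how namespacing resolves a conflict, using the built-in `mtcars` data as a stand-in (the dataset and the comparison with `stats::filter()` are illustrative choices, not taken from the list above). Base R's `conflicts()` lists names that are defined in more than one attached package.

```r
library(dplyr)

# Names that exist in more than one attached package
masked_names <- conflicts()

# Which package does a bare filter() currently resolve to?
filter_owner <- environmentName(environment(filter))

# Namespacing removes the ambiguity
four_cyl <- dplyr::filter(mtcars, cyl == 4)     # keeps rows of a data frame
smoothed <- stats::filter(1:10, rep(1 / 3, 3))  # moving average of a series
```

Checking `filter_owner` after loading a new package is a quick way to see whether the package you expect is still the one being called.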


2.3 Tips & tricks

2.3.1 Sizing figures

Always use chunk options to size figures. You can set a default size in the YAML at the beginning of the .qmd file like so:

---
format:
  html:
    fig-width: 6
    fig-height: 4
    out-width: 60%
    embed-resources: true
---

Then as needed override one or more defaults in particular chunks:

```{r}
#| fig-width: 4
#| fig-height: 2
```

2.3.2 RStudio keyboard shortcuts

Insert R chunk:

  • option-command-i (Mac)
  • ctrl+alt+I (Windows)

```{r}
```

Insert |> (“the pipe”):

  • shift-command(ctrl)-M Mac/Windows

Comment/Uncomment lines

  • shift-command(ctrl)-C Mac/Windows

For more shortcuts, refer here

2.3.3 Viewing plots in plot window

Would you like your plots to appear in the plot window instead of below each chunk in the .qmd file? Click ⚙️ and then Chunk Output in Console.