10 Chart: QQ-Plot

This chapter originated as a community contribution created by hao871563506


10.1 Introduction

In statistics, a Q-Q (quantile-quantile) plot is a probability plot: a graphical method for comparing two probability distributions by plotting their quantiles against each other. A point (x, y) on the plot corresponds to one quantile of the second distribution (y-coordinate) plotted against the same quantile of the first distribution (x-coordinate). The plotted points therefore trace a parametric curve whose parameter is the index of the quantile interval. If the two distributions are similar, the points lie approximately on the line y = x; if they are linearly related, the points lie approximately on some other straight line.
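To make the definition concrete, here is a minimal sketch that builds the points of a Normal Q-Q plot by hand in base R. The seed, the sample size of 200, and the variable names (`theoretical`, `observed`) are illustrative assumptions; the plotting positions come from `ppoints()`, which is also what `qqnorm()` uses.

```r
set.seed(42)                         # arbitrary seed, only for reproducibility
x <- rnorm(200, mean = 50, sd = 10)  # illustrative sample

n <- length(x)
p <- ppoints(n)                      # plotting positions (probabilities) in (0, 1)
theoretical <- qnorm(p)              # quantiles of the first distribution (x-axis)
observed <- sort(x)                  # quantiles of the second distribution (y-axis)

plot(theoretical, observed,
     xlab = "Theoretical Quantiles", ylab = "Sample Quantiles")

# qqnorm() returns the same coordinates, in the original data order
qq <- qqnorm(x, plot.it = FALSE)
all.equal(sort(qq$x), theoretical)
all.equal(sort(qq$y), observed)
```

Each sorted observation is paired with the Normal quantile at the same probability, which is the "same quantile of both distributions" pairing described above.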

10.2 Interpreting qqplots

10.3 Normal or not (examples using qqnorm)

10.3.1 Normal qqplot

x <- rnorm(1000, 50, 10)
qqnorm(x)
qqline(x, col = "red")

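The points fall close to a straight line, which is what we expect for Normal data. Notice that the x-axis plots the theoretical quantiles: the quantiles of the standard Normal distribution, with mean 0 and standard deviation 1. Because the sample has mean 50 and standard deviation 10, the reference line has an intercept near 50 and a slope near 10.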
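As a rough sketch of where the red reference line comes from: with its default settings, `qqline()` draws the line through the first and third quartiles of the sample and of the theoretical distribution. The code below recomputes that line by hand; the seed and the names `slope` and `intercept` are illustrative assumptions.

```r
set.seed(42)                        # arbitrary seed, only for reproducibility
x <- rnorm(1000, 50, 10)

# By default, qqline() passes through the first and third quartiles
y_q <- quantile(x, c(0.25, 0.75))   # sample quartiles
x_q <- qnorm(c(0.25, 0.75))         # standard Normal quartiles

slope <- unname(diff(y_q) / diff(x_q))
intercept <- unname(y_q[1] - slope * x_q[1])

qqnorm(x)
abline(intercept, slope, col = "red")  # should coincide with qqline(x, col = "red")

c(intercept = intercept, slope = slope)
```

For this sample the intercept should land near 50 and the slope near 10, since a Normal variable with mean 50 and standard deviation 10 is 50 + 10 times a standard Normal one.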

10.3.2 Non-normal qqplot

x <- rexp(1000, 5)
qqnorm(x)
qqline(x, col = "red")

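Notice that the points form a curve instead of a straight line. Normal Q-Q plots that look like this usually mean the sample data are skewed. Here the exponential sample is right-skewed, so the points bend upward, away from the line, at both ends.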
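A Q-Q plot is not limited to the Normal distribution. As a hedged illustration, the sketch below compares the same kind of sample against Exponential quantiles with the general `qqplot()` function; the seed and the correlation check are illustrative additions, not part of the original example.

```r
set.seed(42)                 # arbitrary seed, only for reproducibility
x <- rexp(1000, 5)
p <- ppoints(length(x))

# Q-Q plot against Exponential(rate = 5) quantiles instead of Normal quantiles
qqplot(qexp(p, rate = 5), x,
       xlab = "Theoretical Exponential Quantiles", ylab = "Sample Quantiles")
abline(0, 1, col = "red")    # reference line y = x

# Correlation of the Q-Q points as a crude measure of straightness
cor_normal <- cor(qnorm(p), sort(x))
cor_exp <- cor(qexp(p, rate = 5), sort(x))
c(normal = cor_normal, exponential = cor_exp)
```

When the theoretical distribution matches the one that generated the data, the points should lie close to the line y = x, and the correlation should be higher than against Normal quantiles.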

10.4 Different kinds of qqplots

The following graph summarizes the main kinds of qqplot (via Stack Exchange):

  • Normal qqplot: The normal distribution is symmetric, so it has no skew (the mean is equal to the median).

  • Right skewed qqplot: Right-skew is also known as positive skew.

  • Left skewed qqplot: Left-skew is also known as negative skew.

  • Light tailed qqplot: compared to the normal distribution, there is less data located at the extremes of the distribution and more data in the center of the distribution.

  • Heavy tailed qqplot: compared to the normal distribution, there is much more data located at the extremes of the distribution and less data in the center of the distribution.

  • Bimodal qqplot: illustrates a bimodal distribution.
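The shapes in the list above can be reproduced with simulated data. This is only a sketch: the distributions chosen for each case (exponential for skew, uniform for light tails, Student's t with 3 degrees of freedom for heavy tails, a two-component Normal mixture for the bimodal case) are assumptions made for illustration and are not necessarily the ones behind the Stack Exchange figure.

```r
set.seed(42)   # arbitrary seed, only for reproducibility
n <- 500

# One illustrative sample per shape (distribution choices are assumptions)
samples <- list(
  "Normal"       = rnorm(n),
  "Right skewed" = rexp(n),
  "Left skewed"  = -rexp(n),
  "Light tailed" = runif(n),
  "Heavy tailed" = rt(n, df = 3),
  "Bimodal"      = c(rnorm(n / 2, mean = -3), rnorm(n / 2, mean = 3))
)

# Draw the six Normal Q-Q plots in a 2 x 3 grid
old_par <- par(mfrow = c(2, 3))
for (name in names(samples)) {
  qqnorm(samples[[name]], main = name)
  qqline(samples[[name]], col = "red")
}
par(old_par)   # restore the previous graphics settings
```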

10.5 qqplot using ggplot

To plot a qqplot with ggplot2, the data must be in a data frame, so here we convert the vector to one. The result looks similar to the one produced by qqnorm.

library(ggplot2)
x <- rnorm(1000, 50, 10)
df <- data.frame(x)                # ggplot2 needs a data frame with a column x
ggplot(df, aes(sample = x)) +
  stat_qq() +                      # the Q-Q points
  stat_qq_line()                   # the reference line

However, when we need to plot different groups, ggplot is especially helpful because it can color the points by a factor.

library(ggplot2)
ggplot(mtcars, aes(sample = mpg, colour = factor(cyl))) +
  stat_qq() +
  stat_qq_line()
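To see what the grouped plot is doing, the sketch below pulls out the data that `stat_qq()` computed, using ggplot2's `layer_data()`. It assumes a reasonably recent ggplot2 (`stat_qq_line()` requires version 3.0.0 or later); the names `p`, `qq_points`, and `four_cyl` are illustrative.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(sample = mpg, colour = factor(cyl))) +
  stat_qq() +
  stat_qq_line()

# Data computed by the first layer (stat_qq): one row per observation
qq_points <- layer_data(p, 1)
table(qq_points$group)     # group sizes, one group per level of cyl

# Theoretical quantiles are computed separately within each group
four_cyl <- qq_points[qq_points$group == 1, ]   # first level of factor(cyl), i.e. 4
all.equal(sort(four_cyl$theoretical), qnorm(ppoints(nrow(four_cyl))))
```

Because the quantiles are computed within each group, every group gets its own set of points and its own reference line, which is why the three cylinder classes can be compared on the same axes.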
