
Why ggplot?

Some on-line resources:

https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf

http://r-statistics.co/ggplot2-cheatsheet.html

http://tutorials.iq.harvard.edu/R/Rgraphics/Rgraphics.html


Example with a scatterplot

Let’s start using the data from the homework. Remember that for this data we have the following variables:

  • district: California Congressional district
  • prop_d: Proportion of votes for the Democratic candidate in 2004 in Congressional
  • dem_pres_2004: Proportion of two-party presidential vote for Democratic candidate in 2004 in Congressional district
  • dem_pres_2000: Proportion of two-party presidential vote for Democratic candidate in 2000 in Congressional district
  • dem_inc: An indicator equal to 1 if the district has a Democratic incumbent
  • contested: An indicator equal to 1 if the election is contested

Let’s create a simple scatterplot of the Democratic share of the two-party presidential vote in 2000 vs 2004. We expect a positive relationship between these two variables. Let’s take a look.

First we create a ggplot object with the data and aesthetics as parameters.

## Load libraries
library(foreign)
library(ggplot2)

## Set working directory
setwd("C://Users/Edgar/Dropbox/PhD/4_year/winter/machineLearning/H1/")

## Load data
ca2006 <- read.csv('ca2006.csv')

## Create a ggplot object
## Here data is the dataset, and the aesthetics (aes) define the x and y coordinates
dem_pres_plot <- ggplot(data=ca2006, aes(x=dem_pres_2000, y=dem_pres_2004))

The data is already mapped, but we still need to specify a geometry to display it. The corresponding geometry for a scatterplot is geom_point(). Other geometries include geom_histogram(), geom_line() and so on. You can already see that these are easy to remember.

dem_pres_plot  <- dem_pres_plot  + geom_point() # geometry for scatterplot
dem_pres_plot 
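To see how swapping the geometry changes the plot, here is a minimal sketch with geom_histogram(). The toy data frame is made up for illustration (it is not the homework data); only the column names follow the ones used above.

```r
library(ggplot2)

## Toy data standing in for ca2006 (values are made up for illustration)
toy <- data.frame(
  dem_pres_2000 = c(0.35, 0.42, 0.48, 0.55, 0.61, 0.70),
  dem_pres_2004 = c(0.33, 0.40, 0.47, 0.57, 0.63, 0.72)
)

## A histogram only needs an x aesthetic; the geometry computes the counts
hist_plot <- ggplot(data=toy, aes(x=dem_pres_2004)) +
  geom_histogram(binwidth=0.1)
hist_plot
```

The pattern is the same as for the scatterplot: data and aesthetics first, then a geometry added with +.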

Adding categories with ‘color’

We can also use additional information from the dataset. For example, let’s add dem_inc to differentiate those districts with Democratic incumbents. We can do this with a new aesthetic, in this case color, which differentiates points by color.

dem_pres_plot <- ggplot(data=ca2006, aes(x=dem_pres_2000, y=dem_pres_2004))
dem_pres_plot <- dem_pres_plot + geom_point(aes(color=dem_inc)) ## Specify the color aes here
dem_pres_plot

Here we get a spectrum of color for values 0 to 1, but in reality we only need two colors since this is a binary variable. We can transform the variable to a factor directly in ggplot without modifying the data.

dem_pres_plot <- ggplot(data=ca2006, aes(x=dem_pres_2000, y=dem_pres_2004))
dem_pres_plot <- dem_pres_plot + geom_point(aes(color=factor(dem_inc))) # Transform the variable to a factor
dem_pres_plot
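Once the variable is a factor, the color scale is discrete, so we can also pick the two colors ourselves. The sketch below uses scale_color_manual() on a made-up toy data frame (not the homework data); the colors "grey40" and "blue" are arbitrary choices for illustration.

```r
library(ggplot2)

## Toy data standing in for ca2006 (values are made up for illustration)
toy <- data.frame(
  dem_pres_2000 = c(0.35, 0.42, 0.48, 0.55, 0.61, 0.70),
  dem_pres_2004 = c(0.33, 0.40, 0.47, 0.57, 0.63, 0.72),
  dem_inc       = c(0, 0, 0, 1, 1, 1)
)

## factor() makes the color scale discrete; scale_color_manual() then
## assigns one color per factor level ("0" and "1")
manual_plot <- ggplot(data=toy, aes(x=dem_pres_2000, y=dem_pres_2004)) +
  geom_point(aes(color=factor(dem_inc))) +
  scale_color_manual(values=c("0"="grey40", "1"="blue"))
manual_plot
```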

Adding a vertical line

Let’s add additional layers, in this case a vertical line at dem_pres_2000=0.5.

dem_pres_plot <- ggplot(data=ca2006, aes(x=dem_pres_2000, y=dem_pres_2004))
dem_pres_plot <- dem_pres_plot + geom_point(aes(color=factor(dem_inc)))
dem_pres_plot <- dem_pres_plot + geom_vline(xintercept=0.5, color="red") # Adding a line with a specific color.
dem_pres_plot
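Vertical lines are not the only reference lines available. As a sketch, geom_hline() draws a horizontal line and geom_abline() draws a line from an intercept and a slope; a 45-degree line is one way to see which districts moved between the two elections. The toy data frame is made up for illustration.

```r
library(ggplot2)

## Toy data standing in for ca2006 (values are made up for illustration)
toy <- data.frame(
  dem_pres_2000 = c(0.35, 0.42, 0.48, 0.55, 0.61, 0.70),
  dem_pres_2004 = c(0.33, 0.40, 0.47, 0.57, 0.63, 0.72)
)

ref_plot <- ggplot(data=toy, aes(x=dem_pres_2000, y=dem_pres_2004)) +
  geom_point() +
  geom_abline(intercept=0, slope=1, linetype="dashed") + # 45-degree line: same share in both years
  geom_hline(yintercept=0.5, color="red")                 # horizontal line at 50% in 2004
ref_plot
```

Points above the dashed line have a higher Democratic share in 2004 than in 2000, and points below it have a lower one.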

Changing the legend

Looking good! But the name factor(dem_inc) in the legend still looks odd. We can change the legend title with labs(). Notice that we can also change the labels of the new factor variable through the labels argument of factor().

dem_pres_plot <- ggplot(data=ca2006, aes(x=dem_pres_2000, y=dem_pres_2004))
dem_pres_plot <- dem_pres_plot + geom_point(aes(color=factor(dem_inc, labels=c("No", "Yes")))) # Changing the labels
dem_pres_plot <- dem_pres_plot + geom_vline(xintercept=0.5, color="red")
dem_pres_plot <- dem_pres_plot + labs(color="Dem. inc.") # Changing the legend title
dem_pres_plot

Adding axis labels and title

Let’s add axis labels and a title with xlab(), ylab() and ggtitle().

dem_pres_plot <- dem_pres_plot + xlab("Dem. Vote 2000") + ylab("Dem. Vote 2004") +
                                ggtitle("Vote for democratic candidate (2000 vs 2004)")
dem_pres_plot
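As an alternative sketch, labs() accepts the axis labels, the title and the legend title in a single call, which some people find easier to read. The toy data frame is made up for illustration.

```r
library(ggplot2)

## Toy data standing in for ca2006 (values are made up for illustration)
toy <- data.frame(
  dem_pres_2000 = c(0.35, 0.42, 0.48, 0.55, 0.61, 0.70),
  dem_pres_2004 = c(0.33, 0.40, 0.47, 0.57, 0.63, 0.72),
  dem_inc       = c(0, 0, 0, 1, 1, 1)
)

## One labs() call sets x label, y label, legend title and plot title
labelled_plot <- ggplot(data=toy, aes(x=dem_pres_2000, y=dem_pres_2004)) +
  geom_point(aes(color=factor(dem_inc, labels=c("No", "Yes")))) +
  labs(x="Dem. Vote 2000",
       y="Dem. Vote 2004",
       color="Dem. inc.",
       title="Vote for democratic candidate (2000 vs 2004)")
labelled_plot
```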

Modifying the theme

ggplot2 ships with several built-in themes, such as theme_bw(), theme_minimal() and theme_classic(). Once our plot is complete we can simply add one.

dem_pres_plot <- dem_pres_plot + theme_bw()
dem_pres_plot
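A built-in theme can be combined with individual tweaks through theme(). The sketch below, on a made-up toy data frame, switches to theme_minimal() and moves the legend to the bottom; both choices are only for illustration.

```r
library(ggplot2)

## Toy data standing in for ca2006 (values are made up for illustration)
toy <- data.frame(
  dem_pres_2000 = c(0.35, 0.42, 0.48, 0.55, 0.61, 0.70),
  dem_pres_2004 = c(0.33, 0.40, 0.47, 0.57, 0.63, 0.72),
  dem_inc       = c(0, 0, 0, 1, 1, 1)
)

themed_plot <- ggplot(data=toy, aes(x=dem_pres_2000, y=dem_pres_2004)) +
  geom_point(aes(color=factor(dem_inc))) +
  theme_minimal() +                 # complete built-in theme
  theme(legend.position="bottom")   # single tweak applied on top of it
themed_plot
```

The order matters here: adding a complete theme such as theme_minimal() after the theme() call would overwrite the tweak.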

Saving the plot

Cool. Finally, we only need to save it with ggsave(). We can save it in “eps”, “ps”, “tex” (pictex), “pdf”, “jpeg”, “tiff”, “png”, “bmp”, “svg” or “wmf” formats. We can also change the scale, width, height, size and many other parameters.

# Saving the plot as .pdf with width=6. All other parameters are default.
ggsave('dem_pres_plot.pdf', plot=dem_pres_plot, width=6)
## Saving 6 x 5 in image
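To control the size fully, we can pass width, height and units together. This is a minimal sketch on a made-up toy data frame; the file name toy_plot.pdf and the 15 x 10 cm size are arbitrary, and the file goes to a temporary directory so nothing in the working directory is overwritten.

```r
library(ggplot2)

## Toy data standing in for ca2006 (values are made up for illustration)
toy <- data.frame(
  dem_pres_2000 = c(0.35, 0.42, 0.48, 0.55, 0.61, 0.70),
  dem_pres_2004 = c(0.33, 0.40, 0.47, 0.57, 0.63, 0.72)
)

toy_plot <- ggplot(data=toy, aes(x=dem_pres_2000, y=dem_pres_2004)) +
  geom_point()

## The format is inferred from the file extension (.pdf here)
out_file <- file.path(tempdir(), "toy_plot.pdf")
ggsave(out_file, plot=toy_plot, width=15, height=10, units="cm")
```

Naming the plot argument explicitly avoids relying on argument order, and setting both width and height means the result does not depend on the size of the current graphics device.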

Think about the implications of this plot. Why is there a gap around the vertical line? Do districts vote consistently across time?