Some on-line resources:
https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf
http://r-statistics.co/ggplot2-cheatsheet.html
http://tutorials.iq.harvard.edu/R/Rgraphics/Rgraphics.html
Let’s start using the data from the homework. Remember that for this data we have the following variables:
Let’s create a simple scatterplot of the proportion of votes for the Democratic candidate in 2000 vs. 2004. We expect a positive relationship between these two variables, so let’s take a look.
First, we create a ggplot object, passing the data and the aesthetics as parameters.
## Load libraries
library(foreign)
library(ggplot2)
## Set working directory
setwd("C://Users/Edgar/Dropbox/PhD/4_year/winter/machineLearning/H1/")
## Load data
ca2006 <- read.csv('ca2006.csv')
### Create a ggplot object
## Here data is the dataset, and the aesthetics define the x and y coordinates
dem_pres_plot <- ggplot(data=ca2006, aes(x=dem_pres_2000, y=dem_pres_2004))
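To see why a geometry is still needed, here is a minimal sketch on a tiny data frame. The data frame `toy` and its values are made up for illustration (they are not the homework data) and only reuse the column names of ca2006; the sketch assumes ggplot2 is installed. The ggplot object stores the data and the mappings, but it has no layers, and therefore draws nothing, until a geom is added.

```r
library(ggplot2)

# Hypothetical toy data: same column names as ca2006, made-up values
toy <- data.frame(
  dem_pres_2000 = c(0.35, 0.42, 0.61, 0.70, 0.28),
  dem_pres_2004 = c(0.33, 0.45, 0.64, 0.72, 0.30),
  dem_inc       = c(0, 0, 1, 1, 0)
)

# Data and aesthetics only: the object holds no layers yet
base_plot <- ggplot(data = toy, aes(x = dem_pres_2000, y = dem_pres_2004))
length(base_plot$layers)

# Adding a geometry with + creates the first layer
point_plot <- base_plot + geom_point()
length(point_plot$layers)
```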
The data is already mapped, but we still need to specify a geometry to display it. The geometry for a scatterplot is geom_point(). Other geometries include geom_histogram(), geom_line(), and so on. As you can see, their names are easy to remember.
dem_pres_plot <- dem_pres_plot + geom_point() # geometry for scatterplot
dem_pres_plot
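As a sketch of swapping geometries, the snippet below draws the same hypothetical toy columns (made-up values, not the homework data) as a histogram and as a line. A histogram needs only an x aesthetic; the binwidth of 0.1 is an arbitrary choice that also avoids ggplot2's message about picking a better bin width.

```r
library(ggplot2)

# Hypothetical toy data: same column names as ca2006, made-up values
toy <- data.frame(
  dem_pres_2000 = c(0.35, 0.42, 0.61, 0.70, 0.28),
  dem_pres_2004 = c(0.33, 0.45, 0.64, 0.72, 0.30)
)

# Histogram of a single variable: counts the observations in each bin
hist_plot <- ggplot(data = toy, aes(x = dem_pres_2000)) +
  geom_histogram(binwidth = 0.1)

# Same x/y mapping as the scatterplot, but geom_line() connects the
# points in order of x instead of drawing them individually
line_plot <- ggplot(data = toy, aes(x = dem_pres_2000, y = dem_pres_2004)) +
  geom_line()
```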
We can also use additional information from the dataset. For example, let’s add dem_inc to distinguish districts with Democratic incumbents. We do this with a new aesthetic, in this case color, which differentiates the points by color.
dem_pres_plot <- ggplot(data=ca2006, aes(x=dem_pres_2000, y=dem_pres_2004))
dem_pres_plot <- dem_pres_plot + geom_point(aes(color=dem_inc)) ## Specify the color aesthetic here
dem_pres_plot
Here we get a continuous color gradient for values from 0 to 1, but we only need two colors, since dem_inc is a binary variable. We can convert the variable to a factor directly inside the ggplot call, without modifying the data.
dem_pres_plot <- ggplot(data=ca2006, aes(x=dem_pres_2000, y=dem_pres_2004))
dem_pres_plot <- dem_pres_plot + geom_point(aes(color=factor(dem_inc))) # Transform the variable to a factor
dem_pres_plot
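The reason the factor conversion helps is that ggplot2 picks a default scale from the class of the mapped variable. A minimal sketch using ggplot2's exported `scale_type()` generic on a hypothetical 0/1 vector: a numeric variable gets a continuous (gradient) scale, while a factor gets a discrete scale with one color per level.

```r
library(ggplot2)

dem_inc <- c(0, 0, 1, 1, 0)  # hypothetical 0/1 incumbency indicator

# Default scale type chosen from the variable's class
scale_type(dem_inc)          # numeric: continuous color gradient
scale_type(factor(dem_inc))  # factor: discrete colors, one per level
levels(factor(dem_inc))      # the two levels, "0" and "1"
```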
Let’s add more layers, in this case a vertical line at dem_pres_2000 = 0.5.
dem_pres_plot <- ggplot(data=ca2006, aes(x=dem_pres_2000, y=dem_pres_2004))
dem_pres_plot <- dem_pres_plot + geom_point(aes(color=factor(dem_inc)))
dem_pres_plot <- dem_pres_plot + geom_vline(xintercept=0.5, color="red") # Adding a line with a specific color.
dem_pres_plot
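Reference lines are ordinary layers, so several can be stacked. As a sketch on hypothetical toy data (made-up values), the vertical line below is dashed via `linetype`, and `geom_abline(intercept = 0, slope = 1)` adds a 45-degree line, which is our own addition rather than part of the original plot: points on it had the same Democratic share in both years.

```r
library(ggplot2)

# Hypothetical toy data: same column names as ca2006, made-up values
toy <- data.frame(
  dem_pres_2000 = c(0.35, 0.42, 0.61, 0.70, 0.28),
  dem_pres_2004 = c(0.33, 0.45, 0.64, 0.72, 0.30),
  dem_inc       = c(0, 0, 1, 1, 0)
)

ref_plot <- ggplot(toy, aes(x = dem_pres_2000, y = dem_pres_2004)) +
  geom_point(aes(color = factor(dem_inc))) +
  geom_vline(xintercept = 0.5, color = "red", linetype = "dashed") +
  geom_abline(intercept = 0, slope = 1, color = "grey50")  # y = x line
```

Layers are drawn in the order they are added, so lines added after geom_point() are drawn on top of the points.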
Looking good! But the legend title factor(dem_inc) still looks odd. We can change it with labs(). Notice also that we can rename the levels of the new factor variable with the labels argument of factor().
dem_pres_plot <- ggplot(data=ca2006, aes(x=dem_pres_2000, y=dem_pres_2004))
dem_pres_plot <- dem_pres_plot + geom_point(aes(color=factor(dem_inc, labels=c("No", "Yes")))) # Changing the level labels
dem_pres_plot <- dem_pres_plot + geom_vline(xintercept=0.5, color="red")
dem_pres_plot <- dem_pres_plot + labs(color="Dem. inc.") # Changing the legend title
dem_pres_plot
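The labels passed to factor() are matched to the levels in sorted order, so with a 0/1 variable 0 becomes "No" and 1 becomes "Yes". A small base-R sketch on a hypothetical vector; spelling out `levels = c(0, 1)` makes that mapping explicit instead of relying on the sort order.

```r
dem_inc <- c(1, 0, 0, 1, 1)  # hypothetical 0/1 indicator

# levels fixes the order of the codes; labels renames them in that order
dem_inc_f <- factor(dem_inc, levels = c(0, 1), labels = c("No", "Yes"))

dem_inc_f         # Yes No No Yes Yes
table(dem_inc_f)  # counts per labelled level
```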
Let’s add axis labels and a title with xlab(), ylab(), and ggtitle().
dem_pres_plot <- dem_pres_plot + xlab("Dem. Vote 2000") + ylab("Dem. Vote 2004") +
ggtitle("Vote for democratic candidate (2000 vs 2004)")
dem_pres_plot
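The same labels can also be set in a single labs() call, which accepts the axis names, the title, and the legend title together. A sketch on hypothetical toy data (made-up values); the resulting plot should carry the same labels as the step-by-step version above.

```r
library(ggplot2)

# Hypothetical toy data: same column names as ca2006, made-up values
toy <- data.frame(
  dem_pres_2000 = c(0.35, 0.42, 0.61, 0.70, 0.28),
  dem_pres_2004 = c(0.33, 0.45, 0.64, 0.72, 0.30),
  dem_inc       = c(0, 0, 1, 1, 0)
)

labelled_plot <- ggplot(toy, aes(x = dem_pres_2000, y = dem_pres_2004)) +
  geom_point(aes(color = factor(dem_inc, labels = c("No", "Yes")))) +
  labs(x = "Dem. Vote 2000",
       y = "Dem. Vote 2004",
       title = "Vote for democratic candidate (2000 vs 2004)",
       color = "Dem. inc.")  # legend title for the color aesthetic
```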
ggplot2 ships with several complete themes, such as theme_bw(), theme_minimal(), and theme_classic(). Once our plot is complete, we simply add the theme we want.
dem_pres_plot <- dem_pres_plot + theme_bw()
dem_pres_plot
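Instead of adding a theme to each plot, a theme can be made the session default with `theme_set()`, which returns the previous theme so it can be restored later. A minimal sketch; the `base_size = 14` font size and the toy data (made-up values) are arbitrary illustrative choices.

```r
library(ggplot2)

# Make theme_bw() with a larger base font the default for later plots
old_theme <- theme_set(theme_bw(base_size = 14))

# Hypothetical toy data: same column names as ca2006, made-up values
toy <- data.frame(
  dem_pres_2000 = c(0.35, 0.42, 0.61),
  dem_pres_2004 = c(0.33, 0.45, 0.64)
)
p <- ggplot(toy, aes(x = dem_pres_2000, y = dem_pres_2004)) +
  geom_point()  # rendered with the default theme set above

theme_set(old_theme)  # restore the previous default theme
```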
Cool. Finally, we only need to save it. ggsave() can write “.eps”, “.ps”, “.tex” (pictex), “.pdf”, “.jpeg”, “.tiff”, “.png”, “.bmp”, “.svg”, or “.wmf” (Windows only) files, choosing the format from the file extension. We can also set the scale, width, height, units, resolution (dpi), and other parameters.
# Save the plot as a PDF with width = 6 (inches); all other parameters use their defaults.
ggsave("dem_pres_plot.pdf", plot = dem_pres_plot, width = 6)
## Saving 6 x 5 in image
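The message above appears because only the width was given, so ggsave() took the height from the current graphics device. A sketch that fixes both dimensions and the units explicitly, writing to a temporary folder; the file name and the 6 x 4 inch size are illustrative choices, and the plot is built from hypothetical toy data (made-up values).

```r
library(ggplot2)

# Hypothetical toy data: same column names as ca2006, made-up values
toy <- data.frame(
  dem_pres_2000 = c(0.35, 0.42, 0.61),
  dem_pres_2004 = c(0.33, 0.45, 0.64)
)
p <- ggplot(toy, aes(x = dem_pres_2000, y = dem_pres_2004)) + geom_point()

# Both dimensions given, so the size no longer depends on the open device;
# the .pdf extension selects the PDF device
out_pdf <- file.path(tempdir(), "dem_pres_plot.pdf")
ggsave(out_pdf, plot = p, width = 6, height = 4, units = "in")
```

For raster formats such as .png, the dpi argument additionally controls the resolution.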
Think about the implications of this plot. Why is there a gap around the vertical line? Do districts vote consistently across time?