
Outline

  1. Functions in R

  2. Packages

4. Functions in R

Functions are used to do things with data. We can think of them as verbs rather than nouns.

First, let's take a look at some built-in functions.

4.1. Help menu (Once again)

There are many built-in functions in R.

We have already seen some of them: c(), matrix(), data.frame(), ls(), rep(), seq(), dim(), etc.

Nevertheless, it is not possible to memorize the syntax of every function.

Fortunately, there is a help file for each function that contains a description and usage info.

The R help system is available with the installation of the program.

To access the help file of a function, use the operator ‘?’ or the function help().

For example, let’s look at the help file of the function ‘sd()’, which computes the standard deviation of a vector.

Go ahead and type:

?sd

help(sd)

A help file is structured as follows:

  • function {package} : Name of the function and package it belongs to.
  • Description : Description of the function and related ones.
  • Usage : Generic syntax.
  • Arguments : Function’s arguments (input).
  • Details/Value : Details on the usage of the function and a description of the value it returns.
  • Note : Additional notes for the function.
  • References : References to books, articles, and authors.
  • See Also : Related functions.
  • Examples : Examples using the function.
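
Besides ‘?’ and help(), a few other built-in helpers can be useful when exploring a function. The following is a short sketch that uses sd() as the example:

```r
# Show the arguments of a function without opening its help file
args(sd)

# Run the code from the Examples section of the help file
example(sd)

# List the objects on the search path whose names match a pattern
apropos("^sd")
```

args() is a quick way to recall argument names, while example() lets you see the function in action before reading the full help file.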

4.2. Common functions for vectors

# Let's take our vector C
C <- c(1:10)

# These are some common functions for numeric vectors
mean(C)                   # mean
sd(C)                     # standard deviation
var(C)                    # variance
max(C)                    # maximum
min(C)                    # minimum
median(C)                 # median
sum(C)                    # sum
prod(C)                   # product
quantile(C, probs = 0.5)  # quantiles (here the 0.5 quantile, i.e. the median)
length(C)                 # length of the vector
range(C)                  # range (minimum and maximum)

# These functions perform element-wise operations
log(C)      # natural logarithm
exp(C)      # exponential
sqrt(C)     # square root
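
To make the relationships between some of these functions concrete, here is a small sketch using the same vector C. It only checks facts that follow from the definitions of the functions:

```r
C <- c(1:10)

# The standard deviation is the square root of the variance
all.equal(sd(C), sqrt(var(C)))

# quantile() accepts several probabilities at once
quantile(C, probs = c(0.25, 0.5, 0.75))

# The 0.5 quantile coincides with the median
unname(quantile(C, probs = 0.5)) == median(C)

# Summary functions return a single value; element-wise functions
# return a vector of the same length as the input
length(mean(C))
length(sqrt(C))
```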

4.3. Common functions for matrices

Matrix operators

# Let's work with our matrix D
D <- matrix(c(10,20,30,40), nrow=2, ncol=2)
t(D)    # Transpose of a matrix

# Note the following differences
D
D*D     # Element-wise multiplication
D^2     # Element-wise exponentiation
D%*%D   # Matrix multiplication (inner product)
D%o%D   # Outer product

# These are some common functions for matrices
D

rowSums(D)  # Row sums
colSums(D)  # Column sums
rowMeans(D) # Row means
colMeans(D) # Column means
diag(D)     # Diagonal of a matrix
solve(D)    # Inverse of a matrix
cov(D)      # Variance-covariance matrix of the columns
cor(D)      # Correlation matrix of the columns
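
The difference between element-wise and matrix multiplication is easy to miss, so here is a short sketch that compares a single entry of each result. The names elementwise and matprod are only used for this illustration:

```r
D <- matrix(c(10, 20, 30, 40), nrow = 2, ncol = 2)

elementwise <- D * D      # each entry multiplied by itself
matprod     <- D %*% D    # rows of D times columns of D

elementwise[1, 2]   # 30 * 30 = 900
matprod[1, 2]       # 10 * 30 + 30 * 40 = 1500

# Multiplying a matrix by its inverse gives the identity matrix
# (up to floating-point rounding)
all.equal(solve(D) %*% D, diag(2))

# The outer product of two 2 x 2 matrices is a 4-dimensional array
dim(D %o% D)
```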

4.4. Common functions for string variables

# Let's work with our string variable B
B <- "R workshop"
B

# These are some common functions for strings
paste(B, "2017", sep = ",")     # Concatenates two or more strings.
                                # Syntax: paste(string1, string2, sep = separator).

substr(B, 1, 6)                 # Extracts a substring from a character vector.
                                # Syntax: substr(string, start, stop).

strsplit(B, "work")             # Splits a string at each match of a pattern. Returns a list.
                                # Syntax: strsplit(string, split).

grep("work", B)                 # Returns the indices of the elements that match a pattern or regular expression.
                                # Syntax: grep(pattern, string).
                                # Use grepl(pattern, string) for a logical answer: is the word "work" in B?

gsub("workshop", "awesome workshop", B)   # Replaces every match of a pattern or regular expression.
                                          # Syntax: gsub(pattern, replacement, string).

tolower(B)                      # Converts a string to lower case.
toupper(B)                      # Converts a string to upper case.
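
Two details of these functions are worth seeing in action: grep() returns positions while grepl() returns TRUE/FALSE, and strsplit() returns a list. A minimal sketch with the same string B (the name parts is only for illustration):

```r
B <- "R workshop"

grep("work", B)      # position of the matching element: 1
grepl("work", B)     # TRUE
grepl("python", B)   # FALSE

# strsplit() returns a list with one element per input string
parts <- strsplit(B, " ")
parts[[1]]           # "R" "workshop"

nchar(B)             # number of characters: 10
```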

4.5. Common functions for data frames

We will have a full session on datasets. But here are some common functions for data frames.

# Let's call our data frame E
E <- data.frame(
            age = c(20,24,26,23,29),
            sex = c("Female","Male","Female","Male","Female"),
            treatment = c(1,0,0,1,1),
            income = c(1000,1500,2000,2500,3000)
            )

        
dim(E)          # Dimensions of the data frame.         Syntax: dim(x)
head(E,3)       # Shows first n rows.                   Syntax: head(x,n)
tail(E,3)       # Shows last n rows.                    Syntax: tail(x,n)
str(E)          # Displays the structure of an object.  Syntax: str(x)
summary(E)      # Displays summary statistics.          Syntax: summary(x)

# You can use the following commands to browse and edit your data
fix(E)          # Opens the data editor and saves the changes back to E
edit(E)         # Opens the data editor and returns the edited copy (E is unchanged unless you assign the result)
View(E)         # Opens a read-only viewer in a separate window

TIP: Although it might take more effort and time, we strongly recommend editing your data from the R script. You will be able to keep track of all the steps taken to clean your data, and your results will be replicable by others.
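
As an illustration of editing data from a script rather than by hand, here is a minimal sketch. The object name E_clean, the new column log_income, and the age cutoff are hypothetical choices made only for this example:

```r
E <- data.frame(
  age = c(20, 24, 26, 23, 29),
  sex = c("Female", "Male", "Female", "Male", "Female"),
  treatment = c(1, 0, 0, 1, 1),
  income = c(1000, 1500, 2000, 2500, 3000)
)

# Work on a copy so the original data stay untouched
E_clean <- E

# Each cleaning step is a line of code that can be re-run and reviewed
E_clean$sex <- factor(E_clean$sex)               # store sex as a factor
E_clean$log_income <- log(E_clean$income)        # add a new column
E_clean <- E_clean[E_clean$age >= 24, ]          # keep rows with age >= 24

dim(E_clean)
```

Because every change is written down, anyone with the script and the original data can reproduce E_clean exactly.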

4.6. Writing functions

To create your own function, assign it to a name like any other variable, listing its inputs as parameters.

Syntax: name <- function(element1, element2, …){ statements; return(object) }

## Let's write a function that computes the hypotenuse of a right triangle: the square root of the sum of the squared sides
hypotenuse <- function(x, y){
  sqrt(x^2 + y^2)
}

## This should work with different arguments
hypotenuse(2,3)

hypotenuse(5,5)


### As with built-in functions, we can pass vectors:


normalize <- function(z, m = mean(z), s = sd(z)){
  (z - m) / s
}


normalize(c(1,3,6,10,15))


## We can keep track of each step with print
normalize <- function(z, m = mean(z), s = sd(z)){
  print(m); print(s)
  (z - m) / s
}


normalize(c(1,3,6,10,15))

### Note that if one element is missing, the result will be missing too:

normalize(c(1,3,6,10,NA))


### We can tell R to remove missing values before the calculations:

normalize <- function(z, m = mean(z, na.rm = TRUE), s = sd(z, na.rm = TRUE)){
  (z - m) / s
}

normalize(c(1,3,6,10,NA))


## The functions above evaluate a single expression. When a function has several statements, it is convenient to use 'return'

bad.function <- function(x, y) {
  z1 <- 2*x + y
  z2 <- x + 2*y
  z3 <- 2*x + 2*y
  z4 <- x/y
}

bad.function(1, 2)  # Only the value of the last expression (z4) is returned, and invisibly, so nothing is printed

good.function <- function(x, y) {
  z1 <- 2*x + y
  z2 <- x + 2*y
  z3 <- 2*x + 2*y
  z4 <- x/y
  return(c(z1, z2, z3, z4))
}

good.function(1,2)  # Returns all four values
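
When a function returns several values, naming them makes the result easier to read and to index. The following sketch is a variation on the function above; the name named.function is hypothetical and used only for this example:

```r
named.function <- function(x, y) {
  # Build a named vector so each result can be retrieved by name
  return(c(z1 = 2*x + y, z2 = x + 2*y, z3 = 2*x + 2*y, z4 = x/y))
}

res <- named.function(1, 2)
res          # prints the four values with their names
res["z4"]    # extract a single result by name
```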

NOTE: Unlike some other languages, R cannot call a function before the code that defines it has been run (unless you customize your environment). Try to keep all function definitions at the beginning of your script.
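
A small sketch of what this means in practice. The function name rect_area is hypothetical; tryCatch() is used only so that the error does not stop the script:

```r
# Calling a function before its definition has been run fails
result <- tryCatch(
  rect_area(2, 3),
  error = function(e) conditionMessage(e)
)
result   # an error message saying the function could not be found

# After the definition has been run, the same call works
rect_area <- function(w, h) {
  w * h
}
rect_area(2, 3)
```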


EXERCISE 3 (Functions)

  1. Create a function that takes a vector c(1,2,3,4,5) and returns a vector containing the mean, the sum, the standard deviation, and the median.

  2. Modify your function to return a list instead of a vector.

  3. Create a function that receives the matrix D defined above. The result should be the sum of the diagonal of the inner product of D with itself (D%*%D).

  4. Create a function that receives an input of integers and returns a logical vector that is TRUE when the input is even and FALSE when it is odd. HINT: Remember that %% is the operator for the remainder of a division.



5. R packages

There is a vast online library of functions in R created by other users.

These functions come in “packages”, which are collections of objects including datasets, functions, models, and compiled code.

Some packages are already installed on your computer and contain baseline functions and data. Other packages need to be downloaded from the Comprehensive R Archive Network (CRAN). CRAN hosts thousands of packages available for download.

For example, the package “foreign” includes a function ‘read.dta()’ that allows users to read Stata datasets (.dta files).

If this package has not been installed and loaded on your computer, R will not find a help file for the function ‘read.dta’.

TRY:

?read.dta

help(read.dta)

5.1. Installing a package

To install a package, use the function ‘install.packages()’. Syntax: ‘install.packages("package.name")’

install.packages("foreign")

You will have to select a server or CRAN mirror from which the package will be downloaded to your computer.

Choose a server close to your location, for example USA (CA 2), for faster downloads.

You can also use the Package Installer interface located in the drop down menu “Packages & Data”

NOTE: Some packages use functions from other packages. When installing a package make sure to install all dependencies as well by selecting the “Install Dependencies” option.
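
The mirror and the dependencies can also be set directly in the call, which avoids the interactive prompts. This is a sketch under two assumptions: that the CRAN cloud mirror at https://cloud.r-project.org is a suitable server for you, and that you only want to install the package when it is missing:

```r
# Install "foreign" only if it is not already available
if (!requireNamespace("foreign", quietly = TRUE)) {
  install.packages(
    "foreign",
    repos = "https://cloud.r-project.org",  # assumed mirror; any CRAN mirror works
    dependencies = TRUE                     # also install the packages it needs
  )
}

# Check that the package is now in the library
"foreign" %in% rownames(installed.packages())
```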

The directory where packages are stored is called the library.

To get the location of the library in your computer, type ‘.libPaths()’

.libPaths()

To see all the packages installed in your computer, call the library by typing ‘library()’

library()

5.2. Loading a package

NOTE: A package only needs to be installed on your computer ONCE.

Nevertheless, you have to call or “load” a package in EACH R session in which you are going to use it.

In other words, you have to select the packages that will be active during your session.

This avoids conflicts between variable, data, and function names from different packages.

To load a package installed in your computer, use the function ‘library()’

Syntax: ‘library("package.name")’.

Go ahead and type:

library("foreign")

Once a package is installed and loaded, its functions, data, and code are available for use. Try:

?read.dta

Right now we don’t have Stata datasets but we will create some in the next session.

You can also use the Package Manager interface located in the drop down menu “Packages & Data”

Simply click on the packages to be loaded into your session.

TIP: We recommend installing and loading packages from your script.

This is especially true when running in BATCH mode, sharing code, and for replication purposes.

To see all the packages loaded into your session, use the function ‘search()’.

search()
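
The following sketch shows how to check from a script that a package has been loaded. It assumes that “foreign” is already installed; the ‘::’ notation is a standard R operator that calls a function from a package without loading the whole package:

```r
library("foreign")

# Loaded packages appear in the search path as "package:<name>"
"package:foreign" %in% search()

# The functions of a loaded package can be called by name
exists("read.dta")

# Alternative: refer to a function with package::function
# without relying on the package being loaded
is.function(foreign::read.dta)
```

Writing foreign::read.dta also makes it explicit which package a function comes from, which helps when two packages use the same function name.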

5.3. Help files for packages

There are different types of help files for packages.

  1. To see the summary of a package, use ‘library(help="package.name")’

library(help="foreign")

The package summary is structured as follows:

  • Package : Name of the package.
  • Version : Version.
  • Date : Publication date.
  • Title : Brief description of the package (what it does).
  • Depends : Dependencies to other packages.
  • Imports : Packages from which it imports functions and code.
  • Description : Full description of the package.
  • Author : Authors.
  • Index : Index and description of package’s objects.

  2. To see the full documentation for a package, use ‘help(package="package.name")’

help(package="foreign")

  3. For most packages, there is a pdf version of the full documentation available at:

http://cran.r-project.org/web/packages/

The pdf version of the help documentation for package “foreign” is available at:

http://cran.r-project.org/web/packages/foreign/foreign.pdf

TIP: We recommend reading the pdf version of the help documentation when getting familiar with a package.
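
Some of the information in the package summary can also be retrieved from code. A minimal sketch, assuming the package “foreign” is installed:

```r
library("foreign")

# Installed version of the package
packageVersion("foreign")

# Selected fields from the package summary
packageDescription("foreign", fields = c("Title", "Depends"))

# Names of the objects that the package makes available
ls("package:foreign")
```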


EXERCISE 4 (Packages)

  1. Install and load the following packages:

“foreign”

“xlsx”

“Zelig”

“dplyr”

  2. Go to the full documentation for dplyr (https://cran.r-project.org/web/packages/dplyr/dplyr.pdf). Read the entry for the dplyr function ‘select’. Discuss the logic of this function.

  3. Go to the full documentation of foreign (https://cran.r-project.org/web/packages/foreign/foreign.pdf). Take a look at the different formats you can read with this package.



Note: This script is based on the R Workshop created by Gustavo Robles. Some exercises are based on Cotton, R. (2013), Learning R, O’Reilly.