Functions are used to do things with data. We can think of them as verbs rather than nouns.
First, let's take a look at some built-in functions.
# Let's take our vector C
C <- c(1:10)
# These are some common functions for numeric vectors
mean(C) # mean
sd(C) # standard deviation
var(C) # variance
max(C) # maximum
min(C) # minimum
median(C) # median
sum(C) # sum
prod(C) # product
quantile(C,probs=0.5) # quantiles
length(C) # length of the vector
range(C) # range
# These functions perform element-wise operations
log(C) # logarithm
exp(C) # exponential
sqrt(C) # square root
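The two groups above differ in the shape of their result. Summary functions collapse the vector to a single value (or a short vector, for range() and quantile()), while element-wise functions return one value per element. A minimal sketch to check this, which also passes several probabilities to quantile() at once:

```r
C <- c(1:10)

# Summary functions: one number (or a short vector) for the whole vector
length(mean(C))                          # 1
range(C)                                 # 1 10
quantile(C, probs = c(0.25, 0.5, 0.75))  # 3.25 5.50 7.75

# Element-wise functions: one result per element, same length as C
length(sqrt(C))                          # 10
sqrt(C)[4]                               # 2
```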
Matrix operators
# Let's work with our matrix D
D <- matrix(c(10,20,30,40), nrow=2,ncol=2)
t(D) # Transpose of a matrix t()
# Note the following difference
D
D*D # Element wise multiplication
D^2 # Element wise exponentiation
D%*%D # Matrix multiplication (inner product)
D%o%D # Outer product
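Working the numbers out shows why the comments above separate element-wise from matrix products. matrix() fills by column, so D has rows (10, 30) and (20, 40). A short check:

```r
D <- matrix(c(10, 20, 30, 40), nrow = 2, ncol = 2)

# Element-wise: each entry is squared on its own
D * D    # rows: 100 900 / 400 1600

# Matrix product: entry [i, j] is row i of D times column j of D
# e.g. [1, 1] = 10*10 + 30*20 = 700
D %*% D  # rows: 700 1500 / 1000 2200

# Outer product: every entry paired with every other entry, giving a 4-dimensional array
dim(D %o% D)  # 2 2 2 2
```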
# These are some common functions for matrices
D
rowSums(D) # Row sums
colSums(D) # Column sums
rowMeans(D) # Row means
colMeans(D) # Column means
diag(D) # Diagonal of a matrix
solve(D) # Inverse of a matrix
cov(D) # Variance covariance matrix
cor(D) # Correlation matrix
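Two quick sanity checks on the functions above. Multiplying D by solve(D) should give the identity matrix up to floating-point rounding, which is why all.equal() is used rather than ==. solve() also accepts a second argument to solve a linear system D x = b directly. Finally, cov() treats each column as a variable, so the diagonal of cov(D) holds the column variances. The right-hand side b below is made up for illustration.

```r
D <- matrix(c(10, 20, 30, 40), nrow = 2, ncol = 2)

# D times its inverse is the 2 x 2 identity matrix, up to rounding error
all.equal(D %*% solve(D), diag(2))   # TRUE

# Solve D %*% x = b without forming the inverse explicitly
b <- c(1, 2)          # illustrative right-hand side
x <- solve(D, b)
all.equal(as.vector(D %*% x), b)     # TRUE

# cov() works column by column: its diagonal holds the column variances
all.equal(diag(cov(D)), apply(D, 2, var))  # TRUE
```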
# Let's work with our string variable B
B <- "R workshop"
B
# These are some common functions for strings
paste(B,"2017", sep = ",") # Concatenates two or more string vectors.
# Syntax: paste(string1, string2, sep = separator).
substr(B, 1, 6) # Extracts a substring from a character vector.
# Syntax: substr(string,start,stop).
strsplit(B,"work") # Splits a string according to a substring.
# Syntax: strsplit(string,split).
grep("work", B) # Returns the indices of the elements that match a pattern or regular expression.
# Syntax: grep(pattern, string).
grepl("work", B) # Logical version of grep(): is the word "work" in B?
gsub("workshop","awesome workshop",B) # Replaces every match of a pattern (regular expression) with a replacement string.
# Syntax: gsub(pattern, replacement, string).
tolower(B) # Converts a string to lowercase.
toupper(B) # Converts a string to upper case.
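Two details of the string functions above are worth knowing. strsplit() returns a list (one element per input string), so you usually extract the pieces with [[1]]. These functions are also vectorized, so they accept a whole character vector, and paste() has a collapse argument that joins the elements of a single vector. A small sketch; the second string "R course" is made up for illustration:

```r
B <- "R workshop"

pieces <- strsplit(B, " ")    # a list with one element
words <- pieces[[1]]          # character vector: "R" "workshop"

paste(words, collapse = "_")  # "R_workshop"
nchar(B)                      # 10 characters, counting the space

# Vectorized: one result per string
grepl("work", c(B, "R course"))  # TRUE FALSE
toupper(c(B, "R course"))        # "R WORKSHOP" "R COURSE"
```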
We will have a full session on datasets, but here are some common functions for data frames.
# Let's create a data frame E
E <- data.frame(
age = c(20,24,26,23,29),
sex = c("Female","Male","Female","Male","Female"),
treatment = c(1,0,0,1,1),
income = c(1000,1500,2000,2500,3000)
)
dim(E) # Dimensions of the data frame. Syntax: dim(x)
head(E,3) # Shows first n rows. Syntax: head(x,n)
tail(E,3) # Shows last n rows. Syntax: tail(x,n)
str(E) # Displays the structure of an object. Syntax: str(x)
summary(E) # Displays summary statistics. Syntax: summary(x)
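A few more functions are handy with data frames. nrow(), ncol() and names() return individual pieces of what dim() and str() summarize, and the $ operator pulls out a single column as an ordinary vector, so any of the numeric functions from the beginning of this section can be applied to it. A minimal sketch with the same E:

```r
E <- data.frame(
  age = c(20, 24, 26, 23, 29),
  sex = c("Female", "Male", "Female", "Male", "Female"),
  treatment = c(1, 0, 0, 1, 1),
  income = c(1000, 1500, 2000, 2500, 3000)
)

nrow(E)          # 5 rows (observations)
ncol(E)          # 4 columns (variables)
names(E)         # "age" "sex" "treatment" "income"

# A column extracted with $ is an ordinary vector
mean(E$income)   # 2000
table(E$sex)     # Female: 3, Male: 2
```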
# You can use the following commands to browse and edit your data
fix(E) # Opens E in a data editor; your changes are saved back to E
edit(E) # Opens a copy of E for editing; assign the result (E <- edit(E)) to keep the changes
View(E) # Opens a read-only viewer in a separate window
TIP: Although it might take more effort and time, we strongly recommend editing your data from the R script. You will be able to keep track of all the steps taken to clean your data, and your results will be replicable by others.
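Following this tip, the changes you might otherwise make by hand in fix() or edit() can be written as assignments in the script, so every step is recorded. A sketch; the correction to the second age and the new column name log_income are hypothetical, for illustration only:

```r
E <- data.frame(
  age = c(20, 24, 26, 23, 29),
  sex = c("Female", "Male", "Female", "Male", "Female"),
  treatment = c(1, 0, 0, 1, 1),
  income = c(1000, 1500, 2000, 2500, 3000)
)

# Hypothetical correction: the second respondent's age should be 25
E$age[2] <- 25

# Hypothetical new variable derived from an existing one
E$log_income <- log(E$income)

head(E, 3)
```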
To create your own function, assign it to a name as you would any other variable, listing its inputs as parameters (arguments).
Syntax: name <- function(element1, element2, …){ statements; return(object) }
## Let's write a function that returns the square root of the sum of the squares of its two arguments (the hypotenuse of a right triangle)
hypothenuse <- function(x, y){
sqrt(x ^2 + y ^2)
}
## This should work with different parameters
hypothenuse(2,3)
hypothenuse(5,5)
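Arguments can also be passed by name, in any order. Because sqrt(), ^ and + are vectorized, hypothenuse() also works on whole vectors without any changes. A quick check with the classic 3-4-5 and 5-12-13 right triangles:

```r
hypothenuse <- function(x, y){
  sqrt(x^2 + y^2)
}

hypothenuse(3, 4)               # 5
hypothenuse(y = 4, x = 3)       # 5: named arguments can go in any order
hypothenuse(c(3, 5), c(4, 12))  # 5 13: one result per pair of elements
```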
### As with built-in functions, we can pass vectors:
normalize <- function(z, m=mean(z), s =sd(z)){
(z-m) /s
}
normalize(c(1,3,6,10,15))
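The values after the = signs in normalize() are default arguments: they are used only when the caller does not supply them. You can override them by name. Base R's scale() performs the same standardization; it returns a one-column matrix, hence the as.numeric() below. A sketch:

```r
normalize <- function(z, m = mean(z), s = sd(z)){
  (z - m) / s
}

x <- c(1, 3, 6, 10, 15)

normalize(x)                # uses the defaults m = mean(x), s = sd(x)
normalize(x, m = 0, s = 1)  # overriding both defaults returns x unchanged

# Same result as the built-in scale()
all.equal(normalize(x), as.numeric(scale(x)))  # TRUE
```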
## We can keep track of each step with print
normalize <- function(z, m=mean(z), s =sd(z)){
print(m) ; print(s)
(z-m) /s
}
normalize(c(1,3,6,10,15))
### Note that if one element is missing (NA), every result will be NA, because mean() and sd() return NA:
normalize(c(1,3,6,10,NA))
### We can tell R to remove missing values before the calculations:
normalize <- function(z, m=mean(z, na.rm=TRUE), s =sd(z, na.rm=TRUE)){
(z-m) /s
}
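With na.rm = TRUE the mean and standard deviation are computed from the non-missing values, and only the missing element stays NA in the output. A variant that lets the caller choose could add a hypothetical na.rm argument that is forwarded to mean() and sd(); this works because default arguments may refer to other arguments of the same function:

```r
# Hypothetical variant: na.rm is exposed as an argument of normalize()
normalize <- function(z, na.rm = TRUE,
                      m = mean(z, na.rm = na.rm), s = sd(z, na.rm = na.rm)){
  (z - m) / s
}

result <- normalize(c(1, 3, 6, 10, NA))
result                                        # four standardized values and one NA

normalize(c(1, 3, 6, 10, NA), na.rm = FALSE)  # all NA, as before
```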
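## The functions above contain a single expression, and its value is what they return. When a function has many statements, R returns only the value of the last one, so it is convenient to use 'return' to state explicitly what the function gives back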
bad.function <- function(x, y) {
z1 <- 2*x + y
z2 <- x + 2*y
z3 <- 2*x + 2*y
z4 <- x/y
}
print(bad.function(1, 2)) # Only the last value (z4) is returned, and invisibly because it comes from an assignment, so we print it
good.function <- function(x, y) {
z1 <- 2*x + y
z2 <- x + 2*y
z3 <- 2*x + 2*y
z4 <- x/y
return(c(z1, z2, z3, z4))
}
good.function(1,2) # returns all
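return() also ends the function immediately, which is useful for handling special cases before the main computation. A sketch with a hypothetical safe_ratio() that returns NA instead of dividing by zero:

```r
# Hypothetical example: return() exits as soon as it is reached
safe_ratio <- function(x, y) {
  if (y == 0) {
    return(NA)   # stop here; the line below is never run
  }
  x / y
}

safe_ratio(1, 2)  # 0.5
safe_ratio(1, 0)  # NA instead of Inf
```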
NOTE: Unlike some other languages, R cannot use a function before the code that defines it has been run (unless you customize your environment). Try to keep all function definitions at the beginning of your script.
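Concretely, a call fails if the line defining the function has not been run yet. What matters is the moment the call is evaluated: a function body may refer to a helper defined further down, as long as the helper exists by the time the outer function is called. A sketch with hypothetical function names:

```r
# Calling before defining: the error is caught here so the script keeps running
msg <- tryCatch(my_area(2, 3), error = function(e) conditionMessage(e))
msg   # e.g. could not find function "my_area"

my_area <- function(w, h) w * h
my_area(2, 3)   # 6 once the definition has run

# A body may mention a function defined later; it is looked up at call time
add_one_to_double <- function(x) double_it(x) + 1
double_it <- function(x) 2 * x
add_one_to_double(3)   # 7
```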
Create a function that takes a vector c(1,2,3,4,5) and returns a vector containing the mean, the sum, the standard deviation and the median.
Modify your function to return a list instead of a vector.
Create a function that receives the matrix D defined above. The result should be the sum of the diagonal of the inner product.
Create a function that receives an input of integers and returns a logical vector that is TRUE when the input is even and FALSE when it is odd. HINT: Remember that %% is the operator for the remainder of a division.
There is a vast online library of functions in R created by other users.
These functions come in “packages”, which are collections of objects including databases, functions, models, and compiled code.
Some packages are already installed on your computer and contain baseline functions and data. Other packages need to be downloaded from the Comprehensive R Archive Network (CRAN), which hosts many thousands of packages available for download.
For example, the package “foreign” includes a function ‘read.dta()’ that allows users to read Stata datasets (.dta files).
Until this package is installed and loaded, R will not find the help file for the function ‘read.dta’.
TRY:
?read.dta
help(read.dta)
To install a package, use the function ‘install.packages()’. Syntax: ‘install.packages("package.name")’
install.packages("foreign")
You will have to select a server or CRAN mirror from which the package will be downloaded to your computer.
Choose the server USA (CA 2) for faster downloads.
You can also use the Package Installer interface located in the drop down menu “Packages & Data”
NOTE: Some packages use functions from other packages. When installing a package make sure to install all dependencies as well by selecting the “Install Dependencies” option.
The directory where packages are stored is called the library.
To get the location of the library in your computer, type ‘.libPaths()’
.libPaths()
To see all the packages installed on your computer, type ‘library()’
library()
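library() without arguments opens a listing meant for reading. To check from code whether a specific package is installed, one option is to look it up in the matrix returned by installed.packages(), whose row names are package names. A sketch:

```r
# Folders where R looks for installed packages
.libPaths()

# Row names of installed.packages() are the installed package names
installed <- rownames(installed.packages())
"stats" %in% installed     # TRUE: stats ships with R
"foreign" %in% installed   # TRUE if foreign is installed on this computer
```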
NOTE: A package only needs to be installed on your computer ONCE.
Nevertheless, you have to call or “load” a package into EACH R session in which you are going to use it.
In other words, you have to select the packages that will be active during your session.
This avoids conflicts between variable, data, and function names coming from different packages.
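When two attached packages define functions with the same name, the one attached last masks the other. The :: operator names the package explicitly, and it can call a function from an installed package without attaching it. For example, stats::sd() always refers to the sd() in the stats package, and tools::file_ext() works even though tools is not attached by default:

```r
# pkg::fun calls fun from pkg explicitly
stats::sd(c(2, 4, 4, 4, 5, 5, 7, 9))  # sample standard deviation
tools::file_ext("survey.dta")         # "dta"; tools is installed with R but not attached by default
```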
To load a package installed in your computer, use the function ‘library()’
Syntax: ‘library("package.name")’.
Go ahead and type:
library("foreign")
Once a package is installed and loaded, its functions, data, and code are available for use. Try:
?read.dta
Right now we don’t have Stata datasets but we will create some in the next session.
You can also use the Package Manager interface located in the drop down menu “Packages & Data”
Simply click on the packages to be loaded into your session.
TIP: We recommend installing and loading packages from your script.
This is especially true when running in BATCH mode, sharing code, and for replication purposes.
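One common pattern for the top of a script is sketched below, assuming the packages you need are listed by hand in pkgs: install only the packages that are missing, with dependencies = TRUE so their dependencies come along, and then load all of them.

```r
# Packages this script needs (edit this list for your project)
pkgs <- c("foreign")

# requireNamespace() returns FALSE, without an error, if a package is not installed
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]

# Install only what is missing, together with its dependencies
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, dependencies = TRUE)
}

# Load every package; character.only = TRUE lets library() take a string variable
for (p in pkgs) library(p, character.only = TRUE)
```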
To see all the packages loaded into your session, use the function ‘search()’.
search()
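Because search() returns a character vector, you can also test from code whether a package is attached: attached packages appear as "package:" followed by the name. A sketch using tools, which comes with R but is not attached at start-up:

```r
library(tools)

attached <- search()
"package:tools" %in% attached    # TRUE after library(tools)
"package:foreign" %in% attached  # TRUE only if foreign has been loaded with library()
```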
There are different types of help files for packages.
library(help="foreign")
A help file is structured as follows:
help(package="foreign")
The pdf version of the help documentation for package “foreign” is available at:
TIP: We recommend reading the pdf version of the help documentation when getting familiar with a package.
“foreign”
“xlsx”
“Zelig”
“dplyr”
Go to the full documentation for dplyr (https://cran.r-project.org/web/packages/dplyr/dplyr.pdf). Read the entry for the dplyr function ‘select’. Discuss the logic of this function.
Go to the full documentation of foreign (https://cran.r-project.org/web/packages/foreign/foreign.pdf). Take a look at the different formats you can read with this package.