3.5 Data Frames
3.6 Lists
3.7 Classes
Matrices are another type of object in R. You can think of them as two-dimensional vectors with rows and columns.
To create a matrix, use the function ‘matrix()’.
Syntax: matrix(vector, number of rows, number of columns)
matrix(c(10,20,30,40), 2, 2)
# You can also use objects stored in the workspace
C <- 3
matrix(C,2,5) # The single value 3 is recycled to fill a 2x5 matrix
D <- matrix(c(10,20,30,40), nrow=2,ncol=2)
D
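The default fill order matters when you read a matrix back. The sketch below, using the hypothetical object names col_filled and row_filled, shows that matrix() places values column by column unless you set its byrow argument to TRUE.

```r
# matrix() fills its values column by column by default
col_filled <- matrix(c(10, 20, 30, 40), nrow = 2, ncol = 2)

# byrow = TRUE fills the same values row by row instead
row_filled <- matrix(c(10, 20, 30, 40), nrow = 2, ncol = 2, byrow = TRUE)

col_filled[1, 2]  # 30: the first column holds 10 and 20
row_filled[1, 2]  # 20: the first row holds 10 and 20
```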
# Note that "[ , ]" indicates the position "[row,column]" of an element in a matrix.
# To call an element in a matrix, use the following notation matrix[row,column]
# For example:
D
D[2,1] # Second row, first column
D[,] # All elements
D[2,] # Second row, all columns
D[,2] # All rows, second column
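Square brackets also work on the left-hand side of an assignment, which is how individual elements, rows, or columns are replaced. A minimal sketch, assuming a hypothetical copy named D2 so that D itself stays unchanged:

```r
D <- matrix(c(10, 20, 30, 40), nrow = 2, ncol = 2)

D2 <- D          # work on a copy so D keeps its original values
D2[2, 1] <- 99   # replace the element in row 2, column 1
D2[, 2] <- 0     # replace every element of column 2
D2               # column 1: 10, 99; column 2: 0, 0
D[2, 1]          # still 20: modifying D2 did not change D
```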
# Let's go back to our first matrix
D <- matrix(c(10,20,30,40), nrow=2,ncol=2)
We can explore some characteristics of the matrix:
nrow(D) # Number of rows
ncol(D) # Number of columns
length(D) # Number of elements (product of the dimensions)
dim(D) # Both dimensions: rows and columns
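The three functions are related: dim() returns the number of rows and columns together, and length() is their product. A quick check on a hypothetical 3x4 matrix of zeros named M:

```r
M <- matrix(0, nrow = 3, ncol = 4)       # hypothetical 3x4 matrix of zeros

dim(M)                                   # 3 4
length(M) == prod(dim(M))                # TRUE: 3 * 4 = 12 elements
identical(dim(M), c(nrow(M), ncol(M)))   # TRUE
```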
Data frames store spreadsheet-like (tabular) data.
In other words, they resemble matrices, but each column can hold a different kind of data (numbers, text, and so on).
To create a data frame, use the function data.frame().
The arguments for ‘data.frame()’ are a series of vectors.
You can give variable names to each of these vectors.
Syntax: data.frame(vector1,vector2,vector3,vector4)
data.frame(age = c(20,24,26,23,29),
           sex = c("Female","Male","Female","Male","Female"),
           treatment = c(1,0,0,1,1),
           income = c(1000,1500,2000,2500,3000))
# NOTE: All parentheses in a function call must be balanced; otherwise, R will keep waiting for more input and won't execute the command.
# TIP: You can take advantage of this to keep your code clear. Use line breaks and indentation to make your commands more legible.
# Note the '+' sign in the console, which indicates that R is expecting more input.
# NOTE: RStudio users may need to select all lines of a multi-line command before running it.
E <- data.frame(
  age = c(20,24,26,23,29),
  sex = c("Female","Male","Female","Male","Female"),
  treatment = c(1,0,0,1,1),
  income = c(1000,1500,2000,2500,3000)
)
E
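Because each column of a data frame can hold a different kind of data, it helps to check what R stored in each one. A minimal sketch using sapply() to report one class per column and str() for a compact overview; the exact class of the sex column depends on your R version (character in R 4.0.0 and later, factor before that):

```r
E <- data.frame(
  age = c(20, 24, 26, 23, 29),
  sex = c("Female", "Male", "Female", "Male", "Female"),
  treatment = c(1, 0, 0, 1, 1),
  income = c(1000, 1500, 2000, 2500, 3000)
)

sapply(E, class)  # one class per column, named by column
str(E)            # dimensions, column types, and the first values
```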
# Similar to matrices, you can select elements in a data frame by using the following notation
# dataframe[row,column]
# For example:
E
E[4 ,4] # Fourth row, fourth column
E[ , ] # All elements
E[4 , ] # Fourth row, all columns
E[ , 4] # All rows, fourth column (variable income)
# Nevertheless, it is usually more convenient to use the $ operator to select variables (columns) in a data frame.
# The '$' operator selects a variable by name from the data frame it belongs to.
# Syntax: dataframe$variable
# For example
E
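E$age # Variable "age" in data frame "E"
E$sex # Variable "sex" in data frame "E"
# In R 4.0.0 and later, data.frame() keeps text as character vectors by default (stringsAsFactors = FALSE), so E$sex prints as plain strings.
# In earlier versions, text columns were converted to factors; the "Levels" line in the output indicates that R is treating a variable as categorical (factor).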
# Another way to select variables is by writing their names, in quotes, in the column position
E[,"age"] # Column "age" in data frame "E"
E[,"sex"] # Column "sex" in data frame "E"
E[,c("age","sex")] # Columns "age" and "sex" in data frame "E"
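Selecting one column by name returns a plain vector, while selecting several columns returns a smaller data frame. The sketch below checks this with class() and confirms that $ and [ , "name"] return the same vector:

```r
E <- data.frame(
  age = c(20, 24, 26, 23, 29),
  sex = c("Female", "Male", "Female", "Male", "Female"),
  treatment = c(1, 0, 0, 1, 1),
  income = c(1000, 1500, 2000, 2500, 3000)
)

class(E[, "age"])             # "numeric": one column is simplified to a vector
class(E[, c("age", "sex")])   # "data.frame": several columns stay a data frame
identical(E$age, E[, "age"])  # TRUE
```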
# Finally, you can choose a particular element of a variable by using '[ ]'
E$age # Variable "age" in data frame "E"
E$age[2] # Second element of variable "age" in data frame "E"
# Note that the following notations are equivalent
E$age[2] # Second element of variable "age" in data frame "E"
E[ , "age"][2] # Second element of column "age" in data frame "E"
E[2, 1] # Second row and first column in data frame "E"
E[ , 1][2] # Second element of first column in data frame "E"
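The equivalence claimed above can be verified with identical(), which returns TRUE only when two objects are exactly the same. A minimal check, re-creating E so the block runs on its own:

```r
E <- data.frame(
  age = c(20, 24, 26, 23, 29),
  sex = c("Female", "Male", "Female", "Male", "Female"),
  treatment = c(1, 0, 0, 1, 1),
  income = c(1000, 1500, 2000, 2500, 3000)
)

identical(E$age[2], E[, "age"][2])  # TRUE
identical(E$age[2], E[2, 1])        # TRUE
identical(E$age[2], E[, 1][2])      # TRUE
E$age[2]                            # 24
```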
A list is a generalization of a vector: in a list, elements can be of different types.
# Re-create the objects from the previous sections
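The difference from a vector shows up as soon as you mix types: c() converts everything to a single common type, whereas list() keeps each element as it is. A short sketch with hypothetical objects v and l:

```r
v <- c(1, "a", TRUE)     # c() coerces all three values to character
class(v)                 # "character"

l <- list(1, "a", TRUE)  # list() keeps each element's own type
sapply(l, class)         # "numeric" "character" "logical"
```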
A <- 5
B <- "R workshop"
C <- c(1:10)
D <- matrix(c(10,20,30,40), nrow=2,ncol=2)
E <- data.frame(
  age = c(20,24,26,23,29),
  sex = c("Female","Male","Female","Male","Female"),
  treatment = c(1,0,0,1,1),
  income = c(1000,1500,2000,2500,3000)
)
ls() # Names of all objects in the workspace
# Lists are commonly used objects in R.
# They are a collection of other objects, broadly defined.
# Here we make a list of all the objects that we have created so far.
# Use the function 'list()' to create lists; it works similarly to the concatenate function 'c()'.
# The difference is that 'list()' creates lists and 'c()' creates vectors.
# The syntax to retrieve elements from them differs.
list(A,B,C,D,E)
# Note that "[[ ]]" will indicate the position of an object in the list.
# Remember: "[ ]" indicates the position of an element in a vector.
# "( )" are used to call and define functions, and to group operations, as in (2 + 3) * 4.
# "{ }" are used to group code in loops and functions.
global.list <- list(A,B,C,D,E)
global.list
# To call an object in a list, use the following notation list[[position]]
# For example:
global.list[[3]] # Third object [3] in list "global.list"
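Single and double brackets behave differently on lists: [[ ]] extracts the object itself, while [ ] returns a smaller list that still wraps the object. A sketch on a hypothetical short.list holding only the first three objects:

```r
short.list <- list(5, "R workshop", 1:10)

short.list[[3]]         # the vector 1:10 itself
short.list[3]           # a list of length 1 that contains the vector
class(short.list[[3]])  # "integer"
class(short.list[3])    # "list"
short.list[[3]][2]      # 2: second element of the third object
```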
# You can also name objects in the list:
names(global.list) <- c("number", "string", "vector", "matrix", "data.frame")
## And then call them with $
global.list$vector
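Names can also be given when the list is created, which avoids the separate names() step; [[ ]] with a quoted name then works the same as $. A minimal sketch with a hypothetical named.list:

```r
A <- 5
B <- "R workshop"
C <- 1:10

# Names are assigned directly inside list()
named.list <- list(number = A, string = B, vector = C)

names(named.list)                                     # "number" "string" "vector"
identical(named.list$vector, named.list[["vector"]])  # TRUE
```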
Many R objects have a class attribute, a character vector giving the names of the classes an object belongs to.
To find out the type or “class” of an object, use the function class().
Syntax: class(object)
class(A)
class(B)
class(C)
class(D)
class(E)
class(global.list)
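class() is not the same as the storage type reported by typeof(). The sketch below, with a small hypothetical data frame, shows that a numeric matrix is stored as doubles and that a data frame is stored internally as a list of columns:

```r
D <- matrix(c(10, 20, 30, 40), nrow = 2, ncol = 2)
E <- data.frame(age = c(20, 24), income = c(1000, 1500))

class(D)   # "matrix" "array" (R 4.0.0 and later)
typeof(D)  # "double": the storage type of its elements
class(E)   # "data.frame"
typeof(E)  # "list": a data frame is a list of equal-length columns
```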
# Note that in your R script, different kinds of elements (numbers, strings, functions) may appear in different colors
# Note: This varies according to the appearance settings you choose!
# (In RStudio, go to Tools > Global Options > Appearance.)
# Numbers : Orange
3
# Strings : Green
"R workshop"
# Functions : White
mean(C)
# Object names : White
C
# You can change the class of an object by using some of the following commands
# This often comes in handy when reading in a dataset from another format, like Excel.
# as.numeric() : converts a string variable to numeric.
# as.character(): converts a numeric variable to string.
# as.vector() : converts a numeric or string matrix to a vector.
# as.matrix() : converts a numeric or string vector to a matrix.
# as.factor() : converts a numeric or string variable to a categorical variable.
# Examples:
A
as.character(A) # Numeric variable expressed as a string.
D
as.vector(D) # A 2x2 matrix expressed as a vector of length 4 (elements taken column by column).
C
as.matrix(C) # A vector of length 10 expressed as a 10x1 matrix (a single column).
E$treatment # This is a numeric variable.
as.factor(E$treatment) # A numeric variable expressed as a categorical variable.
E$sex # A character variable in R 4.0.0 and later (a factor in earlier versions).
as.numeric(as.factor(E$sex)) # A categorical variable forced to numeric codes (Female = 1, Male = 2).
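A common pitfall when converting imported data: calling as.numeric() directly on a factor returns the internal level codes, not the values written in the data. A minimal sketch with a hypothetical factor f, plus what happens with text that cannot be read as a number:

```r
f <- as.factor(c("10", "20", "10"))   # numbers stored as a factor

as.numeric(f)                         # 1 2 1: the level codes, not the values
as.numeric(as.character(f))           # 10 20 10: convert to text first

# Text that is not a number becomes NA (R also warns about this)
suppressWarnings(as.numeric(c("3.5", "abc")))  # 3.5 NA
```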
Answer the following questions:
Take a look at the diag() function. Create a 21-by-21 matrix called “m” with the sequence from 10 down to 0 and back up to 10 (i.e., 10, 9, …, 0, 1, …, 10) on the diagonal. The rest of the elements should be zero.
What is the length of the following list?
list(a = 2, list(b = 2, g = 3, d = 4), e = NULL)
Explain your answer.
m1 <- matrix(c(1,2,3,4), ncol=2)
Using [ ], retrieve the element in the first row and first column, then the element in the second row and second column. Finally, retrieve the entire second column.