Outline

3.4 Matrices and arrays

3.5 Data frames

3.6 Lists

3.7 Object classes

3.4 Matrices and arrays

Matrices are another type of object in R. We can think of them as two-dimensional vectors with rows and columns.

To create a matrix, use the function matrix().

Syntax: matrix(vector, number of rows, number of columns)

matrix(c(10,20,30,40), 2, 2)

# You can also use objects stored in the workspace. A single value is recycled to fill the whole matrix.
C <- 3
matrix(C,2,5)

D <- matrix(c(10,20,30,40), nrow=2,ncol=2)
D
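
By default, matrix() fills the matrix column by column. As a small sketch (the object names by.col and by.row are made up for this illustration), the byrow argument of matrix() switches to filling row by row:

```r
# Default: values fill the matrix column by column
by.col <- matrix(c(10, 20, 30, 40), nrow = 2, ncol = 2)
by.col[1, 2]   # 30: first row, second column

# byrow = TRUE fills the matrix row by row instead
by.row <- matrix(c(10, 20, 30, 40), nrow = 2, ncol = 2, byrow = TRUE)
by.row[1, 2]   # 20: first row, second column
```

The same four values end up in different positions, so it is worth checking the fill order whenever you type a matrix in by hand.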



# Note that "[ , ]" indicates the position "[row,column]" of an element in a matrix.
# To call an element in a matrix, use the following notation matrix[row,column]
# For example:
D
D[2,1]  # Second row, first column
D[,]    # All elements
D[2,]   # Second row, all columns
D[,2]   # All rows, second column
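
One detail that the selections above hide: when you select a single row or column, R returns a plain vector rather than a matrix. The sketch below rebuilds D so that it runs on its own and uses the drop argument of "[" to keep the matrix shape.

```r
D <- matrix(c(10, 20, 30, 40), nrow = 2, ncol = 2)

D[2, ]                      # A plain vector: 20 40
dim(D[2, ])                 # NULL, because the result is no longer a matrix

D[2, , drop = FALSE]        # Still a matrix, with 1 row and 2 columns
dim(D[2, , drop = FALSE])   # 1 2
```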

# Let's go back to our first matrix
D <- matrix(c(10,20,30,40), nrow=2,ncol=2)

We can explore some characteristics of the matrix:

nrow(D)    # Number of rows
ncol(D)    # Number of columns
length(D)  # Total number of elements (rows times columns)
dim(D)     # Both dimensions: rows, then columns
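
Arrays extend the same idea to more than two dimensions. A minimal sketch, using the base R function array() and a made-up object name arr:

```r
# A 2 x 3 x 4 array filled with the numbers 1 to 24 (first index varies fastest)
arr <- array(1:24, dim = c(2, 3, 4))

dim(arr)        # 2 3 4
length(arr)     # 24: the product of the three dimensions
arr[1, 2, 3]    # 15: first row, second column, third "layer"
arr[, , 1]      # The first layer, which is an ordinary 2 x 3 matrix
```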

3.5 Data frames

Data frames store spreadsheet-like data.

In other words, they are like matrices whose columns can hold different kinds of data.

To create a dataset, use the function data.frame().

The arguments for ‘data.frame()’ are a series of vectors.

You can give variable names to each of these vectors.

Syntax: data.frame(vector1,vector2,vector3,vector4)

data.frame(age = c(20,24,26,23,29), 
           sex = c("Female","Male","Female","Male","Female"), 
           treatment = c(1,0,0,1,1), 
           income = c(1000,1500,2000,2500,3000))

# NOTE: All parentheses in a function call must be balanced. Otherwise, R will expect more input and won't execute the command.
# TIP: You can take advantage of this to keep your code clear. Use new lines and tabs to make your commands more legible.
# Note the '+' sign in the console, which indicates that R is expecting more input.
# NOTE: RStudio users may need to select every line of a multi-line command before running it.

E <- data.frame(
            age = c(20,24,26,23,29),
            sex = c("Female","Male","Female","Male","Female"),
            treatment = c(1,0,0,1,1),
            income = c(1000,1500,2000,2500,3000)
            )
E
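
To see that each column keeps its own type, you can inspect the data frame with str() and sapply(). This is only a sketch: it rebuilds E so that it runs on its own, and the type reported for "sex" depends on your R version (character since R 4.0.0, factor before that).

```r
# The same data frame as above, rebuilt so that this sketch runs on its own
E <- data.frame(
  age = c(20, 24, 26, 23, 29),
  sex = c("Female", "Male", "Female", "Male", "Female"),
  treatment = c(1, 0, 0, 1, 1),
  income = c(1000, 1500, 2000, 2500, 3000)
)

str(E)             # Compact summary: one line per column with its type
sapply(E, class)   # Class of each column, returned as a named character vector
dim(E)             # 5 4: five rows (observations) and four columns (variables)
```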


# Similar to matrices, you can select elements in a data frame by using the following notation
# dataframe[row,column]
# For example:
E
E[4 ,4]     # Fourth row, fourth column
E[  , ]     # All elements
E[4 , ]     # Fourth row, all columns
E[ , 4]     # All rows, fourth column (variable income)

# Nevertheless, it is often more convenient to use the '$' operator when selecting variables in a data frame.
# The '$' operator extracts a variable (column) from the data frame it belongs to.
# Syntax: dataframe$variable
# For example:
E
E$age       # Variable "age" in data frame "E"
E$sex       # Variable "sex" in data frame "E"
# If the output ends with "Levels", R is treating the variable as a categorical (factor) variable.
# Since R 4.0.0, data.frame() keeps strings as character vectors unless you set stringsAsFactors = TRUE.

# Another way to select variables is by typing their names
E[,"age"]           # Column "age" in data frame "E"
E[,"sex"]           # Column "sex" in data frame "E"
E[,c("age","sex")]  # Columns "age" and "sex" in data frame "E"

# Finally, you can choose a particular element of a variable by using '[ ]'
E$age               # Variable "age" in data frame "E"
E$age[2]            # Second element of variable "age" in data frame "E"

# Note that the following notations are equivalent
E$age[2]            # Second element of variable "age" in data frame "E"
E[ , "age"][2]      # Second element of column "age" in data frame "E"
E[2, 1]             # Second row and first column in data frame "E"
E[ , 1][2]          # Second element of first column in data frame "E"
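
If you want to convince yourself that these notations return the same value, identical() can compare them. The sketch below rebuilds E so that it runs on its own. It also uses E[["age"]], another base R way to extract a column, which is not shown above.

```r
E <- data.frame(
  age = c(20, 24, 26, 23, 29),
  sex = c("Female", "Male", "Female", "Male", "Female"),
  treatment = c(1, 0, 0, 1, 1),
  income = c(1000, 1500, 2000, 2500, 3000)
)

E$age[2]                               # 24
identical(E$age[2], E[ , "age"][2])    # TRUE
identical(E$age[2], E[2, 1])           # TRUE
identical(E$age[2], E[ , 1][2])        # TRUE
identical(E$age[2], E[["age"]][2])     # TRUE
```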

3.6 Lists

A list is a generalization of a vector. In a list, elements can be of different types.

# Let's bring our objects back

A <- 5
B <- "R workshop"
C <- c(1:10)
D <- matrix(c(10,20,30,40), nrow=2,ncol=2)
E <- data.frame(
            age = c(20,24,26,23,29),
            sex = c("Female","Male","Female","Male","Female"),
            treatment = c(1,0,0,1,1),
            income = c(1000,1500,2000,2500,3000)
            )
ls()

# Lists are commonly used objects in R.
# They are collections of other objects, broadly defined.
# Here we make a list of all the objects that we have created so far.
# Use the function 'list()' to create lists. It works similarly to the concatenate function 'c()'.
# The difference is that 'list()' creates lists and 'c()' creates vectors.
# The syntax to retrieve elements from each differs.

list(A,B,C,D,E)

# Note that "[[ ]]" indicates the position of an object in the list.
# Remember: "[ ]" indicates the position of an element in a vector.
#           "( )" are used to call functions and to group expressions.
#           "{ }" are used to group several commands, as in loops and functions.

global.list <- list(A,B,C,D,E)
global.list

# To call an object in a list, use the following notation list[[position]]
# For example:
global.list[[3]]    # Third object [3] in list "global.list"
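
Single brackets also work on lists, but they return a shorter list rather than the object itself. A small sketch with a made-up list called example.list shows the difference:

```r
example.list <- list(5, "R workshop", 1:10)

example.list[[3]]          # The third object itself: the integer vector 1:10
class(example.list[[3]])   # "integer"

example.list[3]            # A list of length 1 that contains the third object
class(example.list[3])     # "list"

example.list[[3]][2]       # Second element of the third object: 2
```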


# You can also name objects in the list:

names(global.list) <- c("number", "string", "vector", "matrix", "data.frame")

## And then call them with $
global.list$vector
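
Names can also be given when the list is created, and a named object can be retrieved with "[[ ]]" as well as with "$". A minimal sketch, using a made-up list called named.list:

```r
named.list <- list(number = 5, string = "R workshop", vector = 1:10)

names(named.list)          # "number" "string" "vector"
named.list$vector          # The object called "vector"
named.list[["vector"]]     # The same object, selected by name with [[ ]]
identical(named.list$vector, named.list[["vector"]])   # TRUE
```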

3.7 Object classes

Many R objects have a class attribute, a character vector giving the names of the classes an object belongs to.

To find the type or “class” of an object, use the function class().

Syntax: class(object)

class(A)
class(B)
class(C)
class(D)
class(E)
class(global.list)
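
Besides printing the class, you can test for a particular class with the is.*() family of functions or with inherits(). The sketch below rebuilds a few of the objects so that it runs on its own. Note that class(D) returns both "matrix" and "array" since R 4.0.0, so a test such as is.matrix() is safer than comparing against a single class name.

```r
A <- 5
B <- "R workshop"
D <- matrix(c(10, 20, 30, 40), nrow = 2, ncol = 2)
E <- data.frame(age = c(20, 24, 26, 23, 29), treatment = c(1, 0, 0, 1, 1))

is.numeric(A)               # TRUE
is.character(B)             # TRUE
is.matrix(D)                # TRUE
is.data.frame(E)            # TRUE
inherits(E, "data.frame")   # TRUE
is.list(E)                  # TRUE: a data frame is stored as a list of columns
```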

# Note that in your R script, different kinds of code may appear in different colors.
# Note: This varies according to the appearance settings you choose!
# (Go to Tools > Global Options > Appearance)

# Numbers           : Orange
3
# Strings           : Green
"R workshop"
# Functions       : White
mean(C)
# Object names    : White
C

# You can change the class of an object by using some of the following commands
# This often comes in handy when reading in a dataset from another format, like Excel.
    # as.numeric()  : converts a string variable to numeric.
    # as.character(): converts a numeric variable to string.
    # as.vector()   : converts a numeric or string matrix to a vector.
    # as.matrix()   : converts a numeric or string vector to a matrix.
    # as.factor()   : converts a numeric or string variable to a categorical variable.
    
# Examples:
A
as.character(A)     # Numeric variable expressed as a string.

D
as.vector(D)        # A 2x2 matrix expressed as a vector of length 4 (read column by column).

C
as.matrix(C)        # A vector of length 10 expressed as a 10x1 matrix.

E$treatment               # This is a numeric variable.
as.factor(E$treatment)    # A numeric variable expressed as a categorical variable.

E$sex                           # A character variable (a factor in R versions before 4.0.0).
as.numeric(as.factor(E$sex))    # Converted to a factor first, then to its integer codes (Female = 1, Male = 2).
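
A word of caution when converting factors: as.numeric() returns the internal level codes of a factor, not its labels. The sketch below uses a made-up factor called scores, whose labels look like numbers, to show the difference.

```r
# A hypothetical factor whose labels happen to be numbers stored as text
scores <- factor(c("10", "20", "10", "30"))

levels(scores)                      # "10" "20" "30"
as.numeric(scores)                  # 1 2 1 3: the internal level codes
as.numeric(as.character(scores))    # 10 20 10 30: the original values
```

Converting to character first is the usual way to recover the original numbers.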

EXERCISE 2 (Advanced Objects)

Answer the following questions:

  1. Take a look at the diag() function. Create a 21-by-21 matrix called “m” with the sequence 10 to 0 to 10 (i.e. 10, 9, …, 0, 1, …, 10) on the diagonal. The rest of the elements should be zero.

  2. What is the length of the following list: list(a = 2, list(b = 2, g = 3, d = 4), e = NULL)

Explain your answer.

  3. Create the following matrix:

m1 <- matrix(c(1,2,3,4), ncol=2)

Using [ ], retrieve the element in the first row and first column, then the element in the second row and second column. Finally, retrieve the entire second column.

  4. Using the data frame E created above, include a new column: country = c("US", "Canada", "US", "Mexico")

Solutions


Note: This script is based on the R Workshop created by Gustavo Robles. Some exercises are based on Cotton, R. (2013), Learning R, O’Reilly.