x <- 10
x > 5
## [1] TRUE
# Bonus
y <- x > 5
print(y)
## [1] TRUE
Make a vector with the numbers 1 through 26. Multiply the vector by 2, and give the resulting vector names A through Z (hint: there is a built in vector called LETTERS).
x <- 1:26
x <- x * 2
names(x) <- LETTERS
print(x)
##  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
##  2  4  6  8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52
Make a matrix with the numbers 1:50, with 5 columns and 10 rows. Did the matrix function fill your matrix by column, or by row, as its default behavior? Once you have figured it out, try to change the default. (hint: read the documentation for matrix)
# By default the matrix is filled by column; setting byrow = TRUE fills it by row instead.
m <- matrix(1:50, ncol = 5, nrow = 10, byrow = TRUE)
print(m)
##       [,1] [,2] [,3] [,4] [,5]
##  [1,]    1    2    3    4    5
##  [2,]    6    7    8    9   10
##  [3,]   11   12   13   14   15
##  [4,]   16   17   18   19   20
##  [5,]   21   22   23   24   25
##  [6,]   26   27   28   29   30
##  [7,]   31   32   33   34   35
##  [8,]   36   37   38   39   40
##  [9,]   41   42   43   44   45
## [10,]   46   47   48   49   50
Bonus: Which of the following commands was used to generate the matrix below?
|      | [,1] | [,2] |
| [1,] | 4 | 1 |
| [2,] | 9 | 5 |
| [3,] | 10 | 7 |
matrix(c(4, 1, 9, 5, 10, 7), nrow = 3)
matrix(c(4, 9, 10, 1, 5, 7), ncol = 2, byrow = TRUE)
matrix(c(4, 9, 10, 1, 5, 7), nrow = 2)
matrix(c(4, 1, 9, 5, 10, 7), ncol = 2, byrow = TRUE)
# correct
matrix(c(4, 1, 9, 5, 10, 7), ncol = 2, byrow = TRUE)
## [,1] [,2]
## [1,] 4 1
## [2,] 9 5
## [3,] 10 7
# others
matrix(c(4, 1, 9, 5, 10, 7), nrow = 3)
## [,1] [,2]
## [1,] 4 5
## [2,] 1 10
## [3,] 9 7
matrix(c(4, 9, 10, 1, 5, 7), ncol = 2, byrow = TRUE)
## [,1] [,2]
## [1,] 4 9
## [2,] 10 1
## [3,] 5 7
matrix(c(4, 9, 10, 1, 5, 7), nrow = 2)
## [,1] [,2] [,3]
## [1,] 4 10 5
## [2,] 9 1 7
The byrow Argument
The matrix() function works like a worker filling a grid of boxes. The byrow argument tells that worker whether to walk across the rows or down the columns.
byrow = FALSE (Default): The worker fills the first column from top to bottom, then moves to the second column. This is “Column-major order.”
byrow = TRUE: The worker fills the first row from left to right, then moves to the second row. This is “Row-major order.”
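As a minimal sketch of the two fill orders described above, the same six values can be laid out both ways and compared. The variable names (vals, by_col, by_row) are illustrative and do not come from the exercise.

```r
# Illustrative values; any vector whose length fits the grid works
vals <- 1:6

# Default (byrow = FALSE): values run down each column first
by_col <- matrix(vals, nrow = 2)

# byrow = TRUE: values run across each row first
by_row <- matrix(vals, nrow = 2, byrow = TRUE)

print(by_col)
print(by_row)

# A row-filled 2 x 3 matrix equals the transpose of a column-filled 3 x 2 matrix
same_layout <- identical(by_row, t(matrix(vals, nrow = 3)))
print(same_layout)
```

Printing both matrices side by side makes it easy to see that only the position of the values changes, not the values themselves.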
Create a list of length two containing a character vector for each of the data sections: (1) Data types and (2) Data structures. Populate each character vector with the names of the data types and data structures, respectively.
dt <- c('double', 'complex', 'integer', 'character', 'logical')
ds <- c('data.frame', 'vector', 'factor', 'list', 'matrix')
data.sections <- list(dt, ds)
print(data.sections)
## [[1]]
## [1] "double"    "complex"   "integer"   "character" "logical"
##
## [[2]]
## [1] "data.frame" "vector"     "factor"     "list"       "matrix"
There are several subtly different ways to call variables, observations and elements from data frames. Try them all and discuss with your team what they return. (Hint: use the function typeof().)
iris[1]
iris[[1]]
iris$Species
iris["Species"]
iris[1,1]
iris[,1]
iris[1,]
# Single brackets [1] return the first slice of the list as another list. Here that is the first column, still wrapped in a data frame.
head(iris[1])
## Sepal.Length
## 1 5.1
## 2 4.9
## 3 4.7
## 4 4.6
## 5 5.0
## 6 5.4
# Double brackets [[1]] return the contents of the list item. Here that is the first column itself, a numeric vector (typeof() reports "double").
head(iris[[1]])
## [1] 5.1 4.9 4.7 4.6 5.0 5.4
# This example uses the $ operator to address items by name. Species is a factor (typeof() reports "integer" because factors are stored as integer codes).
head(iris$Species)
## [1] setosa setosa setosa setosa setosa setosa
## Levels: setosa versicolor virginica
# Single brackets with the column name, ["Species"], instead of the index number also return a one-column data frame (a list), as in the first example.
head(iris["Species"])
## Species
## 1 setosa
## 2 setosa
## 3 setosa
## 4 setosa
## 5 setosa
## 6 setosa
# Element in the first row and first column. The returned value is a single number of type double.
iris[1,1]
## [1] 5.1
# First column. Returns a vector
iris[,1]
## [1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9 5.4 4.8 4.8 4.3 5.8 5.7 5.4 5.1
## [19] 5.7 5.1 5.4 5.1 4.6 5.1 4.8 5.0 5.0 5.2 5.2 4.7 4.8 5.4 5.2 5.5 4.9 5.0
## [37] 5.5 4.9 4.4 5.1 5.0 4.5 4.4 5.0 5.1 4.8 5.1 4.6 5.3 5.0 7.0 6.4 6.9 5.5
## [55] 6.5 5.7 6.3 4.9 6.6 5.2 5.0 5.9 6.0 6.1 5.6 6.7 5.6 5.8 6.2 5.6 5.9 6.1
## [73] 6.3 6.1 6.4 6.6 6.8 6.7 6.0 5.7 5.5 5.5 5.8 6.0 5.4 6.0 6.7 6.3 5.6 5.5
## [91] 5.5 6.1 5.8 5.0 5.6 5.7 5.7 6.2 5.1 5.7 6.3 5.8 7.1 6.3 6.5 7.6 4.9 7.3
## [109] 6.7 7.2 6.5 6.4 6.8 5.7 5.8 6.4 6.5 7.7 7.7 6.0 6.9 5.6 7.7 6.3 6.7 7.2
## [127] 6.2 6.1 6.4 7.2 7.4 7.9 6.4 6.3 6.1 7.7 6.3 6.4 6.0 6.9 6.7 6.9 5.8 6.8
## [145] 6.7 6.7 6.3 6.5 6.2 5.9
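One way to follow the typeof() hint for all seven calls in this exercise at once is to collect the results in a list and tabulate their type and class. This is only a sketch; the names calls and type_summary are made up for illustration.

```r
# iris ships with base R (datasets package), so no import is needed
calls <- list(
  "iris[1]"           = iris[1],
  "iris[[1]]"         = iris[[1]],
  "iris$Species"      = iris$Species,
  "iris[\"Species\"]" = iris["Species"],
  "iris[1,1]"         = iris[1, 1],
  "iris[,1]"          = iris[, 1],
  "iris[1,]"          = iris[1, ]
)

# typeof() reports the internal storage type; class() reports how R treats the object
type_summary <- data.frame(
  typeof = sapply(calls, typeof),
  class  = sapply(calls, function(obj) class(obj)[1])
)
print(type_summary)
```

Reading the two columns together shows why a data frame reports "list" from typeof() while a factor reports "integer".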
# First row. Returns a one-row data frame (typeof() reports "list") holding all the values in the first row.
iris[1,]
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
Take the list you created in exercise 4 and coerce it into a data frame. Then change the names of the columns to “dataTypes” and “dataStructures”.
df <- as.data.frame(data.sections)
colnames(df) <- c("dataTypes", "dataStructures")
print(df)
##   dataTypes dataStructures
## 1    double     data.frame
## 2   complex         vector
## 3   integer         factor
## 4 character           list
## 5   logical         matrix
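An alternative sketch, assuming you are free to name the list elements up front: when the list already carries names, as.data.frame() uses them as column names, so the separate colnames() step is not needed. The names named_sections and df_named are illustrative only.

```r
dt <- c("double", "complex", "integer", "character", "logical")
ds <- c("data.frame", "vector", "factor", "list", "matrix")

# Name the list elements when building the list
named_sections <- list(dataTypes = dt, dataStructures = ds)

# The element names become the column names
df_named <- as.data.frame(named_sections)
print(colnames(df_named))
print(df_named)
```

This only works because both vectors have the same length; as.data.frame() stops with an error if the elements differ in length.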
Common ways to change column names
colnames()
If you want to rename all the columns at once, this is the fastest method. You simply provide a vector of names whose length matches the number of columns.
# Create a dummy dataframe
df <- data.frame(V1 = 1:3, V2 = 4:6, V3 = 7:9)
# Rename all columns
colnames(df) <- c("ID", "Treatment", "Response")
If you only want to change one specific column, you can use its index (position). This is great for small tables but risky for large ones if the column order changes.
# Change only the 2nd column
colnames(df)[2] <- "Condition"
dplyr::rename()
Many R users prefer this method because it is readable and does not depend on column position. You don’t need to know the index of the column, and you can pipe it into your analysis. The syntax is new_name = old_name.
library(dplyr)
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##     filter, lag
## The following objects are masked from 'package:base':
##     intersect, setdiff, setequal, union
df <- df %>%
  rename(Patient_ID = ID,
         Dosage = Response)
The “Backtick” Trick: Column Names with Spaces
Functions such as data.frame() and read.csv() usually replace spaces in column names with a dot (.) because spaces make names awkward to use in code. However, you can still refer to a name that contains spaces by wrapping it in Backticks (`).
Important Distinction: Notice the difference between Single Quotes (') and Backticks (`).
Quotes (' '): Tell R that something is Text.
Backticks (` `): Tell R that something is a Name that contains “illegal” characters (like spaces or starting with a number).
# Quoted strings are fine when assigning names; here we rename the first two columns.
colnames(df)[1:2] <- c("Data Types", "Data Structures")
print(df)
##   Data Types Data Structures Dosage
## 1          1               4      7
## 2          2               5      8
## 3          3               6      9
# To call this column later with $, wrap the name in backticks:
df$`Data Types`
## [1] 1 2 3
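The dot replacement mentioned above can be seen directly when building a data frame. The sketch below relies on the check.names argument of data.frame(); the object names df_dots and df_spaces are illustrative.

```r
# Default (check.names = TRUE): the space is replaced with a dot
df_dots <- data.frame(`Data Types` = 1:3)
print(colnames(df_dots))

# check.names = FALSE keeps the name exactly as written
df_spaces <- data.frame(`Data Types` = 1:3, check.names = FALSE)
print(colnames(df_spaces))

# With the space preserved, backticks are needed to use the name with $
first_column <- df_spaces$`Data Types`
print(first_column)
```

Double brackets with a quoted string, df_spaces[["Data Types"]], reach the same column without backticks.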
Why we avoid spaces in Bioinformatics
While R can handle spaces, using them in column names is generally discouraged in professional pipelines for several reasons:
Tab Completion: If you type df$d... and hit Tab, RStudio can instantly find data_types. If there is a space, you have to manually type the backticks every single time.
Compatibility: If you export your data to a colleague using Python or a command-line tool like awk, spaces in column names can cause their scripts to crash.
The “Snake Case” Standard: Most researchers prefer snake_case (e.g., gene_id) or camelCase (e.g., geneId).
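If a table arrives with spaces in its column names, one possible clean-up is a small helper that converts the names to snake_case. The function to_snake_case below is hypothetical (it is not part of base R or dplyr) and only handles simple cases such as spaces and punctuation.

```r
# Hypothetical helper: turn arbitrary labels into snake_case
to_snake_case <- function(x) {
  x <- trimws(x)                         # drop surrounding whitespace
  x <- gsub("[^A-Za-z0-9]+", "_", x)     # runs of other characters become one underscore
  x <- gsub("^_+|_+$", "", x)            # remove leading or trailing underscores
  tolower(x)
}

# Illustrative data frame with spaces in its column names
df_demo <- data.frame(`Data Types` = 1:3, `Data Structures` = 4:6,
                      check.names = FALSE)

colnames(df_demo) <- to_snake_case(colnames(df_demo))
print(colnames(df_demo))
```

After this step the columns can be reached as df_demo$data_types without backticks, and tab completion works as usual.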