1. A brief introduction to “apply” in R
At any R Q&A site, you’ll frequently see an exchange like this one:
Q: How can I use a loop to […insert task here…] ?
A: Don’t. Use one of the apply functions.
So, what are these wondrous apply functions and how do they work? I
think the best way to figure out anything in R is to learn by
experimentation, using embarrassingly trivial data and functions.
If you fire up your R console, type “??apply” and scroll down to the
functions in the base package, you’ll see something like this:
base::apply Apply Functions Over Array Margins
base::by Apply a Function to a Data Frame Split by Factors
base::eapply Apply a Function Over Values in an Environment
base::lapply Apply a Function over a List or Vector
base::mapply Apply a Function to Multiple List or Vector Arguments
base::rapply Recursively Apply a Function to a List
base::tapply Apply a Function Over a Ragged Array
2. 1. apply
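A function that applies a user-defined function to each row or each column of a matrix
Usage
apply(m, dimcode, f, fargs)
m : matrix
dimcode : the dimension, 1 = rows, 2 = columns
f : the function to apply
fargs : arguments to the function (optional)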
3. Description: “Returns a vector or array or list of values obtained by
applying a function to margins of an array or matrix.”
We know about vectors/arrays and functions, but what are these
“margins”? Simple: either the rows (1), the columns (2) or both (1:2).
By “both”, we mean “apply the function to each individual value.” An
example:
Example
create a matrix of 10 rows x 2 columns
means of the rows
means of the columns
divide all values by 2
4. create a matrix of 10 rows x 2 columns
m <- matrix(c(1:10, 11:20), nrow = 10, ncol = 2)
means of the rows
apply(m, 1, mean)
means of the columns
apply(m, 2, mean)
divide all values by 2
apply(m, 1:2, function(x) x/2)
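The optional fargs from the usage slide are simply passed on to the function being applied. A minimal sketch, assuming a small made-up matrix with one missing value (m2 and the result names are hypothetical, chosen only for illustration):

```r
# hypothetical 3 x 2 matrix with one missing value in the first column
m2 <- matrix(c(1, 2, NA, 4, 5, 6), nrow = 3, ncol = 2)

# without extra arguments, the mean of the first column is NA
col_means_raw <- apply(m2, 2, mean)

# arguments given after the function are passed on to mean()
col_means <- apply(m2, 2, mean, na.rm = TRUE)

# for plain row/column means, base R also provides rowMeans() and colMeans()
same <- all.equal(col_means, colMeans(m2, na.rm = TRUE))
```

For simple means, colMeans() and rowMeans() do the same job; apply is the more general tool when the function is something else.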
5. 2. by
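A version of tapply that works on an object (such as a data frame) instead of a vector
Usage
by(m, factor, f, fargs)
m : object
factor : the grouping factor
f : the function to apply
fargs : arguments to the function (optional)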
6. Description: “Function ‘by’ is an object-oriented wrapper for ‘tapply’
applied to data frames.”
The by function is a little more complex than that. Read a little
further and the documentation tells you that “a data frame is split
by row into data frames subsetted by the values of one or more
factors, and function ‘FUN’ is applied to each subset in turn.” So, we
use this one where factors are involved.
Example
Read iris data
get the mean of the first 4 variables, by species
7. Read the iris data
attach(iris)
get the mean of the first 4 variables, by species
by(iris[, 1:4], Species, colMeans)
Essentially, by provides a way to split your data by factors and do
calculations on each subset. It returns an object of class “by” and
there are many, more complex ways to use it.
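Because each subset handed to the function is itself a data frame, the function can use several columns at once. A small sketch, assuming we want the correlation between sepal and petal length within each species (the anonymous function and the name res are illustrative only; iris$Species is used so that attach is not needed):

```r
# each subset d is a data frame holding the rows for one species
res <- by(iris[, 1:4], iris$Species,
          function(d) cor(d$Sepal.Length, d$Petal.Length))

# the result is an object of class "by", with one entry per species
class(res)
names(res)
```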
8. 3. lapply
Applies a given function to each element of a list and returns the
results as a list.
Usage
lapply(list(), f, fargs)
list(): list
f : the function to apply
fargs : arguments to the function (optional)
9. Description: “lapply returns a list of the same length as X, each
element of which is the result of applying FUN to the
corresponding element of X.”
That’s a nice, clear description which makes lapply one of the easier
apply functions to understand. A simple example:
Example
create a list with 2 elements
the mean of the values in each element
the sum of the values in each element
10. create a list with 2 elements
l <- list(a = 1:10, b = 11:20)
the mean of the values in each element
lapply(l, mean)
the sum of the values in each element
lapply(l, sum)
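The function does not have to be a built-in one. A short sketch, assuming the same list l, that uses an anonymous function and then passes an extra argument (the fargs of the usage slide); the names widths and medians are made up for this example:

```r
l <- list(a = 1:10, b = 11:20)

# an anonymous function: the spread (max minus min) of each element
widths <- lapply(l, function(x) max(x) - min(x))

# arguments after the function are passed on to it, here probs for quantile()
medians <- lapply(l, quantile, probs = 0.5)

# the result is always a list that keeps the names of the input
names(widths)
```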
11. 4. sapply
If the results can be arranged as a vector or matrix, sapply returns
them as a vector or matrix.
Usage
sapply(list(), f, fargs)
list(): list
f : the function to apply
fargs : arguments to the function (optional)
12. Description: “sapply is a user-friendly version of lapply by default
returning a vector or matrix if appropriate.”
That simply means that if lapply would have returned a list with
elements $a and $b, sapply will return either a vector, with elements
[["a"]] and [["b"]], or a matrix with column names "a" and "b".
Returning to our previous simple example:
Example
create a list with 2 elements
mean of values using sapply
what type of object was returned?
13. create a list with 2 elements
l <- list(a = 1:10, b = 11:20)
the mean of the values using sapply
l.mean <- sapply(l, mean)
what type of object was returned?
class(l.mean)
l.mean[["a"]]
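The matrix case mentioned in the description arises when the function returns more than one value per element. A minimal sketch, assuming the same list l and using range, which returns two values (the name r is hypothetical):

```r
l <- list(a = 1:10, b = 11:20)

# range() returns two values (min and max) for each element,
# so sapply arranges the results as the columns of a matrix
r <- sapply(l, range)

is.matrix(r)
colnames(r)
r[, "a"]   # min and max of element a
```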
14. 5. mapply
mapply is the function to use when sapply would need several
variables at once
Usage
mapply(f, l1$a, l1$b)
f : the function to apply
l1$a, l1$b : the multiple arguments
15. Description: “mapply is a multivariate version of sapply. mapply
applies FUN to the first elements of each (…) argument, the second
elements, the third elements, and so on.”
The mapply documentation is full of quite complex examples, but
here’s a simple, silly one:
Example
create a list with 2 elements (l1 First)
create a list with 2 elements (l2 Second)
sum the corresponding elements of l1 and l2
16. create a list with 2 elements (l1 First)
l1 <- list(a = c(1:10), b = c(11:20))
create a list with 2 elements (l2 Second)
l2 <- list(c = c(21:30), d = c(31:40))
sum the corresponding elements of l1 and l2
mapply(sum, l1$a, l1$b, l2$c, l2$d)
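To see what "first elements, second elements, and so on" means in practice, here is a small sketch, assuming the same l1 and l2; the names totals and labels and the pasting function are made up for illustration:

```r
l1 <- list(a = 1:10, b = 11:20)
l2 <- list(c = 21:30, d = 31:40)

# the i-th call is sum(l1$a[i], l1$b[i], l2$c[i], l2$d[i])
totals <- mapply(sum, l1$a, l1$b, l2$c, l2$d)

# for sum this matches plain vectorized addition
same <- all(totals == l1$a + l1$b + l2$c + l2$d)

# mapply also works with a function of several arguments
labels <- mapply(function(x, y) paste(x, y, sep = "-"), c("a", "b"), 1:2)
```

For simple arithmetic the vectorized form is shorter; mapply earns its keep when the function itself is not vectorized.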
17. 6. tapply
The member of the apply family that is used with factors.
Usage
tapply(m, factor, f)
m : vector, matrix
factor : a factor variable
f : the function to apply
18. Description: “Apply a function to each cell of a ragged array, that is
to each (non-empty) group of values given by a unique
combination of the levels of certain factors.”
Whoa there. That sounds complicated. Don’t panic though, it
becomes clearer when the required arguments are described. Usage
is “tapply(X, INDEX, FUN = NULL, …, simplify = TRUE)”, where X is
“an atomic object, typically a vector” and INDEX is “a list of factors,
each of same length as X”.
Example
Read iris data
mean petal length by species
19. Read the iris data
attach(iris)
mean petal length by species
tapply(iris$Petal.Length, Species, mean)
Essentially, tapply provides a way to split a vector by factors and do
calculations on each group. Here it returns a named array with one
mean per species.
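The "ragged array" and "list of factors" wording is easier to follow with tiny data. A sketch, assuming a made-up score vector and two hypothetical grouping factors (score, group and half are not from the original slides):

```r
# hypothetical scores with two grouping factors of the same length
score <- c(10, 20, 30, 40, 50)
group <- factor(c("x", "x", "y", "y", "y"))
half  <- factor(c("am", "pm", "am", "am", "pm"))

# one factor: the groups have different sizes, which is what "ragged" means
by_group <- tapply(score, group, mean)

# a list of factors: one cell for each combination of levels
by_both <- tapply(score, list(group, half), mean)
by_both["y", "am"]   # mean of the scores 30 and 40
```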
20. The things to consider when choosing an apply function are basically:
What class is my input data? – vector, matrix, data frame
On which subsets of that data do I want the function to act? – rows, columns, all values
What class will the function return? How is the original data structure transformed?
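One way to answer the last question is simply to ask R. A quick sketch, assuming the small list and matrix used in the earlier examples (the variable names are illustrative only):

```r
l <- list(a = 1:10, b = 11:20)
m <- matrix(c(1:10, 11:20), nrow = 10, ncol = 2)

cls_lapply <- class(lapply(l, mean))     # "list"
cls_sapply <- class(sapply(l, mean))     # "numeric"
cls_apply  <- class(apply(m, 2, mean))   # "numeric"
cls_by     <- class(by(iris[, 1:4], iris$Species, colMeans))  # "by"

# tapply returns an array, indexed by the levels of the factor
res_tapply <- tapply(iris$Petal.Length, iris$Species, mean)
is.array(res_tapply)
```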