In today’s lab we will focus on understanding and building functions. R supports object-oriented programming: you can create objects in R and assign them specific names. Functions are a type of object, which means that the useR has the power to create custom functions and save them under names. Functions are a fundamental part of data analysis in R. They help you carry out coding operations quickly and precisely, and they make workflows more dynamic and customized. Once you know how to build your own functions, coding in R becomes easier and you save a lot of time when analyzing data! It is a skill that demands a sizable up-front investment and needs time to grow. However, once you reach a certain level, the returns are well worth the investment.
Let’s talk a little more about why you would want to invest in the skill of function building.
Functions in R have three components:
- body(): the part inside the function
- formals(): the arguments, which direct how you use the function
- environment(): the location of the function
Let’s consider an example of a function.
The function defined in the code below takes two arguments, vector and power_value. It takes inputs for these arguments and returns an output based on the directions specified in the function. These arguments make up the \(formals\). Next we need to specify how the inputs of the arguments will be manipulated. The part that specifies what the function does with the inputs is the \(body\) of the function. The location where the function is mapped is the \(environment\). We won’t go into the details of the environment in this lab.
Let’s run the following code to see how this function works.
my_func <- function(vector, power_value) {
  transform_vec <- vector^power_value
  return(transform_vec)
}
Now let’s look at the components of the function.
#Peeking into the components of the function
formals(my_func)
body(my_func)
environment(my_func)
As you can see, R lets you peek into the structure of a function. This helps you understand the mechanics of how the data is being manipulated as you build the function.
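As a small sketch of what these accessors hand back, the block below repeats the definition of my_func so that it runs on its own, then stores each component in an object. The object names arg_names, func_body and func_env are made up for this illustration.

```r
# Copy of my_func from above, repeated so this block runs on its own
my_func <- function(vector, power_value) {
  transform_vec <- vector^power_value
  return(transform_vec)
}

# formals() returns a list-like object with one entry per argument,
# so names() gives the argument names
arg_names <- names(formals(my_func))
arg_names

# body() returns the unevaluated code between the curly braces
func_body <- body(my_func)
class(func_body)

# environment() returns the environment in which the function was created
func_env <- environment(my_func)
is.environment(func_env)
```

Storing the components in objects like this is optional; it only makes it easier to look at one piece at a time.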
#A vector x
x <- c(5, 9, 8, 10)
#Applying the function to the vector x
my_func(x, 2)
Let’s build a function together. You studied z-scores last week in lab-2. Recall that they are also called \(standardized\) \(scores\). The formula to compute z-scores is \[ z = \frac{x_i - M}{s} \] Let’s build a function to compute z-scores for a vector in R.
Step 1 - Identify the input vector
In the formula above, \(x_i\) will be the input vector. Let’s build a made-up input vector x for the purpose of this example. Starting with a made-up input helps you picture the type of data your function will take and make decisions about it. I highly recommend starting with this step, but your approach may differ and that’s totally fine.
x <- c(5, 9, 8, 10, 14, 18, 19, 24, 35)
Step 2 - compute values needed
In the formula above you need the mean and sd in order to compute the z-score. Let’s compute the mean and sd of the vector x. I will save the values in objects m and s.
m <- mean(x)
s <- sd(x)
Now, you can compute the z-scores.
z <- (x - m) / s
At this point you can see that you have all the elements you need to build a function for computing z-scores.
Step 3 - Wrap the objects in a function
At this stage you create the function object and give it a name.
z_scores <- function(x) {
  m <- mean(x)
  s <- sd(x)
  z <- (x - m) / s
}
z_scores(x)
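R runs the function, but you cannot see the output. The last line of the body is an assignment, and R returns the value of an assignment invisibly, so nothing is printed. You need to tell R to return the values in z. This is how R knows which value(s) from the body to output.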
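To check that the values are computed even though nothing is printed, here is a minimal sketch. The names z_no_return and stored_z are made up for this illustration; the function has the same body as the one above.

```r
x <- c(5, 9, 8, 10, 14, 18, 19, 24, 35)

# Same body as above: the last expression is an assignment
z_no_return <- function(x) {
  m <- mean(x)
  s <- sd(x)
  z <- (x - m) / s
}

# Nothing is printed, because an assignment returns its value invisibly
z_no_return(x)

# The value is still there: store it in an object, then print the object
stored_z <- z_no_return(x)
stored_z

# withVisible() reports whether a result would be printed automatically
withVisible(z_no_return(x))$visible
```

Ending the body with return(z), as in the next version of the function, makes the result print without this extra step.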
z_scores <- function(x) {
  m <- mean(x)
  s <- sd(x)
  z <- (x - m) / s
  return(z)
}
z_scores(x)
You have a function ready!
You can add more layers to this function, like the error message I have added here. I am stopping the function from doing any further operations if the input vector is non-numeric.
z_scores <- function(x) {
  # Stop before doing any computation if the input is not numeric
  if (!is.numeric(x)) {
    stop("x is not numeric")
  }
  m <- mean(x)
  s <- sd(x)
  z <- (x - m) / s
  return(z)
}
#With numeric vector
z_scores(x = c(5, 9, 8, 10, 14, 18, 19, 24, 35))
#With character (non-numeric) vector
z_scores(x= c("apple", "banana"))
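If you want to look at the error message without halting the rest of your script, one option is tryCatch from base R. The sketch below uses a stripped-down helper so that it runs on its own; the names z_checked and caught are hypothetical and only for illustration.

```r
# Hypothetical helper for illustration: stops on non-numeric input,
# otherwise returns the z-scores
z_checked <- function(x) {
  if (!is.numeric(x)) {
    stop("x is not numeric")
  }
  (x - mean(x)) / sd(x)
}

# tryCatch() runs the call and, if an error occurs, hands the error
# object to the function supplied as `error`
caught <- tryCatch(
  z_checked(c("apple", "banana")),
  error = function(e) conditionMessage(e)
)
caught
```

Here caught holds the text passed to stop(), so a clear message in your function is what the user ends up reading.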
The scale function in the {base} package can carry out the same task.
scale(x = c(5, 9, 8, 10, 14, 18, 19, 24, 35))
scale(x= c("apple", "banana"))
You can see that the output from the scale function is a matrix. There are times when output that fits into a data frame or a tibble is more helpful (for example, when you are using {tidyverse}, which you will learn more about in the future). The z_scores function we just built helps with this issue by returning the output as a plain numeric vector, which can be stored directly as a column. When you write your own functions, you can also specify the format of the output. It really does make life easier!
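A quick way to compare the two formats is sketched below. The z-scores are computed by hand here so the block runs on its own; the object names scaled and z_by_hand are made up for this illustration.

```r
x <- c(5, 9, 8, 10, 14, 18, 19, 24, 35)

# scale() returns a one-column matrix with extra attributes attached
scaled <- scale(x)
is.matrix(scaled)
dim(scaled)

# The same z-scores computed by hand come back as a plain numeric vector
z_by_hand <- (x - mean(x)) / sd(x)
is.vector(z_by_hand)

# Dropping the matrix structure shows the numbers themselves agree
all.equal(as.numeric(scaled), z_by_hand)
```

The numbers are the same in both cases; only the container around them differs.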
In this section we will talk about some of the practical aspects of writing functions. First, when you are analyzing data, you may come across situations that are ideal for building functions. It is helpful to identify them and recognize how functions can help your workflow. Second, writing functions is a skill: it takes time to understand and write good functions. It is also important to remember that a function that works well in one project may need modifications to be as effective in another. There is no one right answer to what a good function looks like.
Here are some suggestions to get you started on how to think about writing functions. Please note that this section does not cover everything there is to know on this topic; you are encouraged to explore beyond this lab.
- Repeated computations, for example mean, sd and range for multiple columns in a data frame.
- Options that change between uses, for example setting na.rm as TRUE in certain cases and FALSE in others.
- Short, clear names. Take Cohen_d: it is within 10 characters and leaves no ambiguity in terms of what it does. There can be instances where a function name longer than 10 characters aids clarity (see bootnet::bootnet_piecewiseIsing). The same also stands for object names in the body of the function.
This is a function that computes confidence intervals for the mean of a variable. You don’t need to know what a confidence interval is at this point (although your function may come in handy later in the term!).
mean_ci <- function(x, alpha) {
se = sd(x) / sqrt(length(x))
mean(x) + se * qnorm(c(alpha / 2, 1 - alpha / 2))
}
#Your code here
Build a function to estimate the \(mean\) of a variable in R.
\[\mu = \frac{\sum_{i=1}^{N} x_i}{N}\]
- Create a vector random_var which takes values from 1 to 10. Hint: You can use cbind to do this or look up the seq function.
- Create an object col_sum which contains the sum of all the values in random_var.
- Create an object col_length which contains the total number of values in vector random_var. Hint: Look up the length function.
- Create an object m which will contain the mean of all values in vector random_var. Hint: use the objects col_sum and col_length to do this.
- Wrap col_sum, col_length and m in a function with two formals, random_var and na.rm. Don’t forget to set a default for na.rm (remember it takes logical inputs).
- Name the function my_mean.
- Compute the mean of random_var using the my_mean function.
- Compare your result with the mean function in base R.
#Your code here
Use the z_scores() function that we built above for this mini hack. Compute z-scores of the following vector.
x <- c(5, 9, 8, 10, 14, 18, NA, 19, 24, 35, NA)
z_scores(x)
Hint: Look up ?mean and ?sd, and use the na.rm argument.
#Your code here
Sara mentioned \(sum\) \(of\) \(squares\) \((SS)\) in class last week. Write a function to compute \(SS\). Build a made-up vector and test the function.
#Your code here
\[ CV = \frac{SD}{Mean} \]
x <- c(1:10)
y <- c(1:10, NA)
Hint: You can use the base R functions to compute \(sd\) and \(mean\). You also want to explore the na.rm argument.
#Your code here