In today’s lab we will focus on understanding and building functions. Functions are a fundamental part of data analysis in R. They let you carry out coding operations quickly and precisely, and they make workflows more dynamic and customizable. Once you know how to build your own functions, coding in R becomes easier and you save a lot of time when analyzing data! Writing functions is one of those skills that demands a large investment and takes time to develop. However, once you reach a certain level, the returns are sizable and well worth the investment.
Functions in R have three components:
- body(): the code inside the function
- formals(): the arguments, which direct how you use the function
- environment(): the location of the function, i.e., the environment in which it was created
Let’s consider an example of a function.
The two arguments in this function are vector and power_value. The function takes inputs for these arguments and returns an output based on the directions specified in the function. These arguments make up the formals. Next, we need to specify how the inputs of the arguments will be manipulated. The part that specifies what the function does with the inputs is the body of the function. The place where the function is defined is its environment. We won’t go into the details of the environment in this lab.
Let’s run the following code to see how this function works.
my_func <- function(vector, power_value){
transform_vec <- (vector)^power_value
return(transform_vec)
}
Now let’s look at the components of the function.
#Peeking into the components of the function
formals(my_func)
body(my_func)
environment(my_func)
As you can see, R lets you peek into the structure of a function. This helps you understand how the data is being manipulated as you build the function.
#A vector x
x <- c(5, 9, 8, 10)
#Applying the function to the vector x
my_func(x, 2)
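Because R's ^ operator works element-wise, my_func transforms every element of the vector at once, and its formals can also be supplied by name. A small sketch that repeats the definition of my_func so it runs on its own:

```r
# Same definition as above
my_func <- function(vector, power_value){
  transform_vec <- (vector)^power_value
  return(transform_vec)
}
x <- c(5, 9, 8, 10)
# Arguments matched by position and by name give the same result
squared <- my_func(x, 2)
squared_named <- my_func(power_value = 2, vector = x)
print(squared)  # 25 81 64 100
# names(formals()) lists the argument names
print(names(formals(my_func)))  # "vector" "power_value"
```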
You learned about z-scores last week in Lab 2. Recall that they are also called standardized scores. The formula to compute z-scores is \[ z = \frac{x_i - M}{s} \] where \(M\) is the mean and \(s\) is the standard deviation of the values. Let’s build a function to compute z-scores for a vector in R.
Step 1 - Identify the input vector
In the formula above, \(x_i\) stands for the values of the input vector. Let’s build a made-up input vector x for the purpose of this example. Starting with a made-up input helps you picture, and make decisions about, the type of data your function will take. I highly recommend starting with this step, but your approach may differ and that’s totally fine.
x <- c(5, 9, 8, 10, 14, 18, 19, 24, 35)
Step 2 - Compute the values needed
In the formula above, you need the mean and sd in order to compute the z-scores. Let’s compute the mean and sd of the vector x. I will save the values in objects m and s.
m <- mean(x)
s <- sd(x)
Now you can compute the z-scores:
z <- ((x - m)/s)
At this point you can see that you have all the elements you need to build a function for computing z-scores.
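As a check that can be done by hand, take a small made-up vector c(2, 4, 6), which is not used elsewhere in this lab. Note that R's sd() divides by n - 1. The mean is 4 and the standard deviation is 2, so the z-scores are -1, 0 and 1:

```latex
M = \frac{2 + 4 + 6}{3} = 4, \qquad
s = \sqrt{\frac{(2-4)^2 + (4-4)^2 + (6-4)^2}{3 - 1}} = \sqrt{\frac{8}{2}} = 2, \qquad
z = \left(\frac{2-4}{2}, \frac{4-4}{2}, \frac{6-4}{2}\right) = (-1, 0, 1)
```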
Step 3 - Wrap the objects in a function
At this stage you create the function object and give it a name.
z_scores <- function(x){
m <- mean(x)
s <- sd(x)
z <- ((x - m)/s)
}
z_scores(x)
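Step 4 - Decide how you want to output the data
Calling the Step 3 version prints nothing: its last expression is an assignment, and R returns the value of an assignment invisibly. Adding return(z) makes the output explicit.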
z_scores <- function(x){
m = mean(x)
s = sd(x)
z = ((x - m)/s)
return(z)
}
z_scores(x)
You have a working function!
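The Step 3 version does compute the z-scores; it just returns them invisibly, because its last expression is an assignment. A short sketch to see this, using base R's withVisible(); the names z_scores_v3 and z_scores_v4 are hypothetical, chosen only so both versions can exist side by side:

```r
# Step 3 version: ends with an assignment, so the value is returned invisibly
z_scores_v3 <- function(x){
  m <- mean(x)
  s <- sd(x)
  z <- ((x - m)/s)
}
# Step 4 version: return(z) makes the output visible
z_scores_v4 <- function(x){
  m <- mean(x)
  s <- sd(x)
  z <- ((x - m)/s)
  return(z)
}
x <- c(5, 9, 8, 10, 14, 18, 19, 24, 35)
z_scores_v3(x)            # prints nothing at the console
res3 <- z_scores_v3(x)    # but the value can still be captured
res4 <- z_scores_v4(x)
all.equal(res3, res4)     # TRUE: both versions compute the same values
withVisible(z_scores_v3(x))$visible  # FALSE
withVisible(z_scores_v4(x))$visible  # TRUE
```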
Step 5 - Testing/Debugging
Does the function give you the values you expect? Does the function handle all forms of input you are planning on giving it?
#Are these the values you expect?
z_scores(x)
#does it handle single values?
z_scores(2)
#does it handle NAs?
z_scores(c(1,4,5,NA))
#does it handle character vectors?
z_scores(c('cat', 'dog'))
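The questions in Step 5 can also be written as checks that stop loudly when something is off. A minimal sketch with base R's stopifnot(), using the Step 4 version of z_scores; it covers known values, a single value, and a vector with an NA:

```r
# Step 4 version of z_scores
z_scores <- function(x){
  m = mean(x)
  s = sd(x)
  z = ((x - m)/s)
  return(z)
}
# Known values: z-scores of a vector with some spread have mean 0 and sd 1
z <- z_scores(c(5, 9, 8, 10, 14, 18, 19, 24, 35))
stopifnot(abs(mean(z)) < 1e-10, abs(sd(z) - 1) < 1e-10)
# A single value has no spread: sd(2) is NA, so the result is NA
stopifnot(is.na(z_scores(2)))
# A single NA makes mean() and sd() return NA, so every z-score is NA
stopifnot(all(is.na(z_scores(c(1, 4, 5, NA)))))
```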
Here is one way to handle non-numeric vectors:
z_scores <- function(x){
if(!is.numeric(x)) {stop("x is not numeric")}
m = mean(x)
s = sd(x)
z = ((x - m)/s)
return(z)
}
#With numeric vector
z_scores(x = c(5, 9, 8, 10, 14, 18, 19, 24, 35))
#With character (non-numeric) vector
z_scores(x= c("apple", "banana"))
The scale function in the {base} package can carry out the same task.
scale(x = c(5, 9, 8, 10, 14, 18, 19, 24, 35))
scale(x= c("apple", "banana"))
You can see that the output from the scale function is a matrix, with the centering and scaling values stored as attributes. There are times when a plain vector, a data frame, or a tibble is a more helpful output format (e.g., when you are using {tidyverse}; you will learn more about {tidyverse} in the future). The z_scores function we just built avoids this issue by returning the output as a plain numeric vector, which can be added directly as a column of a data frame or tibble. When you write your own functions, you can also specify the format of the output. It really does make life easier!
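A sketch comparing the two outputs on the vector used above. as.vector() drops the matrix dimensions and the attributes that scale() attaches, after which the values match z_scores(); the data.frame at the end is just one example of where a plain vector is convenient:

```r
z_scores <- function(x){
  if(!is.numeric(x)) {stop("x is not numeric")}
  m = mean(x)
  s = sd(x)
  z = ((x - m)/s)
  return(z)
}
x <- c(5, 9, 8, 10, 14, 18, 19, 24, 35)
scaled <- scale(x)
is.matrix(scaled)                  # TRUE: a 9 x 1 matrix
attr(scaled, "scaled:center")      # the mean that was subtracted
scaled_vec <- as.vector(scaled)    # plain numeric vector, attributes removed
all.equal(scaled_vec, z_scores(x)) # TRUE
# a plain vector drops straight into a data frame column
df <- data.frame(x = x, z = z_scores(x))
head(df)
```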
Step 6 - Make sure your function is well documented
z_scores <- function(x){
#takes a numeric vector and returns the z-scores
#input: x: numeric vector
#output: vector of z values
#gives error message for non-numeric vectors
if(!is.numeric(x)) {stop("x is not numeric")}
#calculate z scores
m = mean(x)
s = sd(x)
z = ((x - m)/s)
return(z)
}
In this section we will talk about some of the practical aspects of writing functions. First, when you are analyzing data, you may come across situations that are ideal for building functions. It is helpful to identify them and to recognize how functions can help your workflow. Second, writing functions is a skill; it takes time to learn to write good functions. It is also important to remember that a function that works well in one project may need modifications to be as effective in another. There is no one right answer to what a good function looks like.
Here are some suggestions to get you started on how to think about writing functions. Please note that this section does not cover everything there is to know on this topic; you are encouraged to explore beyond this lab.
Cohen_d, it is within 10 characters and leaves no ambiguity about what it does. There can be instances where function names longer than 10 characters aid clarity (see bootnet::bootnet_piecewiseIsing). The same also holds for object names in the body of the function.
na.rm as TRUE in certain cases and FALSE in others.
Create a function that multiplies 2 numbers together and divides by a 3rd number (if supplied).
test_function <- function(x, y, z=1) {
x * y / z
}
#test
test_function(3, 5, 4)
test_function(3, 5)
test_function(3, 5, 0)
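Note that test_function(3, 5, 0) does not raise an error: in R, dividing a non-zero number by 0 gives Inf. If that is not what you want, one option is to check the input first, in the same way z_scores() checks for non-numeric input. A sketch; safe_test_function is a hypothetical name:

```r
safe_test_function <- function(x, y, z = 1) {
  # stop early instead of silently returning Inf (or NaN for 0/0)
  if (z == 0) {stop("z must not be 0")}
  x * y / z
}
safe_test_function(3, 5, 4)  # 3.75
safe_test_function(3, 5)     # 15, because z defaults to 1
# capture the error message instead of halting
msg <- tryCatch(safe_test_function(3, 5, 0), error = function(e) conditionMessage(e))
msg  # "z must not be 0"
```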
This is a function that computes confidence intervals for the mean of a variable. You don’t need to know what a confidence interval is at this point (although your function may come in handy later in the term!).
mean_ci <- function(x, alpha) {
se = sd(x) / sqrt(length(x))
mean(x) + se * qnorm(c(alpha / 2, 1 - alpha / 2))
}
#formals
formals(mean_ci)
#body
body(mean_ci)
#environment
environment(mean_ci)
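A usage sketch for mean_ci(). Here alpha is the proportion left outside the interval, so alpha = 0.05 gives a 95% interval. Because qnorm() is symmetric around 0, the interval is centred on the sample mean, and a smaller alpha gives a wider interval:

```r
mean_ci <- function(x, alpha) {
  se = sd(x) / sqrt(length(x))                      # standard error of the mean
  mean(x) + se * qnorm(c(alpha / 2, 1 - alpha / 2)) # lower and upper bounds
}
x <- c(5, 9, 8, 10, 14, 18, 19, 24, 35)
ci_95 <- mean_ci(x, alpha = 0.05)
ci_99 <- mean_ci(x, alpha = 0.01)
ci_95
mean(ci_95) - mean(x)      # essentially 0: the interval is centred on the mean
diff(ci_99) > diff(ci_95)  # TRUE: the 99% interval is wider
```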
Build a function to estimate the mean of a variable in R.
\[\mu = \frac{\sum_{i=1}^{N} x_i}{N}\]
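As a worked instance of the formula, take the values 1 to 10, the same values used for random_var in this exercise:

```latex
\mu = \frac{\sum_{i=1}^{10} x_i}{10} = \frac{1 + 2 + \dots + 10}{10} = \frac{55}{10} = 5.5
```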
- Create a vector random_var which takes values from 1 to 10. Hint: you can use c() to do this, or look up the seq function.
- Create an object col_sum which contains the sum of all the values in random_var.
- Create an object col_length which contains the total number of values in vector random_var. Hint: look up the length function.
- Create an object m which will contain the mean of all values in vector random_var. Hint: use the objects col_sum and col_length to do this.
- Wrap col_sum, col_length and m in a function with one formal: random_var.
- Name the function my_mean.
- Compute the mean of random_var using the my_mean function.
- Compare your result with the mean function in base R.
my_mean <- function(random_var) {
#Input:
#random_var: a vector of numeric values
#Output:
#mean
#sum of values
col_sum <- sum(random_var)
#number of values in vector
col_length <- length(random_var)
#calculate mean
m <- col_sum/col_length
#return mean
m
}
#create vector to test with
random_var <- c(1:10)
#test
my_mean(random_var)
mean(random_var)
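A sketch of a few more checks against base R's mean(). One difference worth noticing: mean() has an na.rm argument, while my_mean() has none, so a single NA makes my_mean() return NA.

```r
my_mean <- function(random_var) {
  col_sum <- sum(random_var)        # sum of values
  col_length <- length(random_var)  # number of values
  col_sum / col_length              # mean
}
my_mean(1:10)                   # 5.5
y <- c(2.5, 7, 11, 0.5)
all.equal(my_mean(y), mean(y))  # TRUE
# With an NA present, both return NA unless mean() is told to drop it
my_mean(c(1, NA, 3))            # NA
mean(c(1, NA, 3), na.rm = TRUE) # 2
```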
Sara mentioned the sum of squares (SS) in class last week. Write a function to compute SS. Build a random vector and test the function.
library(tidyverse)
SS <- function(x) {
#takes numeric vector, and returns sum of squares
(x - mean(x))^2 %>% #get deviations and square
sum() #get sum of squared deviations
}
#test
potato_weights <- rnorm(10, 1, 1)
SS(potato_weights)
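A way to check SS(): the sample variance that var() returns is SS divided by n - 1, so SS(x) should equal (length(x) - 1) * var(x). The sketch below uses a pipe-free version of SS so that it runs without {tidyverse}, plus a small vector whose SS can be worked out by hand:

```r
# pipe-free version of SS: same calculation, no packages needed
SS <- function(x) {
  sum((x - mean(x))^2)  # square each deviation from the mean, then add them up
}
SS(c(1, 2, 3))  # deviations -1, 0, 1; squares 1, 0, 1; SS = 2
set.seed(42)    # makes the random vector reproducible
potato_weights <- rnorm(10, 1, 1)
all.equal(SS(potato_weights), (length(potato_weights) - 1) * var(potato_weights))  # TRUE
```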
Use the z_scores() function that we built above for this mini hack. Compute z-scores of the following vector.
x <- c(5, 9, 8, 10, 14, 18, NA, 19, 24, 35, NA)
z_scores(x)
?mean and ?sd, use the na.rm argument.
na.rm which handles NAs in an appropriate way if set to TRUE and gives an error message if set to FALSE and there are NAs present. Make sure to test that your function handles vectors with no NAs the same regardless of what na.rm is set to. Also, make sure to set a default value for na.rm.
z_scores <- function(x){
m = mean(x, na.rm = TRUE)
s = sd(x, na.rm = TRUE)
z = ((x - m)/s)
return(z)
}
#test function
z_scores(x)
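The solution above hard-codes na.rm = TRUE. One possible answer to the second part, a sketch rather than the only approach, adds an na.rm formal with a default of TRUE and uses base R's anyNA() to stop with an error when na.rm = FALSE and NAs are present:

```r
z_scores <- function(x, na.rm = TRUE){
  #takes a numeric vector and returns the z-scores
  #na.rm = TRUE: NAs are ignored for mean and sd, and stay NA in the output
  #na.rm = FALSE: stop with an error if x contains any NA
  if(!is.numeric(x)) {stop("x is not numeric")}
  if(!na.rm && anyNA(x)) {stop("x contains NAs; use na.rm = TRUE to ignore them")}
  m = mean(x, na.rm = na.rm)
  s = sd(x, na.rm = na.rm)
  z = ((x - m)/s)
  return(z)
}
x <- c(5, 9, 8, 10, 14, 18, NA, 19, 24, 35, NA)
z_scores(x)  # the two NAs stay NA; the other values are standardized
# with no NAs, the na.rm setting should not change the result
no_na <- c(5, 9, 8, 10, 14, 18, 19, 24, 35)
all.equal(z_scores(no_na, na.rm = TRUE), z_scores(no_na, na.rm = FALSE))  # TRUE
# with NAs and na.rm = FALSE, the function stops
tryCatch(z_scores(x, na.rm = FALSE), error = function(e) conditionMessage(e))
```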
x <- 1
strange_function <- function() {
x
}
strange_function()
x <- 1
strange_function <- function(x) {
x
}
strange_function(2)
x <- 1
strange_function <- function(x) {
x <- 11
x + 3
}
x <- 4
strange_function(5)
x <- 1
strange_function <- function(x) {
x <- 11
x + 3
print(x)
}
x <- 4
x <- strange_function(x)
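If you want to check your predictions for the four strange_function examples, here is an annotated sketch of the same code. The functions are renamed f1 to f4 (hypothetical names) so they can sit side by side, and each result is stored so it can be checked. The key idea is R's lexical scoping: a name used inside a function is looked up first among the function's own arguments and local variables, then in the environment where the function was defined (here, the global environment).

```r
# Example 1: x is neither an argument nor a local variable,
# so R finds the global x
x <- 1
f1 <- function() {
  x
}
r1 <- f1()  # 1

# Example 2: the argument x shadows the global x
f2 <- function(x) {
  x
}
r2 <- f2(2)  # 2

# Example 3: x <- 11 creates a local x; the global x is untouched
f3 <- function(x) {
  x <- 11
  x + 3  # last expression, so this is the return value
}
x <- 4
r3 <- f3(5)  # 14
x_after_f3 <- x  # still 4

# Example 4: x + 3 is computed and thrown away; print(x) shows 11
# and also returns 11, which becomes the function's value
f4 <- function(x) {
  x <- 11
  x + 3
  print(x)
}
x <- 4
x <- f4(x)  # prints [1] 11; the global x is now 11
```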