You can download the .Rmd file here.

Purpose

In today’s lab we will focus on understanding and building functions. Functions are a fundamental part of data analysis in R. They help with carrying out coding operations in a swift and precise manner. They help to make workflows more dynamic and customized. Once you know how to build your own functions, coding in R becomes easier and you save a lot of time when analyzing data! It’s one of those skills which demands tall investments and needs time to grow. However once you reach a certain level of growth the yields are sizable and worth all the investment.

Why use functions

  1. You don’t need to reinvent the wheel.
  2. Functions help to organize your thinking into descrete units/operations.
  3. Functions help in organizing your workflow. They reduce errors and help keep a track of complex tasks.

Why build your own functions

  1. Functions allow for developing and maintaining algorithms specific to your project.
  2. Functions support reproducibility. If you build a function and embed it in a R package, anyone who is able to download the package is likely to have access to the function.
  3. Functions help you better understand how the functions you use are operating, and how to optimize your usage.

Components of a function

Functions in R have three components:

  • body() - the part inside the function

  • formals() - consist of arguments which direct how you use the function

  • environment() - location of the function

Let’s consider an example of a function.

The two arguments in this function are vector and power_value. The function takes inputs for these arguments and spits out the output based on the directions specified in the function. These arguments consist the \(formals\). Next we will need to specify how the inputs of the arguments will be manipulated. The part which specifies the tasks that the function does with the inputs is the \(body\) of the function. The location wherein the function is mapped is the \(environment\). We won’t go into the details of the environment in this lab.

Let’s run the following code to see how this function works.

my_func <- function(vector, power_value){
        transform_vec <- (vector)^power_value
        return(transform_vec)
}

Now let’s look at the components of the function.

#Peeking into the components of the function
formals(my_func)
body(my_func)
environment(my_func)

As you can see you can peek into the structure of the function in R. This allows for understanding the mechanics of how the data is being manipulated as we build the function.

#A vector x 
x <- c(5, 9, 8, 10)

#Applying the function to the vector x
my_func(x, 2) 

Build an example function

You studied about z-scores scores last week in lab-2. Recall that they are also called \(standardized\) \(scores\). The formula to compute z-scores is \[ z = \frac{x_i - M}{s} \] Let’s build a function to compute z scores for a vector in R.

Step 1 - identify the input vector In the formula above, \(x_i\) will be the input vector. Let’s build a random input vector x for the purpose of this example. Starting with a made-up input helps to picture and take decisions about the type of data your function will take. I highly recommend starting with this step, but your approach may differ and that’s totally fine.

x <- c(5, 9, 8, 10, 14, 18, 19, 24, 35)

Step 2 - compute values needed
In the formula above you need to compute mean and sd in order to compute the z-score. Let’s compute the mean and sd of the vector x. I will save the values in objects m and s

m <- mean(x)
s <- sd(x)

Now, you can compute the z-scores

z <- ((x - m)/s)

At this point you can see that you have all the elements you need to build a function for computing z-scores.

Step 3 - Wrap the objects in a function At this stage you create the function object and name the function object.

z_scores <- function(x){
                 m <- mean(x)
                 s <- sd(x)
                 z <- ((x - m)/s)
}

z_scores(x)

Step 4 - Decide how you want to output the data.

z_scores <- function(x){
                 m = mean(x)
                 s = sd(x)
                 z = ((x - m)/s)
               return(z)
}

z_scores(x)

You have a working function!

Step 5 - Testing/Debugging Does the function give you the values you expect? Does the function handle all forms of input you are planning on giving it?

#Are these the values you expect?
z_scores(x)
#does it handle single values?
z_scores(2)
#does it handle NAs?
z_scores(c(1,4,5,NA))
#does it handle character vectors?
z_scores(c('cat', 'dog'))

Here is one way to handle non-numeric vectors:

z_scores <- function(x){
  
    if(!is.numeric(x)) {stop("x is not numeric")}
                 m = mean(x)
                 s = sd(x)
                 z = ((x - m)/s)
               return(z)
}

#With numeric vector
z_scores(x = c(5, 9, 8, 10, 14, 18, 19, 24, 35))

#With character (non-numeric) vector
z_scores(x= c("apple", "banana"))

scale function in the {base} package can carry out the same task.

scale(x = c(5, 9, 8, 10, 14, 18, 19, 24, 35))

scale(x= c("apple", "banana"))

You can see that the output from the scale function is a matrix. There are times when obtaining a data frame or a tibble format for your output may be more helpful (for e.g., when you are using {tidyverse}. You will learn more about {tidyverse} in the future). The z_scores function, we just built, helps with this issue by returning the output as a numeric vector. When you write your functions you could also specify the formats of your output. It really does make life easier!

Step 6 - Make sure your function is well documented

z_scores <- function(x){
  #takes a numeric vector and returns the z-scores
  #input: x: numeric vector
  #output: vector of z values
  
  #gives error message for non-numeric vectors
  if(!is.numeric(x)) {stop("x is not numeric")}
  
    #calculate z scores
    m = mean(x)
    s = sd(x)
    z = ((x - m)/s)
    return(z)
}

Suggestions for writing functions

In this section we will talk about some of the practical aspects of writing functions. First, when you are analyzing data, you may come across instances which are ideal for building functions. It is helpful to identify them and recognize how functions can help your workflow. Secondly, writing functions is a skill, it takes time to understand and write good functions. It is also important to remember that a function that works better in one project may need modifications to be as effective in another. There is no one write answer to what a good function looks like.
Here are some suggestions to get you started on how to think about writing functions. Please note that this section does not cover all the information there is on this topic, you are encouraged to explore beyond this lab.

When to create functions?

  1. When you find yourself repeating operations - e.g., you are copy pasting the same code a lot.
  2. You want to reuse a set of operations at another point in your workflow - for e.g., you need to use a function you built to create a plot in four different places in your workflow.
  3. When you want to carry out iterations or simulations - for e.g., you may want to find the mean of a sub sample (subset of the larger sample) and ask R to do this 100 times.
  4. To increase efficiency and readability.
  5. When you think this operation will be valuable to others.

What is a good function?

  1. One that does what it is exactly supposed to do - make sure you try the function on smaller vectors/values to test whether it is giving the desirable output.
  2. One that has object names which are appropriate and easy to identify what they stand for - for e.g., if you are building a function to compute Cohen’s d you might want to consider naming the function Cohen_d, it is within 10 characters and leaves no ambiguity in terms of what it does. There can be instances where function names exceed 10 characters which aid clarity (see bootnet::bootnet_piecewiseIsing). The same also stands for object names in the body of the function.
  3. One that makes clear what it is doing and is well annotated - it is good practice to add comments and annotate your code especially if you want people to use your function.
  4. One that is flexible - e.g., you may want to specify an argument na.rm as TRUE in certain cases and FALSE in others.

Programming tips

  1. Get into the habit of commenting as you go.
  2. Break it up into the individual operations.
  3. Think about what would be the most useful form of input and outputs.
  4. Your function might need a function.
  5. Always test it. Try to break it.

Let’s practice one more example

Create a function that multiplies 2 numbers together and divides by a 3rd number (if supplied).

test_function <- function(x, y, z=1) {
  x * y / z
}

#test
test_function(3, 5, 4)
test_function(3, 5)
test_function(3, 5, 0)

Minihacks

Minihack 1: Components of Function

This is a function that computes confidence intervals for the mean of a variable. You don’t need to know what a confidence interval is at this point (although your function may come in handy later in the term!).

mean_ci <- function(x, alpha) {
  se = sd(x) / sqrt(length(x))
  mean(x) + se * qnorm(c(alpha / 2, 1 - alpha / 2))
}
  • What are the formals in this function?
  • What is the body of this function?
  • What is the environment of this function?
#formals
formals(mean_ci)
#body
body(mean_ci)
#environment
environment(mean_ci)

Minihack 2: Write a function (guided)

Build a function to estimate the \(mean\) of a variable in R.

\[\mu = \frac{\Sigma_{i=1}^N(x_i)}{N}\]

  • First generate a vector random_var which takes values from 1 to 10. Hint: You can use cbind to do this or look up the seq function.
  • Build an object vector col_sum which contains the sum of all the values in random_var.
  • Build an object col_length which contains the total number of values in vector random_var. Hint: Look up length function.
  • Build the vector m which will contain the mean of all values in vector random_var. Hint: use the objects col_sum and col_length to do this.
  • Now wrap col_sum, col_length and m in a function with one formal: random_var.
  • Name this function object as my_mean.
  • Compute the mean of random_var using the my_mean function.
  • Check if it matches the output from the mean function in base R.
my_mean <- function(random_var) {
  #Input: 
  #random_var: a vector of numeric values
  #Output:
  #mean
  #sum of values
  col_sum <- sum(random_var)
  #number of values in vector
  col_length <- length(random_var)
  #calculate mean
  m <- col_sum/col_length
  #return mean
  m
}
#create vector to test with
random_var <- c(1:10)
#test
my_mean(random_var)
mean(random_var)

Minihack 3: Write a function (hands off)

Sara mentioned \(sum\) \(of\) \(squares\) \((SS)\) in the class last week. Write a function to compute \((SS)\). Build a random vector and test the function.

library(tidyverse)
SS <- function(x) {
  #takes numeric vector, and returns sum of squares
  (x - mean(x))**2 %>% #get deviations and square
    sum() #get sum of squared deviations
}
#test 
potato_weights <- rnorm(10, 1, 1)
 
SS(potato_weights)

Minihack 4: Debugging

Use the z_scores() function that we built above for this mini hack. Compute z-scores of the following vector.

x <- c(5, 9, 8, 10, 14, 18, NA, 19, 24, 35, NA)
z_scores(x)
  • Does the function work? Why?
  • Try fixing the bug in this function. Modify the functions so that it ignores NAs, and computes z-scores for all other values in the vector. Hint - Look up ?mean and ?sd, use the na.rm argument.
  • BONUS: add an additonal formal, na.rm which handles NAs in an appropriate way if set to TRUE and gives an error message if set to FALSE and there are NAs present. Make sure to test that your function handles vectors with no NAs the same regardless of what na.rm is set to. Also, make sure to set a default value for na.rm.
z_scores <- function(x){
                 m = mean(x, na.rm = T)
                 s = sd(x, na.rm = T)
                 z = ((x - m)/s)
               return(z)
}
#test function
z_scores(x)

Minihack 5: Reading code and lexical scoping (not covered)

  • Before running it, what do you think this code will output?
x <- 1
strange_function <- function() {
  x
}
strange_function()
  • Before running it, what do you think this code will output?
x <- 1
strange_function <- function(x) {
  x
}
strange_function(2)
  • Before running it, what do you think this code will output?
x <- 1
strange_function <- function(x) {
  x <- 11
  x + 3
}
x <- 4
strange_function(5)
  • Before running it, what will the output be? What will the value of x be after it runs?
x <- 1
strange_function <- function(x) {
  x <- 11
  x + 3
  print(x)
}
x <- 4
x <- strange_function(x)
  • See here for further explanation.