You can download the .Rmd file here.
In today’s lab we will focus on understanding and building functions. Functions are a fundamental part of data analysis in R. They let you carry out coding operations quickly and precisely, and they make workflows more dynamic and customized. Once you know how to build your own functions, coding in R becomes easier and you save a lot of time when analyzing data! It’s one of those skills that demands a large upfront investment and needs time to grow. However, once you reach a certain level, the yields are sizable and worth all the investment.
Functions in R have three components:
body() - the part inside the function
formals() - the arguments, which direct how you use the function
environment() - the location of the function
Let’s consider an example of a function, my_func, which is defined in the code below. The two arguments in this function are vector and power_value. The function takes inputs for these arguments and returns an output based on the directions specified in the function. These arguments constitute the \(formals\). Next we need to specify how the inputs of the arguments will be manipulated. The part that specifies what the function does with the inputs is the \(body\) of the function. The location in which the function is defined is the \(environment\). We won’t go into the details of the environment in this lab.
Let’s run the following code to see how this function works.
my_func <- function(vector, power_value) {
  transform_vec <- vector^power_value
  return(transform_vec)
}
Now let’s look at the components of the function.
#Peeking into the components of the function
formals(my_func)
body(my_func)
environment(my_func)
As you can see, you can peek into the structure of a function in R. This helps you understand how the data is being manipulated as you build the function. Now let’s apply the function to a vector.
#A vector x
x <- c(5, 9, 8, 10)
#Applying the function to the vector x
my_func(x, 2)
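To make the components concrete, here is a small sketch that repeats the definition of my_func so it runs on its own, then pulls out the argument names and applies the function. The object names arg_names and squared are made up for this illustration.

```r
# Same function as above, repeated so this chunk is self-contained
my_func <- function(vector, power_value) {
  transform_vec <- vector^power_value
  return(transform_vec)
}

# formals() returns a list with one entry per argument;
# names() gives just the argument names
arg_names <- names(formals(my_func))
arg_names
# "vector" "power_value"

# Squaring each element of the vector
squared <- my_func(c(5, 9, 8, 10), 2)
squared
# 25 81 64 100
```

Note that the function is vectorized: the power is applied to every element of the input, because ^ works element-wise in R.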
You studied z-scores last week in lab-2. Recall that they are also called \(standardized\) \(scores\). The formula to compute z-scores is \[ z = \frac{x_i - M}{s} \] where \(M\) is the mean and \(s\) is the standard deviation of the scores. Let’s build a function to compute z-scores for a vector in R.
Step 1 - Identify the input vector
In the formula above, \(x_i\) will be the input vector. Let’s build a made-up input vector x for the purpose of this example. Starting with a made-up input helps you picture and make decisions about the type of data your function will take. I highly recommend starting with this step, but your approach may differ and that’s totally fine.
x <- c(5, 9, 8, 10, 14, 18, 19, 24, 35)
Step 2 - Compute the values needed
In the formula above, you need the mean and the sd in order to compute the z-score. Let’s compute the mean and sd of the vector x. I will save the values in objects m and s.
#Your code here
Now, you can compute the z-scores
#Your code here
At this point you can see that you have all the elements you need to build a function for computing z-scores.
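If you want to check your work for Step 2, here is one possible sketch. It assumes the made-up vector x from Step 1 and the object names m and s used in the text; the name z for the result is an assumption for this illustration.

```r
# Made-up input vector from Step 1
x <- c(5, 9, 8, 10, 14, 18, 19, 24, 35)

# Pieces of the z-score formula
m <- mean(x)  # M in the formula
s <- sd(x)    # s in the formula

# Subtraction and division are element-wise, so this
# standardizes every value of x in one line
z <- (x - m) / s
round(z, 2)
```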
Step 3 - Wrap the objects in a function
At this stage you create the function object and give it a name.
#Your code here
Step 4 - Decide how you want to output the data.
#Your code here
You have a working function!
Step 5 - Testing/Debugging
Does the function give you the values you expect? Does the function handle all forms of input you are planning on giving it?
#Your code here
Here is one way to handle non-numeric vectors:
z_scores <- function(x) {
  if (!is.numeric(x)) {
    stop("x is not numeric")
  } else {
    m <- mean(x)
    s <- sd(x)
    z <- (x - m) / s
    return(z)
  }
}
#With numeric vector
z_scores(x = c(5, 9, 8, 10, 14, 18, 19, 24, 35))
#With character (non-numeric) vector
z_scores(x= c("apple", "banana"))
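The two calls above can also be turned into explicit checks. The sketch below is one way to do that, not the only one: it uses a compact version of z_scores so the chunk runs on its own, relies on the fact that standardized scores should have a mean of 0 and an sd of 1, and uses tryCatch() to capture the error message instead of halting. The names z, mean_ok, sd_ok and msg are made up for this illustration.

```r
# Compact version of the function above, repeated so this chunk runs alone
z_scores <- function(x) {
  if (!is.numeric(x)) stop("x is not numeric")
  (x - mean(x)) / sd(x)
}

z <- z_scores(c(5, 9, 8, 10, 14, 18, 19, 24, 35))

# Standardized scores should have mean 0 and sd 1
# (compared with a small tolerance because of floating point arithmetic)
mean_ok <- abs(mean(z)) < 1e-10
sd_ok <- abs(sd(z) - 1) < 1e-10

# Capture the error message for non-numeric input instead of stopping
msg <- tryCatch(
  z_scores(c("apple", "banana")),
  error = function(e) conditionMessage(e)
)
msg
# "x is not numeric"
```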
The scale function in the {base} package can carry out the same task.
scale(x = c(5, 9, 8, 10, 14, 18, 19, 24, 35))
scale(x= c("apple", "banana"))
You can see that the output from the scale function is a matrix. There are times when obtaining a data frame or a tibble format for your output may be more helpful (e.g., when you are using {tidyverse}; you will learn more about {tidyverse} in the future). The z_scores function we just built helps with this issue by returning the output as a numeric vector. When you write your own functions, you can also specify the format of your output. It really does make life easier!
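A short sketch of the difference in output format, assuming the same made-up vector as before. The names scaled, z_from_scale and z_by_hand are made up for this illustration.

```r
x <- c(5, 9, 8, 10, 14, 18, 19, 24, 35)

# scale() returns a one-column matrix with extra attributes
scaled <- scale(x)
is.matrix(scaled)
# TRUE
dim(scaled)
# 9 1

# as.numeric() drops the matrix structure and the attributes
z_from_scale <- as.numeric(scaled)

# The z-score formula applied by hand returns a plain numeric vector
z_by_hand <- (x - mean(x)) / sd(x)

# Both approaches give the same values
all.equal(z_from_scale, z_by_hand)
# TRUE
```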
Step 6 - Make sure your function is well documented
z_scores <- function(x) {
  #takes a numeric vector and returns the z-scores
  #input: x: numeric vector
  #output: vector of z values

  #gives error message for non-numeric vectors
  if (!is.numeric(x)) {
    stop("x is not numeric")
  }

  #calculate z scores
  m <- mean(x)
  s <- sd(x)
  z <- (x - m) / s
  return(z)
}
In this section we will talk about some of the practical aspects of writing functions. First, when you are analyzing data, you may come across instances which are ideal for building functions. It is helpful to identify them and recognize how functions can help your workflow. Second, writing functions is a skill; it takes time to understand and write good functions. It is also important to remember that a function that works well in one project may need modifications to be as effective in another. There is no one right answer to what a good function looks like.
Here are some suggestions to get you started on how to think about writing functions. Please note that this section does not cover all the information there is on this topic, and you are encouraged to explore beyond this lab.
- Give your functions short, descriptive names. For example, Cohen_d is within 10 characters and leaves no ambiguity in terms of what it does. There can be instances where function names exceeding 10 characters aid clarity (see bootnet::bootnet_piecewiseIsing). The same also stands for object names in the body of the function.
- Think about the default values of your arguments. For example, you may want na.rm as TRUE in certain cases and FALSE in others.
Create a function that multiplies 2 numbers together and divides by a 3rd number (if supplied).
#Your code here
This is a function that computes confidence intervals for the mean of a variable. You don’t need to know what a confidence interval is at this point (although your function may come in handy later in the term!).
mean_ci <- function(x, alpha) {
  se <- sd(x) / sqrt(length(x))
  return(mean(x) + se * qnorm(c(alpha / 2, 1 - alpha / 2)))
}
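To see what this function returns, here is a hedged usage sketch. The input vector and alpha = 0.05 are made-up choices for this illustration (alpha = 0.05 corresponds to a 95% interval based on the normal distribution, which is what qnorm() uses).

```r
# Same function as above, repeated so this chunk is self-contained
mean_ci <- function(x, alpha) {
  se <- sd(x) / sqrt(length(x))
  return(mean(x) + se * qnorm(c(alpha / 2, 1 - alpha / 2)))
}

x <- c(5, 9, 8, 10, 14, 18, 19, 24, 35)

# Returns a vector of length 2: lower bound, then upper bound
ci <- mean_ci(x, alpha = 0.05)
ci

# The sample mean sits in the middle of the interval
mean(x)
```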
#Your code here
Build a function to estimate the \(mean\) of a variable in R.
\[\mu = \frac{\sum_{i=1}^{N} x_i}{N}\]
- Create a vector random_var which takes values from 1 to 10. Hint: You can use c() to do this or look up the seq function.
- Create an object col_sum which contains the sum of all the values in random_var.
- Create an object col_length which contains the total number of values in vector random_var. Hint: Look up the length function.
- Create an object m which will contain the mean of all values in vector random_var. Hint: use the objects col_sum and col_length to do this.
- Wrap col_sum, col_length and m in a function with one formal: random_var.
- Name the function my_mean.
- Compute the mean of random_var using the my_mean function.
- Compare your result with the mean function in base R.
#Your code here
Sara mentioned \(sum\) \(of\) \(squares\) \((SS)\) in the class last week. Write a function to compute \((SS)\). Build a random vector and test the function.
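As a reminder, a common definition of the sum of squares is the sum of the squared deviations from the mean. Written with the same symbols as the z-score formula above (\(M\) for the mean) and \(N\) for the number of values, it is:

\[ SS = \sum_{i=1}^{N} (x_i - M)^2 \]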
#Your code here
Use the z_scores() function that we built above for this mini hack. Compute z-scores of the following vector.
x <- c(5, 9, 8, 10, 14, 18, NA, 19, 24, 35, NA)
#Your code here
- Look at ?mean and ?sd, and use the na.rm argument.
- Add an argument na.rm to your function which handles NAs in an appropriate way if set to TRUE and gives an error message if set to FALSE and there are NAs present. Make sure to test that your function handles vectors with no NAs the same regardless of what na.rm is set to. Also, make sure to set a default value for na.rm.
#Your code here
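Before writing your own version, it may help to see how mean() and sd() behave when NAs are present. This sketch only illustrates the na.rm argument of the base functions; it is not a solution to the mini hack. The object names are made up for this illustration.

```r
x <- c(5, 9, 8, 10, 14, 18, NA, 19, 24, 35, NA)

# With the default na.rm = FALSE, a single NA makes the result NA
mean_default <- mean(x)
mean_default
# NA

# With na.rm = TRUE, NAs are dropped before computing
mean_no_na <- mean(x, na.rm = TRUE)
sd_no_na <- sd(x, na.rm = TRUE)

# anyNA() is a quick way to check whether a vector contains NAs
has_na <- anyNA(x)
has_na
# TRUE
```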
x <- 1
strange_function <- function() {
  x
}
strange_function()

x <- 1
strange_function <- function(x) {
  x
}
strange_function(2)

x <- 1
strange_function <- function(x) {
  x <- 11
  x + 3
}
x <- 4
strange_function(5)

x <- 1
strange_function <- function(x) {
  x <- 11
  x + 3
  print(x)
}
x <- 4
x <- strange_function(x)