Functions are a very important feature of almost all programming languages. We have already seen and used a number of functions in the previous chapters.
Instead of only using existing functions, we can write our own functions to do specific things for us. But why should we write functions? The most important reasons are:
Writing functions is the key step in evolving from a ‘basic user’ to a ‘developer’.

Functions in R

Like everything else in R, functions are first-class objects (like vectors or matrices) and can be used in the same way. This allows us to pass functions as input arguments to other functions, which is done frequently and is an important feature of R. Example: To prove that we can work with functions like any other object in R, let us assign an existing function to a new object.
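The original example is not reproduced here. As a minimal sketch, assume the existing function is max() and the new object is called my_max (both choices are assumptions made only for illustration):

```r
# Assumption: max() stands in for "an existing function";
# my_max is a hypothetical name chosen for this sketch.
my_max <- max

x <- c(3, 8, 1)
my_max(x)                 # behaves exactly like max(x)
identical(my_max, max)    # both names refer to the same function

# Functions can also be handed over as arguments to other functions:
# sapply() applies my_max to each element of the list.
res <- sapply(list(a = 1:3, b = 4:6), my_max)
res
```

The copy is an ordinary object: it can be stored, printed, passed around, or removed again with rm(my_max).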
Functions consist of three key elements: input arguments, instructions, and an output (the return value).
All three are optional; some functions have no input arguments, others have no explicit return, and we can even write functions without instructions, which are absolutely useless, as they … just don’t do anything. Typically functions have at least input arguments and instructions, and most will also explicitly return a result, or at least an indication that the function successfully executed the instructions (we’ll come back to returns later).

Functions can also be nested (a function calls another function as part of the instructions) and can be called recursively (a function may call itself several times).

Functions are most often free of side effects. That means that they do not change anything outside the function itself. They simply take the input arguments as specified, go through the instructions, and return the result to the line where the function was called. However, functions can have side effects. R uses something called “lexical scoping”, which allows a function to access, delete, and modify objects which have not been explicitly passed as arguments to the function. This can be somewhat confusing and should be avoided (especially as a novice). We will come back to that at the end of this chapter.

When Should I Use Functions?
Functions in real life

We all come across function-like situations multiple times a day. One illustrative example is a baking recipe. Imagine you are baking some brownies. Classical recipes are set up as follows:
We can even find more analogies on this screenshot:
Every time we call this “function” (use this recipe) with this very specific name (Cocoa Brownies) from this specific package (or site, the food network) using the inputs (ingredients) as specified, we will always get the same result. And that’s what functions do: perform specific tasks in a well-defined way, with a very specific return/result. And they can be reused over and over again, if needed.

Illustrative example

Before learning how to write custom functions, let’s motivate functions once again. Let us assume we have to calculate the standard deviation for a series of numeric vectors. As is most likely well known from math, the standard deviation is defined as: \(\text{sd}(x) = \sqrt{\frac{\sum_{i=1}^N (x_i - \bar{x})^2}{N - 1}}\)

Using a bit of math and some of the vector functions from the vectors chapter, we can calculate the standard deviation of a vector directly in R.
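A minimal sketch of this calculation, translating the formula term by term. The values stored in x are made up for illustration, and the built-in sd() is only used to check the result:

```r
# Example values (an assumption for this sketch)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# sqrt( sum of squared deviations from the mean / (N - 1) )
sd_x <- sqrt(sum((x - mean(x))^2) / (length(x) - 1))
sd_x

# Compare with R's built-in function
all.equal(sd_x, sd(x))
```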
In words: take the square root of the sum of the squared deviations from the mean, divided by \(N - 1\).

We now have to calculate the standard deviation for three different vectors. What we do: We copy the code from above (for the standard deviation), insert it three times, adjust the names of the objects, and that’s it.
Even if the equation for the standard deviation is relatively simple, you can already see that the code quickly gets complex and prone to errors! Question: There is a bug in the code above! Have you noticed it? The solution is hidden in the ‘practical exercise’ below.

Exercise 6.1

What have we done: We wrote the command for the equation of the standard deviation once for the first vector. We tested this command and everything looked good. Thus, we copied the line twice for the other two vectors.
This happens very easily and such bugs are often very hard to find, or will not be found at all (or only after you have published all your results, which may all be wrong due to such errors). Take home message: The copy & paste strategy is not a good option and should be avoided! Rather than doing this spaghetti-style coding, we now use functions. Below you can find a small function (we will talk about the individual elements in a minute; see Declaring functions) which does the very same: it has one input parameter.
Once we have the function, we can test whether it works as expected and then use it to do the same calculations again (as above). Note: If you would like to try it yourself, you must first execute the function definition in the R console before you can use the function.
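As a sketch of this workflow, assuming a hypothetical function name my_sd and three made-up vectors x1, x2, and x3 (none of these names or values are taken from the original code):

```r
# Hypothetical function: one input parameter x (a numeric vector),
# returns the standard deviation of x.
my_sd <- function(x) {
    res <- sqrt(sum((x - mean(x))^2) / (length(x) - 1))
    return(res)
}

# Made-up data for illustration
x1 <- c(1, 5, 3, 9)
x2 <- c(10.5, 11.2, 9.8, 10.1, 12.0)
x3 <- c(-2, 0, 2)

# Step 1: test the function once against the built-in sd()
all.equal(my_sd(x1), sd(x1))

# Step 2: reuse it; only the argument changes
sd1 <- my_sd(x1)
sd2 <- my_sd(x2)
sd3 <- my_sd(x3)
c(sd1, sd2, sd3)
```

The formula is written exactly once, so there is no second or third copy in which an object name could be left unchanged by mistake.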
I think you can see that this code chunk looks much cleaner and that we have avoided the mistake we made above. The code does not only look cleaner, it is much easier to read, easier to maintain, and (as we have tested our function) we know that the results are correct. An additional advantage: we can reuse the function for other tasks or projects.

Calling functions

A function call consists of the name of the function and a (possibly empty) argument list in round brackets. We have already seen a series of such function calls in the previous chapters, with and without input arguments.
This is the same for all functions, even custom functions written by ourselves. Note: If you call a function which does not exist, R will throw an error and tell you that it could not find a function with that name. If so, check the name of the function you are calling (typo?).
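A few calls for illustration; the functions chosen are base R functions picked for this sketch, and no_such_function is a deliberately undefined, hypothetical name. tryCatch() is used only so that the script continues after the error:

```r
# Call without input arguments
today <- Sys.Date()

# Call with one input argument
n <- length(c(1, 2, 3))

# Call with two input arguments, the second one named
r <- round(3.14159, digits = 2)

# Calling a function which does not exist results in an error.
# Here the error message is caught and stored instead of stopping the script.
msg <- tryCatch(no_such_function(1),
                error = function(e) conditionMessage(e))
msg
```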
Naming functions

Functions can basically get any (valid) name. However, if a function with that name already exists, your new function will mask the existing one.
An example of the co-existence of a vector and a function with the same name:
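A minimal sketch, assuming the shared name is mean (an assumption; the original example is not preserved). When R sees a name followed by round brackets, it searches for a function with that name and skips objects which are not functions:

```r
# A numeric vector which shares its name with the built-in function mean()
mean <- c(10, 20, 30)

# R uses the *function* mean() on the *vector* mean
res <- mean(mean)
res

# Clean up: remove our vector again, the function is untouched
rm(mean)
```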
Even if this works, try to avoid such constructs: even in this simple example it is confusing to understand what the code does.

Declaring functions

Let us begin with an empty function. All parts of a function (input arguments, instructions, and output) are “optional”. If we declare none of the three, we end up with an empty function.
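A sketch of such an empty function; the name myfun is a hypothetical choice for this example:

```r
# No input arguments, no instructions, no explicit return
myfun <- function() {
}

# The function can be called like any other function
res <- myfun()
is.null(res)
```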
Basic elements:
Inspect the object: As with all objects in R, we can also inspect our new function object.
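For instance, using class() and typeof() on an empty function (myfun is again a hypothetical name):

```r
myfun <- function() {
}

cls <- class(myfun)    # class of the object
typ <- typeof(myfun)   # internal type of the object
cls
typ
is.function(myfun)
```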
Functions are of class function; the type (closure) simply indicates a function.

Inspect the return value: Something which is a bit special in R: All functions have a return value. But haven’t we just learned that this is also optional? To be precise: explicit returns are optional. But even if we have no explicit return, a function in R always returns something. This return value can be invisible and/or empty, which is indicated by the value NULL. If we call our empty function, we get a NULL.

The NULL value in R
The message behind the image: We can still work with a numerical zero (e.g., \(0 + 10 - 5 = 5\)), while a NULL value cannot be used for anything useful, not even in an emergency situation as in the picture above.

Functions cat() and paste()

In the following sections we will use two new functions called cat() and paste().
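We will, for now, only use the basics of these two functions to create some nice output and will come back to them later.

Concatenate and print

The function cat() concatenates its arguments and prints the result to the console. This can be used to show a simple character string, combine multiple characters, or combine elements of different types to create easy-to-read output and information. Note: By default, cat() separates the elements with a single space and does not add a line break at the end; a newline ("\n") has to be added explicitly.

And what does it return? The one and only purpose of cat() is to show output on the console; the function itself returns NULL (invisibly).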
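A short sketch of cat() in use; the strings and numbers are made up for illustration:

```r
# A simple character string; "\n" adds the line break
cat("Hello world!\n")

# Elements of different types, separated by a space (the default)
cat("The answer is", 42, "\n")

# A custom separator
cat("a", "b", "c", sep = "-")
cat("\n")

# cat() only prints; its return value is NULL
res <- cat("Some output\n")
is.null(res)
```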
Concatenate strings

The other function we will use is paste(). Instead of immediately showing the result on the console, the string is returned so that we can store it in an object and use it later. E.g., we can use the resulting string as a nice title for a plot.

We create the very same character string as above, but now store the result in a new object.
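A minimal sketch, assuming a hypothetical object name title and made-up content:

```r
# paste() returns the combined string instead of printing it
title <- paste("Temperature in", "Innsbruck", 2020)
title

# By default the elements are separated by a space; sep changes that
id <- paste("station", 7, sep = "_")
id

# The stored string can be reused later, e.g. plot(1:10, main = title)
```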
Basic functions

Let us start to write some more useful functions than the one in the section Declaring functions. Below you will find four functions (A - D) with increasing “complexity” to show the different parts of a function.

Function A
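The original definition of Function A is not preserved. A minimal sketch of a function without input arguments could look as follows; the name say_hello and the printed text are assumptions:

```r
# Hypothetical Function A: no input arguments, one instruction
say_hello <- function() {
    cat("Hello world!\n")
}

# Call the function: the round brackets stay empty
say_hello()

# No explicit return: the function returns what cat() returned, i.e. NULL
res <- say_hello()
is.null(res)
```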
As the function has no input arguments, nothing has to be declared between the round brackets. Once called, the instructions are executed. By default, R returns the ‘last thing returned inside the instructions’. In practice, we should always define explicit returns in each and every function; we will come back to this in more detail later on.

Function B
As shown in the instructions, we have to adjust our function to have one input argument.
The difference to “Function A”: We now have one input argument to control the behaviour of the function. As there is no default (we’ll come back to that later), this is a mandatory argument. If we do not specify it, we will run into an error, as the function expects us to hand over this argument.
Again, as we have no explicit return, the function will return the last thing returned internally.
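A sketch of such a function with one mandatory argument. The function name say_hello and the argument name x are assumptions; tryCatch() is used only to show the error without stopping the script:

```r
# Hypothetical Function B: one input argument without default
say_hello <- function(x) {
    cat("Hello", x, "\n")
}

# Call with the mandatory argument
say_hello("Ada")

# Call without the argument: R raises an error as x is missing.
# The error message is caught and stored here.
err <- tryCatch(say_hello(),
                error = function(e) conditionMessage(e))
err
```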
Exercise 6.2 Non-character input
Solution. As you will see, the function still works, even if the result might be a bit strange.
(1) Integer as input
(2) Logical value as input
(3) Character vector as input
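The three cases can be tried with a sketch like the following, reusing the hypothetical say_hello() function with one argument x (an assumption, as the original function is not preserved):

```r
# Hypothetical Function B from above
say_hello <- function(x) {
    cat("Hello", x, "\n")
}

# (1) Integer as input
say_hello(1L)

# (2) Logical value as input
say_hello(TRUE)

# (3) Character vector as input: all elements are printed one after another
say_hello(c("Ann", "Bob"))
```

cat() converts each input to text before printing, which is why none of these calls fails.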
This function is not very specific. In reality, we might extend the function and check what is specified on the input argument. This is called a ‘sanity check’ (input check), which we will revisit at a later time.

Function C
Let us declare a new function.
The new function now runs completely silently (no information is shown on the console). Instead, we get the resulting character string as the return value of the function.
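A sketch of such a silent function, using paste() instead of cat() and an explicit return. The function name hello and the argument name x are assumptions:

```r
# Hypothetical Function C: builds the string and returns it
hello <- function(x) {
    res <- paste("Hello", x)
    return(res)
}

# Nothing is shown on the console; the result is stored in msg
msg <- hello("Ada")

# The returned string can be used later, e.g. printed on demand
msg
```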
Quick detour: