Learning R
baseR-V2016.2 - Data Management and Manipulation using R
Tested on R versions 3.0.X through 3.3.1
Last update: 15 August 2016
Objective:
Construct loops for repeated operations of R statements
Some Learning Objectives:
I must repeat the same R statements on objects that change in an orderly fashion - is this appropriate for a loop?
How do I start and stop a loop?
Can I write code for nested loops?
Some Background
Looping involves control structures (or conditionals per Module 4.5) and one or more statements that are repeated during each loop. The number of times a loop occurs (n) is determined by start and stop conditions. The benefit of the loop is to repeat statements without having to change statement parameters. The most obvious case for a loop is when output from a previous loop is used in a subsequent loop.
Generally speaking, loops are not as efficient as functions (see Module 4.9) when applied to vectors unless the number of iterations (loops) and statements is small and fixed. In many cases the difference between waiting for a 3 min loop operation versus 2.5 min for a vectorized operation is trivial.
However, 1000 repeats at a 0.5 min difference between a loop and a vectorized function add up to an additional 8+ hours. A general rule of thumb is that an order-of-magnitude difference in time is worth pursuing; less than that may not be worth the time investment to fully optimize the code.
There are some coding considerations that can increase the efficiency of loops. One of these is not to “grow” a data object during the loop. “Growing” means appending, through an rbind(), cbind(), or similar call, loop output onto an existing data object. It is better to create an “empty” data object outside the loop and fill it with output from the loop.
In addition, use matrices rather than data frames. Matrices are more efficient. You can always coerce a matrix to a data frame on conclusion of the loop (Module 3.6) if you prefer to work with data frames rather than matrices.
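The difference between growing and filling can be sketched as follows. This is a minimal illustration, not code from the module; the object names (grown, filled, filled_df) and the choice of n are assumptions made for the example.

```r
n <- 1000   # assumed number of iterations, for illustration only

# Growing: every rbind() call copies everything built so far
grown <- NULL
for (i in 1:n) {
  grown <- rbind(grown, c(i, i^2))
}

# Filling: create the "empty" matrix once, outside the loop, then fill it
filled <- matrix(NA_real_, nrow = n, ncol = 2)
for (i in 1:n) {
  filled[i, ] <- c(i, i^2)
}

# Both approaches hold the same values
all(grown == filled)

# Coerce to a data frame once, after the loop has finished
filled_df <- as.data.frame(filled)
```

With only 1000 rows the two versions finish almost instantly; the cost of growing tends to show up as the number of iterations gets large.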
Nonetheless, loops serve a useful purpose in R, and they often provide a logical framework for repeated operations that can then be considered for further optimization. My personal bottom line is that if the time difference is not huge, build a loop and perform the analysis.
After all, any free time waiting can always be spent drinking coffee.
Some Initialization Before We Proceed …
Data from Exercise #6 (objects f1, m1, m2, m3, m4, t1, and w1) were saved to an external workspace file. Some of these objects will be needed, so load them into your workspace first.
If the objects are not there, or you did not save them in Exercise #6, you will need to return to Module 3.4, Exercise #6, and re-import the data before proceeding further.
Some Free Advice on Looping
First consider whether a loop is appropriate. Answering “yes” to the questions posed below is a good start.
- Will a set of one or more R statements be repeatedly applied to one or more data objects?
Next, the loop construction itself requires a logical approach:
- Start with a bulleted approach to loop operations, outlining what is a logical outcome of an operation on a data object
- Convert these operations to R statement(s) and make sure each works
- Consider how to configure the index (its start and stop), and what the actual values of the index will be
- How will the index increment?
- Is the index numeric? A character string? Read from a list?
- Last, test the loop with a fixed index value
It is also a good strategy to time a loop while developing it. Two options are system.time() and a clock call such as Sys.time(). Each R statement in your developing loop can be nested inside system.time(), and R will return the time elapsed for that statement.
If you are simply interested in the total time required to run numerous statements, call Sys.time() at the start and end of a test loop and take the difference. Either way you will obtain an idea of how long your proposed loop process will require.
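Both timing approaches can be sketched as below, assuming system.time() and Sys.time() are the calls intended above. The vector x and the summing loop are hypothetical stand-ins for your own statements.

```r
x <- runif(1e5)   # hypothetical data, for illustration only

# Option 1: nest a single statement inside system.time()
system.time(y <- sqrt(x))

# Option 2: record the clock at the start and end of a test loop
start_time <- Sys.time()
total <- 0
for (i in seq_along(x)) {
  total <- total + x[i]
}
end_time <- Sys.time()

elapsed <- end_time - start_time   # a difftime object
print(elapsed)
```

The first option reports user, system, and elapsed time for one statement; the second gives only wall-clock time, but covers any number of statements.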
After that you can balance your personal need for efficiency versus just getting the analysis completed.
Looping Control Structures - The for() Loop
The for() control structure builds a loop that repeats statements for a specified number of iterations. The structure is: for (StartStop) { statements }. Any single R statement or function, or multiple statements, can be placed inside the { } (curly brackets).
One of the simpler structures is: for (i in 1:3), where i is the counter, and thus i = 1 is the Start and i = 3 is the Stop. Numerous alternatives, especially lists, exist for the (StartStop) structure.
Consider the following illustrative (although nonsensical) loop. It starts with an outside data object x1, which consists of 5 numbers. These numbers are to be squared. Each iteration of the loop causes the index counter i to increase from i = 1 until it reaches i = 5, at which time the loop stops. Note that 5 is exactly how many elements are in x1.
At each iteration of i, the loop statement takes the ith observation in x1 and squares it. Output is directed to a new object x2, which we created as an empty object, and should result in a vector of values [1, 4, 9, 16, 25].
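A minimal sketch of this loop is shown below. The original listing is not reproduced here, so the way x2 is created (a numeric vector of length 5) is an assumption.

```r
x1 <- c(1, 2, 3, 4, 5)   # the outside data object
x2 <- numeric(5)         # "empty" object to receive the output (assumed form)

for (i in 1:5) {
  x2[i] <- x1[i]^2       # square the ith observation of x1
}

x2   # 1 4 9 16 25
```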
Assume you did not know how many elements were in x1. You could modify the control structure as for (i in 1:length(x1)), where the stop is now determined by another function, length(), which in this case is equal to five.
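The length-based stop can be sketched as follows. The seq_along() variant is not part of the module; it is added here as a commonly used alternative, and x3 is a hypothetical name.

```r
x1 <- c(1, 2, 3, 4, 5)

# Stop determined by length(x1) rather than a hard-coded 5
x2 <- numeric(length(x1))
for (i in 1:length(x1)) {
  x2[i] <- x1[i]^2
}

# Alternative: seq_along(x1) produces the same index values here
x3 <- numeric(length(x1))
for (i in seq_along(x1)) {
  x3[i] <- x1[i]^2
}

identical(x2, x3)
```

One reason to consider seq_along() is the empty case: if x1 had no elements, 1:length(x1) would evaluate to c(1, 0) and the loop body would still run, whereas seq_along(x1) would be empty and the loop would be skipped.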
Let’s consider a more realistic loop, one that imports into your workspace a series of external data structures, such as all your .csv files. Here, your steps are to:
- Determine the number and names of external .csv files for import;
- Import them into your workspace using read.csv(), and assign them names that reflect the actual names of each .csv file.
This can all be accomplished using a command we saw in Module 4.2, the assign() function.
Build the loop:
The loop operated by setting the index i = 1 as the start, and it continued until it had reached the value determined by length(files), which was 17. The first part of assign() substringed characters from the files object, added XX, and pasted that together. The second part of assign() used read.csv() to import each external .csv file, which was then assigned the name created in the first part of assign(). The result? Rather than writing 17 separate import statements, a loop was used to get the data into your workspace.
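The import loop can be sketched as below. This is not the module's original listing: it assumes list.files(), substr(), paste0(), assign(), and read.csv() are the calls involved, and it first writes three small, made-up .csv files to a temporary directory (demo_dir, a hypothetical name) so the example runs anywhere.

```r
# Hypothetical setup: create a few small .csv files to import
demo_dir <- file.path(tempdir(), "loop_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (nm in c("m1", "m2", "m3")) {
  write.csv(data.frame(id = 1:3, value = c(10, 20, 30)),
            file.path(demo_dir, paste0(nm, ".csv")),
            row.names = FALSE)
}

# Step 1: determine the number and names of the external .csv files
files <- list.files(demo_dir, pattern = "\\.csv$")

# Step 2: import each file and assign it a name built from the file name
for (i in 1:length(files)) {
  # first part: drop the ".csv" extension and add "XX"
  obj_name <- paste0(substr(files[i], 1, nchar(files[i]) - 4), "XX")
  # second part: import the file and assign it to that name
  assign(obj_name, read.csv(file.path(demo_dir, files[i])))
}

ls(pattern = "XX$")   # the imported objects now in the workspace
```

In your own work, demo_dir would be replaced by the directory that actually holds your .csv files, and the setup step would be dropped.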
Most loops are not as simple as these two loops, but the process for creating a loop that performs repeated operations through a set of R statements is the same. It is just the control structure that varies.
Looping Control Structures - The while() Loop
Like for(), while() operates from a Start until it reaches a Stop condition. The basic syntax is while (condition) { statements }, and the loop continues for as long as the condition is TRUE. The start condition for the loop is usually set outside the loop. The loop requires a counter to index the stop condition.
As a quick exercise on your own, use the while() structure to mimic the importation of the .csv files as we did with the for() loop. Use “YY” instead of “XX” in the workspace names this time.
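As a minimal sketch of a counter-driven while() loop, the squaring example can be rewritten as follows. The object names are carried over from the earlier example; the listing itself is an illustration, not the module's own code.

```r
x1 <- c(1, 2, 3, 4, 5)
x2 <- numeric(length(x1))

i <- 1                        # Start: counter set outside the loop
while (i <= length(x1)) {     # loop runs while this condition is TRUE
  x2[i] <- x1[i]^2
  i <- i + 1                  # increment the counter, or the loop never stops
}

x2
```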
Looping Control Structures - The repeat Loop
The repeat loop has slightly different syntax than for() or while(). All operations occur inside { }, including the Stop condition.
As with while(), the Start condition is typically set outside the loop. Once the Stop condition is met, the R call break stops the loop from continuing. break is the analog of the control structure in the for() and while() loops.
WARNING!!
The repeat call can lead to infinite loops. Think carefully about the loop logic, especially the Stop condition, before using it. Personally, I avoid using repeat.
Again, use the repeat structure to mimic the importation of the .csv files as we did with the for() loop. Add “ZZ” to the workspace names this time.
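A minimal sketch of a repeat loop with its Stop condition inside the braces is given below, again using the squaring example as an assumed illustration.

```r
x1 <- c(1, 2, 3, 4, 5)
x2 <- numeric(length(x1))

i <- 1                        # Start: set outside the loop
repeat {
  x2[i] <- x1[i]^2
  i <- i + 1
  if (i > length(x1)) break   # Stop: without this line the loop is infinite
}

x2
```

Note that the body runs at least once before the Stop condition is ever checked, which is one reason the loop logic deserves extra care.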
Controlling Output of Looping Operations
Output from a loop can be directed to the console, to a named object in the workspace, or to an external file. A print(LoopOperation) call within the loop sends output to the console, where LoopOperation is the R statement. There is one output per loop iteration.
You can also “build” a data object using a call such as Object <- rbind(Object, LoopOperation). Here, Object is the workspace object being constructed during the loop, and LoopOperation is the R statement(s) generating new output. Any possible R object class, e.g., data.frame or matrix, can be built in this manner.
Last, the write.*() group of functions can be used to write to an external file, e.g., write.csv(Object, FileName), where Object is the workspace object and FileName is the name of the external file.
Note that x1 has values (1, 2, 3, 4, 5) while both of the empty vectors - d1, and d2 - return NULL indicating no elements in the objects.
Note values of v1 - 1, 4, 9, 16, 25 - were written out to the console.
The remainder of the output, the objects d1 and d2, as well as the external file loopdat.csv, were successfully created, too.
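The three output routes can be sketched as follows. The original listing is not reproduced here, so this is a reconstruction under assumptions: the object names x1, v1, d1, d2 and the file name loopdat.csv come from the notes above, while the contents of d1 and d2, the use of rbind() and write.csv(), and the temporary output location are guesses made for illustration.

```r
x1 <- c(1, 2, 3, 4, 5)
d1 <- NULL   # empty objects; printing either one returns NULL
d2 <- NULL
out_file <- file.path(tempdir(), "loopdat.csv")   # assumed location

for (i in seq_along(x1)) {
  v1 <- x1[i]^2
  print(v1)                                         # 1. output to the console
  d1 <- c(d1, v1)                                   # 2. build a vector
  d2 <- rbind(d2, data.frame(x = x1[i], sq = v1))   # 2. build a data frame
}

write.csv(d2, out_file, row.names = FALSE)          # 3. output to an external file
```

Building d1 and d2 this way grows them at each iteration, which is harmless for five values but is the pattern to avoid in long loops.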
Summary of Module 4.8 Functions
Basic calls related to looping are:
- for() => Repeat statements for specified iterations
- while() => Repeat statements until stop criterion is reached
- repeat => Repeat statements until break is called
- print() => Sends output to console from inside loop
Exercise #17
Data for this exercise are in: ../baseR-V2016.2/data/exercise_dat.
Import the dataset bearclawpoppy.csv from the data directory. This dataset consists of presence-absence locations of the bearclaw poppy, a rare plant of the Mojave desert, and associated environmental covariates.
Write a loop that:
- Feeds multiple statistical functions (i.e., mean, sd, length) to;
- Three topographic variables (elev, slope, aspect);
- By presab in the bearclawpoppy data; and
- Exports these results as 3 separate R objects in your workspace, one for elev, slope, and aspect, respectively.
Write a loop to read in the 4 datasets m1.csv, … , m4.csv. Some challenges:
- How to change m1.csv, m2.csv, m3.csv, and m4.csv (i.e., the filename) given there is only a single read.csv() in the loop?
- How to ensure each new object is assigned a unique name (e.g., m1, … ,m4)?
HINTS for both exercises:
- Think lists and indexing within a list
- Think pastes to create new character strings
- Think about ways new objects can be assigned