What is it, and what can it do for AEC?
R is a free, open source program that is becoming increasingly popular in the geospatial industry. R used to be the preserve of statisticians, but its open and flexible nature has allowed it to grow to cover a wide range of disciplines and user bases.
R was originally designed for statistical computing and is still used that way by many academics and professionals. However, one of its key strengths is its flexibility.
A major concept within R is that of packages (often loosely called libraries). These make additional code and functions available, adding new types of analysis and other features. They can be loaded as required and are relatively easy to write and contribute to.
As a result, the available R libraries cover almost every statistical analysis you are likely to need, including some quite niche ones. This flexibility has also enabled R to expand its capabilities, including working with spatial data. A wide range of spatial analysis is now possible, and most analysis you would do in ArcGIS or QGIS can also be done in R.
As well as spatial data, R works well with large amounts of data; running stats on 30 different measurements from 200 trial sites is not unusual!
The number and range of R’s libraries have grown dramatically, and R can now be used for a range of applications, including GIS. R can work with many of the popular GIS file formats, primarily through GDAL (the Geospatial Data Abstraction Library). This allows it to read and write shapefiles, GeoPackages, GML, etc.
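As a rough illustration of reading and writing these formats, the sketch below uses the sf package (covered later in this article), which relies on GDAL for file access. It assumes sf is installed and uses the small North Carolina demo shapefile that ships with sf; the output is a temporary GeoPackage chosen purely for the example.

```r
library(sf)

# Path to the demo shapefile bundled with the sf package
shp_path <- system.file("shape/nc.shp", package = "sf")

# Read the shapefile into an sf object (a data frame with a geometry column)
nc <- st_read(shp_path, quiet = TRUE)

# Write the same data out as a GeoPackage; the driver is guessed from
# the .gpkg extension. A fresh temporary file is used so nothing is overwritten.
out_path <- tempfile(fileext = ".gpkg")
st_write(nc, out_path, quiet = TRUE)

# Read the GeoPackage back in to confirm the round trip
nc_back <- st_read(out_path, quiet = TRUE)
```

In real work you would replace the demo path and the temporary file with your own file locations.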
R is also a flexible GIS program widely used in academia and by the public and private sectors. There are many conversion courses available (from ArcGIS to R or from QGIS to R), and it is a key tool in many GIS toolsets.
R & RStudio Benefits
R is also a relatively easy-to-learn scripting language; within half a day nearly anyone can become familiar with it and use it to increase efficiency and automate some of their repetitive workload.
For example, if you have a regular set of data analyses and outputs that need to be run every three months, once you have set up the process as an R script, it is easy to re-run that script with the new data set. This is a time-saver, and I use it in my work for regularly creating sets of maps that need to be updated with a new quarter’s data.
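A minimal sketch of that idea, using only base R, might look like the following. The function name summarise_quarter, the column names, and the sample values are all made up for illustration; in practice the data frame would come from the new quarter's file rather than being typed in.

```r
# Hypothetical helper: average the measurements for each site.
# Written once, then reused unchanged for every quarter's data.
summarise_quarter <- function(measurements) {
  aggregate(value ~ site, data = measurements, FUN = mean)
}

# Made-up data standing in for one quarter's file. In a real script this
# might instead be read.csv("q1.csv") (a hypothetical file name).
q1 <- data.frame(
  site  = c("A", "A", "B", "B"),
  value = c(10, 12, 20, 24)
)

# Run the analysis; next quarter, only the input data changes
q1_summary <- summarise_quarter(q1)
print(q1_summary)
```

When the next quarter's data arrives, the same function is called on the new data frame and the rest of the script stays as it is.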
Nearly everyone who uses R actually uses RStudio, which is an IDE (integrated development environment) for R. RStudio provides an interface to work with R and makes using R much easier.
R is primarily driven from a command line. This is quite a different working environment from a traditional GUI (graphical user interface) like Word or Excel, so it has a bit of a learning curve for most users. However, it has the advantages of a scripting language, including reproducibility and shareability.
If you performed a series of processes in ArcGIS, you would need to record exactly what you did each time, including the parameters you used, in order to replicate the process. With a scripting interface, by contrast, you have a record of the commands you ran, and it is easy to rerun the code with the click of a button.
You could replicate a series of commands in ArcGIS using ModelBuilder, and some people find the graphical interface easy to use. However, it often struggles with complex models and can involve a lot of work to set up for even a very simple model.
It is also easy to share these scripts with colleagues, or even collaborate on the same scripts with a version control system (e.g. Git/GitHub). For many users, R provides an introduction to data science. It is one of the easier programming languages to learn, and it is similar to Python.
FOSS: Free and Open Source Software
R is an open source program, which means it is free for anyone to download and use, even for commercial work. Therefore, a much wider group can access it compared with commercial software.
Take a look at my recent article on QGIS at xyHt.com to find out more about open source software and why our contributions are key to its development.
The openness of the code, along with people’s willingness to share their code and the fact that there is a wide variety of libraries for many different types of analysis, means that we often don’t actually need to write much code; we can just combine the existing building blocks and examples.
Recently, a group of R libraries for data science, collectively called the tidyverse, has been created. These libraries share a common design philosophy, grammar, and data structures and support a range of data manipulation tasks.
There is also a newly released tidyverse-adjacent package for spatial data called sf, short for simple features. It uses many of the tidyverse’s underlying principles and is the successor to the earlier sp spatial package.
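To give a flavour of how the two fit together, here is a small sketch that applies tidyverse-style verbs from the dplyr package to a spatial layer. It assumes sf and dplyr are installed and uses the North Carolina demo data that ships with sf; the NAME and AREA columns come from that data set, and the 0.2 threshold is an arbitrary value picked for the example.

```r
library(sf)
library(dplyr)

# Load the demo counties layer bundled with the sf package
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Chain ordinary dplyr verbs; the geometry column stays attached throughout
large_counties <- nc %>%
  filter(AREA > 0.2) %>%      # keep only the larger counties
  select(NAME, AREA) %>%      # keep two attribute columns (plus geometry)
  arrange(desc(AREA))         # largest first

print(large_counties)
```

The result is still a spatial object, so it can be mapped or written to a file like any other layer.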
The Future in R
Overall, R has an incredibly wide range of uses. I recommend learning R to anyone new to the field and anyone interested in the increasingly large data science and spatial data science angle of AEC. R is a flexible, open source tool that is a valuable addition to anyone’s toolbox.
Also, remember that the key bit of open source software is contribution!