GeoRange: an R package for calculating geographic range

One thing that I wasn’t expecting going into grad school was the amount of coding I would do. I took an Intro to Programming in Java course during my second year as an undergraduate, which I enjoyed, but otherwise never thought much about it. That changed during my master’s work, when I had to write some of my own code in the R programming language for some specialized tests. I’ve since taken a course on data analysis in R and another focusing on Python, and written A LOT of R code for research. Even though I don’t consider myself any kind of coding expert in R, I’ve gotten to the point where I think other people might actually find some of the functions I’ve written useful. So, I’ve written and submitted to CRAN a package dubbed GeoRange, which should be available for download shortly if it isn’t already.

 

A quick thank you to Dave Bapst for his advice and encouragement in publishing R packages. The whole process was made much easier by this online tutorial.

 

GeoRange is for calculating and analyzing six different measures of geographic range from point occurrence data (i.e., latitude and longitude). It was born from my interest in geographic range as it relates to extinction risk. Very quickly: all else being equal, a species that is more widespread across the Earth is less likely to be wiped out by stochastic events than one with a small range (Jablonski 2005). For example, a species confined to a single island in the Caribbean might be killed off by a single hurricane season, whereas it’s nearly impossible to wipe out all individuals of a species that occurs across the Atlantic. Pretty much all the geographic range measures in the package can be calculated (in some cases more efficiently) within ArcGIS, but I didn’t use it because I am not really a fan of ArcGIS, to put it mildly. I can use it, but the system always seems very glitchy and opaque for my tastes, and data analysis can be a hassle. I was going to end up doing some analyses in R anyway, so I figured I might as well do everything in R.

 

The actual measures of geographic range include the convex hull area, maximum pairwise distance, latitudinal range, longitudinal range, X × X degree cell count, and minimum spanning tree distance. The first five are fairly standard measures that are commonly used in extinction analyses, but the minimum spanning tree (MST) may be unfamiliar to people, even those who study extinction. Essentially, the MST finds the most cost-effective way to connect all points without ever creating a loop, a problem related to (though much easier to solve than) the Traveling Salesman Problem. Originally the MST was used to find the most efficient way to lay down power lines, with the cost between points corresponding to the cost of building. In terms of geographic range, the cost is the great circle distance between points, and thus the MST represents the minimum distance a species must have traveled to have reached all points. That might include crossing impassable terrain and is unlikely to represent the actual path or distance traveled, but it still seems to be an excellent correlate of extinction risk, especially after accounting for sampling (Boyle et al. 2017). I’m not yet sure why this is, except that it may better capture other factors, like abundance and fragmentation, that are associated with extinction risk.
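To make the MST idea concrete, here is a minimal sketch in Python (not the GeoRange code itself, which is written in R). It assumes a spherical Earth with a mean radius of 6371 km, uses the haversine formula for great circle distance, and builds the tree with Prim's algorithm. The function names `great_circle_km` and `mst_length_km` are made up for this illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)


def great_circle_km(p, q):
    """Haversine distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mst_length_km(points):
    """Total length of the minimum spanning tree over the points (Prim's algorithm)."""
    n = len(points)
    if n < 2:
        return 0.0
    # best[i] is the shortest known distance from point i to the growing tree
    best = [great_circle_km(points[0], p) for p in points]
    in_tree = [False] * n
    in_tree[0] = True
    total = 0.0
    for _ in range(n - 1):
        # Attach the closest point that is not yet part of the tree
        nxt = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        total += best[nxt]
        in_tree[nxt] = True
        # The new point may offer a shorter link to the remaining points
        for i in range(n):
            if not in_tree[i]:
                d = great_circle_km(points[nxt], points[i])
                if d < best[i]:
                    best[i] = d
    return total


# Hypothetical occurrences as (latitude, longitude) pairs
occurrences = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)]
print(round(mst_length_km(occurrences), 1))
```

Because every point only ever links to its nearest neighbor in the tree, a fragmented or horseshoe-shaped distribution yields a tree that traces the occupied area rather than spanning the gaps in between.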

Figure 1. Horseshoe-shaped distribution (thick black outline) with 100 randomly generated points. The minimum spanning tree (thin black lines) and convex hull (blue outline) show the stark difference between the methods for certain shapes.

For analyses of multiple taxa, GeoRange is set up to work with capture matrices and can work directly with data from the Paleobiology Database via the downloadPBDB function in the velociraptr package.

There are some known limitations with this package that I’m looking to fix in future updates. A major issue is that calculating the MST takes a long time for more than 1000 points. The PlotMST function doesn’t account for points connecting around the prime meridian, which creates odd lines that jump across the plot. The CellCount function doesn’t use equal-area cells, so high-latitude cells are stretched compared to equatorial ones, and similarly the random point generation functions RandRec and RandHorseShoe don’t account for the stretching of lat/long area with latitude. There are lots of little tweaks to make to increase user options and expand functionality, but for now I’m happy to get some feedback on the work and keep up my coding skills.
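To give a sense of how large the equal-area problem is, here is a rough Python sketch (again, an illustration and not part of GeoRange) that computes the ground area of a lat/long cell on a sphere. It assumes a spherical Earth with a radius of 6371 km, and the function name `cell_area_km2` is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)


def cell_area_km2(lat_south, cell_deg):
    """Ground area of a cell_deg x cell_deg cell whose southern edge is at lat_south."""
    lat1 = math.radians(lat_south)
    lat2 = math.radians(lat_south + cell_deg)
    dlon = math.radians(cell_deg)
    # Area of a latitude/longitude "rectangle" on a sphere
    return EARTH_RADIUS_KM ** 2 * dlon * (math.sin(lat2) - math.sin(lat1))


# Compare a 5 x 5 degree cell at the equator with one starting at 60 degrees north
equator = cell_area_km2(0, 5)
high_lat = cell_area_km2(60, 5)
print(round(equator), round(high_lat), round(high_lat / equator, 2))
```

On a sphere, a cell of fixed width in degrees covers less and less ground toward the poles, so a species occupying ten cells at high latitude covers a smaller area than one occupying ten cells near the equator.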

References

Boyle, J. 2017. GeoRange: Calculating Geographic Range from Occurrence Data. R Package version 0.1.0. https://CRAN.R-project.org/package=GeoRange

Jablonski, D. 2005. Mass extinctions and macroevolution. Paleobiology 31:192-210.