Macroecology playground (1) – Bird species richness in a nutshell

Ahh, macroecology: the study of ecological patterns and processes at large scales. Questions like “what factors determine the distribution and diversity of all life on earth?” have troubled scientists since the times of A. v. Humboldt and Wallace. At the University of Copenhagen a whole research center has been dedicated to this field, and macroecological studies appear more and more often in prestigious journals like Nature and Science. Previous studies at the center found skewed distributions of bird richness with a specific bias towards the mountains (Jetz & Rahbek, 2002; Rahbek et al., 2007). In this blog post I am going to play around a bit with some data from Rahbek et al. (2007). The analysis and the graphs are by no means sufficient (they even violate many model assumptions such as homoscedasticity, normality and independence of the data) and are therefore more of an exploratory nature 😉 The post will show you how to build a raster stack of geographical data and how to use the data in some very basic models.

It was recommended to me to use the freely available SAM software for the analysis. Although the program is really nice and fast, it isn’t flexible enough for me, as you cannot modify specific model parameters or graphical outputs. And as a self-declared R junkie I refuse to work with “click-compute-result” tools 😉

So here is what the head of the SAM data file (“data.sam”) looks like (I won’t share it, so please generate your own data).
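If you have no such file at hand, a stand-in can be simulated. The following is only a sketch: the column names are taken from the layer names that appear later in this post, every value is made up, and the real file has more columns in a different order (so index-based subsetting such as `c(7,8,10)` would have to become name-based).

```r
# Simulate a small SAM-style table. All values are invented for illustration.
set.seed(42)
grid <- expand.grid(Longitude = seq(-80.5, -35.5, by = 1),
                    Latitude  = seq(-55.5,  12.5, by = 1))
n <- nrow(grid)

sim <- data.frame(
  grid,
  NDVI                    = runif(n, min = 0, max = 1),
  Topographical.Range     = rexp(n, rate = 1 / 500),
  Annual.Mean.Temperature = rnorm(n, mean = 20, sd = 5)
)
# Hypothetical response: richness loosely driven by two of the predictors
sim$Birds.richness <- rpois(n, lambda = exp(4 + 1.5 * sim$NDVI +
                                            0.0005 * sim$Topographical.Range))

# Write a tab-separated file, the same layout read.delim() expects
sam_file <- file.path(tempdir(), "data.sam")
write.table(sim, file = sam_file, sep = "\t", dec = ".",
            row.names = FALSE, quote = FALSE)

head(read.delim(sam_file))
```

The file is written to a temporary directory so that it cannot overwrite a real “data.sam”.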

[Screenshot: head of the SAM data file]

As you can see, the .sam file is technically just a tab-separated table with the coordinates of each grid cell (1° grid cells on a latitude-longitude projection) and all response and predictor values for that cell. To get this data into R we are going to use the raster package to generate a so-called raster stack for our analysis. This is how I did it:

# Load libraries
library(raster) # also loads sp

# Create Data from SAM
data <- read.delim(file = "data.sam", header = TRUE, sep = "\t", dec = ".") # read in a data.frame
coordinates(data) <- ~Longitude+Latitude # convert to a SpatialPointsDataFrame
cs <- "+proj=longlat +datum=WGS84 +no_defs" # define the correct projection (long-lat)
gridded(data) <- TRUE # make a SpatialPixelsDataFrame
proj4string(data) <- CRS(cs) # set the defined CRS

# Create Raster layer stack
s <- stack()
for (n in names(data)) {
  d <- data.frame(coordinates(data), data[[n]]) # x, y and the values of column n
  ras <- rasterFromXYZ(xyz = d, digits = 10, crs = CRS(cs))
  names(ras) <- n # name the layer after its column
  s <- addLayer(s, ras)
}
# Now you can query and plot the raster layers from the stack
South American Bird species Richness. Grain Size: 1°
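
Querying and plotting a stack could look roughly like this. This is a minimal sketch on a toy stack, not the stack from the post: the 6 x 6 grid and all values are invented, and only the layer names are borrowed from the real data.

```r
library(raster)

# Toy 1-degree grid with two made-up variables
set.seed(1)
xyz <- expand.grid(Longitude = seq(-70.5, -65.5, by = 1),
                   Latitude  = seq(-10.5,  -5.5, by = 1))
xyz$Birds.richness <- rpois(nrow(xyz), lambda = 200)
xyz$NDVI <- runif(nrow(xyz))

toy <- stack(
  rasterFromXYZ(xyz[, c("Longitude", "Latitude", "Birds.richness")]),
  rasterFromXYZ(xyz[, c("Longitude", "Latitude", "NDVI")])
)
names(toy) <- c("Birds.richness", "NDVI") # set layer names explicitly

nlayers(toy)                               # number of layers in the stack
cellStats(toy$Birds.richness, "mean")      # summary statistic of one layer
raster::extract(toy, cbind(-68.5, -7.5))   # values of all layers at one coordinate
plot(toy$Birds.richness, main = "Toy bird richness")
```

With the real stack, the same calls would be applied to `s` instead of `toy`.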

Want to do some modeling or extract data? Here you go. First we subset some of our predictors from the raster stack and then fit an ordinary least squares multiple regression to see how much variance can be explained. Note that linear regression is not the proper technique for this kind of analysis (the degrees of freedom are overestimated because spatially autocorrelated cells are not independent, and the assumptions mentioned before are violated), but it is still useful for exploratory purposes.

# Extract some predictors from the raster Stack
predictors <- subset(s, c(7, 8, 10))
names(predictors)
>  "NDVI" "Topographical.Range" "Annual.Mean.Temperature"
# Now extract the data from both the bird richness layer and the predictors
birds <- getValues(s$Birds.richness)
val <- as.data.frame(getValues(predictors)) # assumed completion: the original assignment is cut off here

# Do the multiple regression
fit <- lm(birds ~ ., data = val)
summary(fit)
>                          Estimate Std. Error t value Pr(>|t|)
(Intercept)             215.675282  15.837493   13.62   <2e-16 ***
NDVI                    -34.541242   1.245769  -27.73   <2e-16 ***
Topographical.Range       0.056458   0.002452   23.03   <2e-16 ***
Annual.Mean.Temperature   0.940664   0.054747   17.18   <2e-16 ***
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 81.86 on 1525 degrees of freedom
(1461 observations deleted due to missingness)
Multiple R-squared:  0.6931,    Adjusted R-squared:  0.6925
F-statistic:  1148 on 3 and 1525 DF,  p-value: < 2.2e-16
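
As a quick sanity check on the output above, the adjusted R² can be recomputed from the multiple R², the number of observations n and the number of predictors p via 1 - (1 - R²)(n - 1)/(n - p - 1). In this sketch n is not reported directly. It is inferred from the 1525 residual degrees of freedom plus the four estimated coefficients.

```r
r2 <- 0.6931   # multiple R-squared from the summary
n  <- 1525 + 4 # residual degrees of freedom + estimated coefficients (inferred)
p  <- 3        # number of predictors

adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
round(adj_r2, 4) # 0.6925, matching the reported adjusted R-squared
```

With this many observations and only three predictors, the adjustment is tiny, which is why both values are nearly identical.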

Ignore the p-values and just focus on the adjusted R² value. As you can see, we are able to explain nearly 70% of the variance with this simple model. So what do our residuals and the predicted values look like? To find out, we have to create analogous raster layers containing the predicted and the residual values. Then we plot all three raster layers (observed, predicted and residual) using the spplot function from the package sp (automatically loaded with “raster”).

# Estimates prediction
rval <- rep(NA_real_, ncell(s)) # start from an all-NA vector, one value per cell
rval[as.numeric(names(fit$fitted.values))] <- predict(fit) # fill the cells used in the model with predicted values
pred <- predictors$NDVI # make a copy of an existing raster
values(pred) <- rval; rm(rval) # replace all values in this raster copy
names(pred) <- "Prediction"

# Residual Raster
rval <- rep(NA_real_, ncell(s)) # again start from an all-NA vector
rval[as.numeric(names(fit$residuals))] <- fit$residuals # fill the cells used in the model with residual values
resid <- predictors$NDVI
values(resid) <- rval; rm(rval)
names(resid) <- "Residuals"

# Do the plot with spplot
ss <- stack(s$Birds.richness, pred, resid)
sp <- as(ss, 'SpatialGridDataFrame')
spplot(sp) # one panel per layer

Multiple linear regression model output

Looking at the residual plot, you might notice that our simple model fails to explain all of the variation at mountain altitudes (the Andes). Still, the predicted values look very similar to the observed richness. Bird species richness is highest in tropical mountain ranges, which is consistent with results from Africa (Jetz & Rahbek, 2002). The reasons for this pattern are not fully understood yet, but if I had to discuss this with a colleague I would probably bring up arguments like more evolutionary time, higher habitat heterogeneity and a greater number of climatic niches in mountain ranges. At this point you would normally test for spatial autocorrelation using Moran's I, adjust your analysis accordingly and use more sophisticated methods, such as Generalized Additive Models (GAMs) or Spatial Autoregressive Models (SARs), that account for the spatial autocorrelation. See Rahbek et al. (2007) for the actual study.
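
To give an idea of what the Moran's I step involves, here is a minimal base-R sketch on a made-up 10 x 10 grid. The function `morans_i`, its `max_dist` argument and the simulated values are illustrative assumptions, not code from the study. Neighbours are simply cells whose centres are at most `max_dist` apart, with distances treated as planar.

```r
# Moran's I with binary neighbour weights (1 if two cells are neighbours, else 0)
morans_i <- function(x, coords, max_dist = 1.1) {
  d <- as.matrix(dist(coords))       # pairwise distances between cell centres
  w <- (d > 0 & d <= max_dist) * 1   # rook neighbours on a 1-unit grid
  z <- x - mean(x)                   # centred values
  (length(x) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

set.seed(1)
coords   <- as.matrix(expand.grid(lon = 1:10, lat = 1:10))
gradient <- coords[, "lon"] + coords[, "lat"] + rnorm(100, sd = 0.5) # spatially structured
noise    <- rnorm(100)                                              # no spatial structure

i_gradient <- morans_i(gradient, coords)
i_noise    <- morans_i(noise, coords)

# Permutation test: how often does shuffling the values over the cells
# give an I at least as large as the observed one?
perm    <- replicate(999, morans_i(sample(gradient), coords))
p_value <- (sum(perm >= i_gradient) + 1) / (999 + 1)

c(gradient = i_gradient, noise = i_noise, p = p_value)
```

For the model in this post, `x` would be the residuals of `fit` and `coords` the centres of the cells that entered the model, for example `coordinates(s)[as.numeric(names(residuals(fit))), ]`. A clearly positive I for the residuals would indicate that the OLS assumptions of independence are violated. Packages such as spdep provide tested implementations, which are preferable for real analyses.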


  • Jetz, W., & Rahbek, C. (2002). Geographic range size and determinants of avian species richness. Science, 297(5586), 1548-1551.
  • Rahbek, C., Gotelli, N. J., Colwell, R. K., Entsminger, G. L., Rangel, T. F. L., & Graves, G. R. (2007). Predicting continental-scale patterns of bird species richness with spatially explicit models. Proceedings of the Royal Society B: Biological Sciences, 274(1607), 165-174.


About Martin Jung

PhD researcher at the University of Sussex. Interested in nature conservation, ecology and biodiversity as well as statistics, GIS and 'big data'