Tag Archive | model

New Paper: Local factors mediate the response of biodiversity to land use on two African mountains

I know that it has been a while since I posted anything here. The daily responsibilities and effort required for my PhD program are taking quite a toll on the time I have available for other non-PhD matters (for instance curating this blog). I apologize for this and hope to post some more tutorials and discussion posts in the future. However, at the moment my own research takes up 105% of my available time. Then again, I have heard that the scientific blogosphere is in a bit of a crisis in general.

Anyway, today I just want to quickly share the exciting news that the MSc thesis I conducted at the Center for Macroecology, Evolution and Climate has passed scientific peer review and is now in early view in Animal Conservation. I am quite proud of this work, as it is the first lead-author paper I have managed to publish that involved primary research and data collection.

Short breakdown: During my master's and now in my PhD I have been working extensively with the PREDICTS database, a global project that aims to collate local biodiversity estimates from different land-use systems across the entire world. The idea for this work came when I realized that many of the categories in the PREDICTS database are affected by some level of subjectivity. Local factors – such as specific land-use forms, vegetation conditions and species assemblage composition – could alter the general responses of biodiversity to land use that have been generalized across larger scales. Thus the simple idea was to compare ‘PREDICTS-style’ model predictions with independent biodiversity estimates collected at the same local scale. See the abstract and paper below.


Jung et al. (2016) – Local factors mediate the response of biodiversity to land use on two African mountains



Land-use change is the single biggest driver of biodiversity loss in the tropics. Biodiversity models can be useful tools to inform policymakers and conservationists of the likely response of species to anthropogenic pressures, including land-use change. However, such models generalize biodiversity responses across wide areas and many taxa, potentially missing important characteristics of particular sites or clades. Comparisons of biodiversity models with independently collected field data can help us understand the local factors that mediate broad-scale responses. We collected independent bird occurrence and abundance data along two elevational transects in Mount Kilimanjaro, Tanzania and the Taita Hills, Kenya. We estimated the local response to land use and compared our estimates with modelled local responses based on a large database of many different taxa across Africa. To identify the local factors mediating responses to land use, we compared environmental and species assemblage information between sites in the independent and African-wide datasets. Bird species richness and abundance responses to land use in the independent data followed similar trends as suggested by the African-wide biodiversity model, however the land-use classification was too coarse to capture fully the variability introduced by local agricultural management practices. A comparison of assemblage characteristics showed that the sites on Kilimanjaro and the Taita Hills had higher proportions of forest specialists in croplands compared to the Africa-wide average. Local human population density, forest cover and vegetation greenness also differed significantly between the independent and Africa-wide datasets. Biodiversity models including those variables performed better, particularly in croplands, but still could not accurately predict the magnitude of local species responses to most land uses, probably because local features of the land management are still missed. 
Overall, our study demonstrates that local factors mediate biodiversity responses to land use and cautions against applying biodiversity models to local contexts without prior knowledge of which factors are locally relevant.


The PREDICTS project – We need your data!

As part of my thesis project I have recently joined up with the researchers and interns of the PREDICTS project. PREDICTS stands for Projecting Responses of Ecological Diversity In Changing Terrestrial Systems (yeah, a fancy and to-the-point acronym) and aims to investigate the impact of various human pressures on biological diversity on a global scale. PREDICTS gets its data from contributing authors and is constantly looking for new data contributors. All contributors will become coauthors of a paper describing the database, and at the end of the project the whole database will be released to the public!

We want your data! For instance Wildcat (Felis silvestris) encounters over land-use gradients!

We want your data! For instance Wildcat (Felis silvestris) encounters in different habitat types!

If you have diversity or community composition data that were collected from more than one terrestrial site, are somehow influenced by human activity and were gathered using a standardized methodology, then it is more than likely that we could use them. What we need is:

  • Locations of sampling points, as precisely as possible
    (with the coordinate system used, if possible)
  • An indication of the type of land cover that each sampling point represents
    (e.g. primary forest, secondary forest, intensively-farmed crop, hedgerow)
  • An indication of how intensively the site is used by people
  • Data on the presence / absence, or ideally a measure of abundance, of each species at each site
  • The date(s) that each measurement was taken
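
To make the list above more concrete, here is a minimal sketch of how such a dataset could be laid out as one tidy table in R. The column names and all values are made up for illustration; this is not an official PREDICTS template.

```r
# Hypothetical layout: one row per species record at a site and date.
# Column names and values are illustrative assumptions only.
example_data <- data.frame(
  site          = c("A1", "A1", "B2"),
  longitude     = c(37.35, 37.35, 38.34),
  latitude      = c(-3.07, -3.07, -3.40),
  crs           = "WGS84",  # coordinate system used for the locations
  land_cover    = c("primary forest", "primary forest", "intensively-farmed crop"),
  use_intensity = c("minimal", "minimal", "intense"),
  species       = c("Species x", "Species y", "Species x"),
  abundance     = c(4, 1, 0),  # 0 records an absence
  sample_date   = as.Date(c("2014-03-02", "2014-03-02", "2014-03-05"))
)

# Inspect the structure of the table
str(example_data)
```

A long format like this keeps every measurement together with its site, land cover, use intensity and date, which covers each of the points in the list.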

We need more openness in terms of data sharing in conservation and ecology research! It is unbelievable that even today important research data can be lost if, for instance, the original author dies or their lab burns to the ground. Some might argue that it should be mandatory to share data if your research is 100% funded by public sources. One understandable reason against sharing data after publication is that you are a young, emerging scientist and want to keep your hard-earned golden eggs to yourself. However, this can be debated as well, as data sharing not only gives you more citations but may even bring you into contact with other researchers in your field. In other cases researchers don’t want to share raw sampling data because of conservation concerns, but even here there are options, such as coarsening coordinates before public release.
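
As a rough sketch of what coarsening coordinates could look like, the base R function below snaps each point to the centre of a grid cell of a chosen size. The function name `coarsen_coords`, the 0.5° cell size and the example coordinates are assumptions for illustration, not a recommended standard.

```r
# Snap coordinates to the centre of a regular grid cell (cell size in degrees).
# Hypothetical helper: name and default cell size are illustrative assumptions.
coarsen_coords <- function(lon, lat, cell = 0.5) {
  data.frame(
    lon = (floor(lon / cell) + 0.5) * cell,
    lat = (floor(lat / cell) + 0.5) * cell
  )
}

# Made-up sampling location
coarse <- coarsen_coords(lon = 37.3556, lat = -3.0674, cell = 0.5)
coarse
```

The larger the cell, the less precisely a sensitive location can be recovered, at the cost of spatial detail for anyone reusing the data.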

Recent initiatives for open ecological data sharing like https://datadryad.org/ and http://figshare.com/ already provide a splendid place where you as an author can deposit raw data from papers you wrote years ago. You can even upload the raw data from your most current projects and put an embargo on the download, so that the item will be released to the public, for instance, one year after the associated article has been published.


For my thesis I am especially looking for all kinds of African community data that have been published. We already have a lot of studies in the database, but for my project I need more data, especially on less-sampled taxa (insects, amphibians, …), different temporal resolutions and a greater diversity of land-use types. So if you have data on African species communities in any form (diversity metrics, abundance metrics, I even take occurrence matrices) that were sampled in anthropogenically disturbed habitats: please contact me or wait for me to contact you 🙂

Martin Jung

Macroecology playground (1) – Bird species richness in a nutshell

Ahh, macroecology. The study of ecological patterns and processes at large scales. Questions like “what factors determine the distribution and diversity of all life on earth?” have troubled scientists since the days of A. v. Humboldt and Wallace. At the University of Copenhagen a whole research center is dedicated to this field, and macroecological studies are increasingly present in prestigious journals like Nature and Science. Previous studies at the center have found skewed distributions of bird richness with a specific bias towards the mountains (Jetz & Rahbek, 2002; Rahbek et al., 2007). In this blog post I am going to play around a bit with some data from Rahbek et al. (2007). The analysis and the graphs are by no means sufficient (they even violate many model assumptions like homoscedasticity, normality and data independence) and are therefore more of an exploratory nature 😉 The post will show you how to build a raster stack of geographical data and how to use the data in some very basic models.

It was recommended to me to use the freely available SAM software for the analysis. Although the program is really nice and fast, it isn’t flexible enough for me, as you cannot modify specific model parameters or graphical outputs. And as a self-declared R junkie I refuse to work with “click-compute-result” tools 😉

So here is what the head of the SAM data file (“data.sam”) looks like (I won’t share it, so please generate your own data).

As you can see, the .sam file is technically just a tab-separated table with the coordinates of a grid cell (1° grid cell on a latitude-longitude projection) and all response and predictor values for this cell. To get this data into R we are going to use the raster package to generate a so-called raster stack for our analysis. This is how I did it:

# Load libraries
library(raster) # also loads the sp package

# Create Data from SAM
data <- read.delim(file="data.sam",header=TRUE,sep="\t",dec=".") # read in a data.frame
coordinates(data) <- ~Longitude+Latitude # Convert to a SpatialPointsDataFrame
cs <- "+proj=longlat +datum=WGS84 +no_defs" # define the correct projection (long-lat)
gridded(data) <- TRUE # Make a SpatialPixelsDataFrame
proj4string(data)  <- CRS(cs) # set the defined CRS

# Create Raster layer stack
s <- stack()
for(n in names(data)){
  d <- data.frame(coordinates(data),data[[n]]) # x, y and the values of one variable
  names(d)[3] <- n # keep the variable name as the layer name
  ras <- rasterFromXYZ(xyz=d,digits=10,crs=CRS(cs))
  s <- addLayer(s,ras)
}
# Now you can query and plot the raster layers from the stack
plot(s$Birds.richness)
South American Bird species Richness. Grain Size: 1°

Want to do some modeling or extract data? Here you go. First we make a subset of some of our predictors from the raster stack and then fit an ordinary least squares multiple regression model to our data to see how much variance can be explained. Note that linear regression is not the proper technique for this kind of analysis (the degrees of freedom are too high due to spatial autocorrelation, and the assumptions mentioned before are violated), but it is still useful for illustrative purposes.

# Extract some predictors from the raster Stack
predictors <- subset(s,c(7,8,10))
names(predictors) # check which layers were selected
>  "NDVI" "Topographical.Range" "Annual.Mean.Temperature"
# Now extract the data from both the bird richness layer and the predictors
birds <- getValues(s$Birds.richness)
val <- as.data.frame(getValues(predictors))

# Do the multiple regression
fit <- lm(birds~.,data=val)
summary(fit) # print the coefficient table and fit statistics
>                          Estimate Std. Error t value Pr(>|t|)
(Intercept)             215.675282  15.837493   13.62   <2e-16 ***
NDVI                    -34.541242   1.245769  -27.73   <2e-16 ***
Topographical.Range       0.056458   0.002452   23.03   <2e-16 ***
Annual.Mean.Temperature   0.940664   0.054747   17.18   <2e-16 ***
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 81.86 on 1525 degrees of freedom
(1461 observations deleted due to missingness)
Multiple R-squared:  0.6931,    Adjusted R-squared:  0.6925
F-statistic:  1148 on 3 and 1525 DF,  p-value: < 2.2e-16

Ignore the p-values and just focus on the adjusted r² value. As you can see, we are able to explain nearly 70% of the variance with this simple model. So what do our residuals and the predicted values look like? For that we have to create analogous raster layers containing the predicted and the residual values. Then we plot all raster layers again using the spplot function from the package sp (automatically loaded with “raster”).
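
As a quick aside, the adjusted r² can be checked by hand from the numbers in the summary output. The sketch below uses the standard formula 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of grid cells used and p the number of predictors. The variable names are mine, and n is inferred from the residual degrees of freedom.

```r
# Values taken from the summary output above
r2       <- 0.6931  # multiple R-squared
df_resid <- 1525    # residual degrees of freedom
p        <- 3       # number of predictors

# n = residual df + predictors + intercept
n <- df_resid + p + 1

# Adjusted R-squared penalises the fit for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
round(adj_r2, 4)
```

With only three predictors and more than 1500 cells the penalty is tiny, which is why the two values in the output barely differ.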

# Estimates prediction
rval <- getValues(s$Birds.richness) # Create new values
rval[as.numeric(names(fit$fitted.values))]<- predict(fit) # replace all data-cells with predicted values
pred <- predictors$NDVI # make a copy of an existing raster
values(pred) <-rval;rm(rval) #replace all values in this raster copy
names(pred) <- "Prediction"

# Residual Raster
rval <- getValues(s$Birds.richness) # Create new values
rval[as.numeric(names(fit$residuals))]<- fit$residuals # replace all data-cells with residual values
resid <-predictors$NDVI
values(resid) <-rval;rm(rval)
names(resid) <- "Residuals"

# Do the plot with spplot
ss <- stack(s$Birds.richness, pred, resid)
sp <- as(ss, 'SpatialGridDataFrame')
spplot(sp) # plot observed richness, prediction and residuals

Multiple linear regression model output

Looking at the residual plot you might notice that our simple model fails to explain all the variation at mountain altitudes (the Andes). Still, the predicted values look very similar to the observed richness. Bird species richness is highest in tropical mountain ranges, which is consistent with results from Africa (Jetz & Rahbek, 2002). The reasons for this pattern are not fully understood yet, but if I had to discuss this with a colleague I would probably bring up arguments like longer evolutionary time, higher habitat heterogeneity and a greater number of climatic niches in mountain ranges. At this point you would test for spatial autocorrelation using Moran’s I, adjust your analysis accordingly and use more sophisticated methods that account for the spatial autocorrelation, like Generalized Additive Models (GAMs) or Spatial Autoregressive Models (SARs). See Rahbek et al. (2007) for the actual study.
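
To give an idea of what Moran’s I measures, here is a minimal base R sketch that computes it for a complete grid stored as a matrix, using binary rook weights (cells sharing an edge are neighbours). The function name `morans_i_grid` and the toy matrices are assumptions for illustration. It does not handle missing cells and gives no significance test, so for a real analysis you would rather use a dedicated function such as moran.test from the spdep package on the model residuals.

```r
# Moran's I for a complete matrix with binary rook-neighbour weights.
# Hypothetical helper for illustration; assumes no NA values.
morans_i_grid <- function(m) {
  z  <- m - mean(m)  # deviations from the overall mean
  nr <- nrow(z)
  nc <- ncol(z)
  # Cross-products of vertically and horizontally adjacent cells,
  # each neighbour pair counted once
  cp_v <- sum(z[-1, , drop = FALSE] * z[-nr, , drop = FALSE])
  cp_h <- sum(z[, -1, drop = FALSE] * z[, -nc, drop = FALSE])
  n_pairs <- (nr - 1) * nc + nr * (nc - 1)
  # With symmetric weights every pair counts twice in both the numerator
  # and the weight sum, so the factor 2 cancels
  (length(z) / n_pairs) * (cp_v + cp_h) / sum(z^2)
}

# Smooth gradient: neighbouring cells are similar
gradient <- matrix(rep(1:4, each = 4), nrow = 4)
i_gradient <- morans_i_grid(gradient)

# Checkerboard: neighbouring cells always differ
checker <- outer(1:4, 1:4, function(i, j) (i + j) %% 2)
i_checker <- morans_i_grid(checker)
```

Values near +1 mean that neighbouring cells are alike (the gradient gives about 0.67), values near -1 that they are dissimilar (the checkerboard gives -1), and values near 0 indicate no spatial structure.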


  • Jetz, W., & Rahbek, C. (2002). Geographic range size and determinants of avian species richness. Science, 297(5586), 1548-1551.
  • Rahbek, C., Gotelli, N. J., Colwell, R. K., Entsminger, G. L., Rangel, T. F. L., & Graves, G. R. (2007). Predicting continental-scale patterns of bird species richness with spatially explicit models. Proceedings of the Royal Society B: Biological Sciences, 274(1607), 165-174.
