# BIOFRAG – Biodiversity responses to Forest Fragmentation

Another interesting project closely related to PREDICTS is the BIOFRAG project, which aims to build a global database of studies dealing with forest fragmentation and its impacts on biodiversity. One final goal of the BIOFRAG project is the development of a new fragmentation index that uses a watershed delineation algorithm and fragment descriptors to characterize fragment traits. I am very interested in seeing the final outcome of this approach, and maybe I will even find the time to implement their algorithm in LecoS for QGIS as soon as it is released. Their database paper, lead-authored by Marion Pfeifer, has just been released as an open-access paper. You can read it in full here.

If you are considering contributing data, more information can be found on the BIOFRAG blog. All researchers involved in forest fragmentation research should consider contributing to BIOFRAG and also to PREDICTS (see here), if they haven’t already done so. And as usual: if your study was conducted in Africa, then please get in touch with me! I will contact you as soon as I return from my fieldwork in Kenya and Tanzania at the end of May.

# Out in the field – Working in the agricultural Mosaic of the Taita Hills

And here is some news from my current fieldwork, which is part of my thesis. After spending some quiet but exciting days in Nairobi (maybe more about that later), I finally arrived in Wundanyi, Taita Hills, where a substantial part of my work will be conducted along the CHIESA transect. Situated in the coastal region in proximity to Mombasa, the Taita Hills are renowned for their extraordinary bird diversity and endemic species, and as such are considered part of the Eastern Arc Mountains biodiversity hotspot. The Taita Hills encompass a variety of land-use forms, but the majority are surely tropical homegardens, as most of the “Taita” people are subsistence farmers growing crops in the highly fertile soil of the mountain slopes. Besides homegardens there are riverine forests in the valleys, shrubland vegetation at lower altitudes, exotic tree plantations and of course the indigenous forests remaining on the Taita Hills mountain tops. Every last forest patch is well known and was traditionally protected by the locals as part of their culture. However, over the last centuries the remaining forest area has become scarcer and scarcer, and even during my visits to some of the forest fragments with the highest biodiversity value (Chawia, Ngangao) I saw frequent signs of fuelwood and timber extraction. Clearly a lack of funding for biodiversity protection seems to be the problem, but an economic perspective and opportunities such as ecotourism might also improve locals’ perception of whether and how these last forest patches should be protected.

My work in the Taita Hills is all about birds. Specifically, I am conducting avian diversity and abundance assessments along an altitudinal transect encompassing a variety of land-use systems. Although avian assessments have been conducted in Taita many times before, they were often restricted to the forest fragments and, for instance, didn’t look at bird diversity in homegardens at different altitudes. The resulting data will only be used as a validation dataset for my thesis, but I am hoping that it has some value of its own as well. Initial results show that the homegardens in Taita in particular support quite a high diversity of birds, which is even similar to levels in the remaining forest fragments (although the community is somewhat different and biotic homogenization is likely ongoing).

It can be quite challenging to conduct avian research in tropical human-dominated landscapes. Not only do you have to arrange transport to the specific transect areas and lodging (in my case provided by the University of Helsinki research station in Wundanyi), but you also have to account for frequent interruptions by children and farmers asking what you are doing. Furthermore, it is not an easy task to count birds in, for instance, a maize or sugarcane plantation, due to the limited accessibility and my intention not to damage the farmers’ crops. Most of the farmers, however, happily provide access to their land and are very interested in what kind of research this “Mzungu” is doing on their farm. From my own experience here I can tell that the Taita people are very kind and it is a pleasure to work with them on their land. They are very respectful, and even walking around late at night or very early in the morning seems to be no problem here (in contrast to, for instance, Nairobi or Mombasa).

All in all my sampling is going quite well and much better than I expected. Although it is technically the rainy season and long heavy rains can be expected every day, the mornings have been exceptionally dry and the weather mostly favourable for ornithological research. Generally this time of the year in East Africa is especially interesting for bird assessments, as many local bird species are in their breeding plumage and nesting, but also because European migrants are often still around or on their way back to Europe (for instance I saw and heard a Willow Warbler some days ago). Let’s see what else the next weeks will have in store for me in terms of avian diversity.

# The PREDICTS project – We need your data!

As part of my thesis project I have recently joined up with the researchers and interns of the PREDICTS project. PREDICTS stands for **P**rojecting **R**esponses of **E**cological **D**iversity **I**n **C**hanging **T**errestrial **S**ystems (yeah, a fancy and to-the-point acronym) and aims to investigate the impact of various human pressures on biological diversity on a global scale. PREDICTS gets its data from contributing authors and is constantly looking for new data contributors. All contributors will become coauthors of a paper describing the database, and at the end of the project the whole database will be released to the public!

If you have diversity or community composition data collected at more than one terrestrial site, where the sites are somehow influenced by humans and the data were collected using a standardized methodology, then it is more than likely that we could use it. What we need is:

- Locations of sampling points, as precisely as possible (with the coordinate system used, if possible)
- An indication of the type of land cover that each sampling point represents (e.g. primary forest, secondary forest, intensively-farmed crop, hedgerow)
- An indication of how intensively the site is used by people
- Data on the presence / absence, or ideally a measure of abundance, of each species at each site
- The date(s) that each measurement was taken
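To make the list above a bit more tangible, here is a minimal sketch of how such a dataset could be laid out in R, with one row per species, site and sampling date. The column names and all values are made up for illustration; this is not an official PREDICTS template.

```r
# Hypothetical layout: one row per species per site per sampling date.
# All column names and values below are invented for illustration only.
contrib <- data.frame(
  site_id       = c("S1", "S1", "S2", "S2"),
  longitude     = c(38.35, 38.35, 38.36, 38.36),  # decimal degrees
  latitude      = c(-3.40, -3.40, -3.42, -3.42),
  crs           = "EPSG:4326",                     # coordinate system used
  land_cover    = c("primary forest", "primary forest",
                    "intensively-farmed crop", "intensively-farmed crop"),
  use_intensity = c("low", "low", "high", "high"),
  species       = c("Species A", "Species B", "Species A", "Species B"),
  abundance     = c(4, 0, 1, 7),
  sample_date   = as.Date(c("2014-04-01", "2014-04-01",
                            "2014-04-02", "2014-04-02")),
  stringsAsFactors = FALSE
)

# Presence / absence can always be derived from abundance, but not vice versa
contrib$present <- contrib$abundance > 0

# Species richness per site as a quick sanity check
richness <- tapply(contrib$present, contrib$site_id, sum)
```

Keeping abundance rather than only presence / absence is the more flexible choice, since the second can be computed from the first at any time.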

We need more openness in terms of data sharing in conservation and ecology research! It is unbelievable that even today important research data can be lost if, for instance, the original author dies or their lab burns to the ground. Some might argue that it should be mandatory to share data if your research is 100% funded by public sources. One understandable reason against data sharing after publication is, for instance, that you are a young emerging scientist and want to keep your hard-earned golden eggs to yourself. However, this can be debated as well, since data sharing not only gives you more citations but may even bring you into contact with other researchers in your field. In other cases researchers don’t want to share raw sampling data because of conservation concerns, but even here there are options to coarsen coordinates before public release.
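As a minimal sketch of what coarsening coordinates could look like in practice, the following base R function snaps each point to the centre of a grid cell of a chosen size. The function name, the cell size of 0.1 degrees and the example coordinates are my own assumptions; an appropriate resolution depends on the species and the threat in question.

```r
# Snap coordinates to the centre of a res x res degree grid cell.
# `coarsen_coords` is a hypothetical helper, not part of any package.
coarsen_coords <- function(lon, lat, res = 0.1) {
  data.frame(
    lon = floor(lon / res) * res + res / 2,
    lat = floor(lat / res) * res + res / 2
  )
}

# Made-up sampling points
pts <- data.frame(lon = c(38.3412, 38.3678), lat = c(-3.4021, -3.3987))

coarse <- coarsen_coords(pts$lon, pts$lat, res = 0.1)
print(coarse)
```

Each published point is then at most half a cell width away from the true location along each axis, and nearby sites collapse onto the same coordinates.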

Recent initiatives for open ecological data sharing like https://datadryad.org/ and http://figshare.com/ already provide a splendid place where you as an author can deposit raw data from papers you wrote years ago. You can even upload the raw data from your most current projects and put an embargo on the download, so that the item will be released to the public, for instance, one year after the associated article has been published.

Anyway:

For my thesis I am especially looking for all kinds of African community data that have been published. We already have a lot of studies in the database, but for my project I need more data, especially on less sampled taxa (insects, amphibians,…), different temporal resolutions and a greater diversity of land-use types. So especially if you have data on African species communities in any form (diversity metrics, abundance metrics, I even take occurrence matrices) which were sampled in anthropogenically disturbed habitats: **Please contact me or wait for me to contact you** :)

Martin Jung

# Statistical inferences using p-values

And another quick post for today. Here is a nice infographic I just found on the Nature News page. It is a nice demonstration of how p-values can fail us when making inferences about hypotheses. Just another article bashing p-values, you could say. Or: “Just switch to Bayesian stats already, or report real effect sizes instead of p-values”. Although the matter is clear to many ecologists out there, the majority still happily use p-values, inferring that they have proved their working hypothesis wrong or true. At my former and also at my current university, p-values are still taught and used in all courses related to data analysis. Students are asked and expected to always (!) report the p-value and are trained to look specifically for something they claim is statistical significance of an effect. And then people wonder why the hell everyone still uses century-old techniques, often without even knowing exactly what they mean. I certainly believe (and I say that while still being educated myself :) ) that, especially in the education of future ecologists and conservationists, statistics courses should become mandatory for all (under)graduates. In times of big data analysis, basic statistical knowledge has to be a must for everyone.
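A small simulation can make the point more tangible. This is my own sketch and not the example from the infographic: the sample size of 20, the effect of 0.5 standard deviations and the number of repetitions are arbitrary assumptions.

```r
set.seed(42)
n_sim <- 2000

# Case 1: both groups come from the SAME distribution, so the null is true.
# Every "significant" result here is a false positive.
p_null <- replicate(n_sim, t.test(rnorm(20), rnorm(20))$p.value)
false_pos_rate <- mean(p_null < 0.05)

# Case 2: same design, but with a real difference of 0.5 standard deviations.
# The share of significant results is the power of the test.
p_alt <- replicate(n_sim, t.test(rnorm(20), rnorm(20, mean = 0.5))$p.value)
power <- mean(p_alt < 0.05)

cat("Share of p < 0.05 without any effect:", false_pos_rate, "\n")
cat("Share of p < 0.05 with a real effect:", power, "\n")
```

With small samples a real effect is missed in most runs, while a steady share of runs without any effect still comes out as "significant". A single p-value therefore says very little about whether a working hypothesis is true.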

The related Nature News article can be found here. More information about my research in Africa and my fieldwork trip will appear around May.

EDIT: And as a funny addition check out this awesome R-function which gives you an appropriate significance description for every p-value :D

# Global Forest Change data now available for Download

The previously reported Global Forest data from Hansen et al. (2013) is now finally up for download.

Access the data here, but beware of the size of the individual granules, as they can easily be several gigabytes each. The time for some awesome analyses and probably a bunch of papers has come…

# Google Maps routing for QGIS

Just a little post with something not totally related to ecology, but nevertheless quite useful for the daily work with GIS. Some time ago I got my hands on a nice Garmin GPS device (Etrex 30) and I am constantly playing around with its options and opportunities, especially in the interplay between fieldwork and desktop-based GIS processing. Now I was in the lucky situation of having some time available to go birding in the area of Vestamager, southwest of Copenhagen, Denmark. My intention was to bicycle there and use the GPS for orientation (although I know the route perfectly well :) ).

Thus I wrote this simple little R processing script (see a general introduction on how to create R scripts for processing here), which uses the **route(…)** function of the **ggmap** package to generate a line layer from point x to point y. Note that by using this function you are agreeing to the Google Maps API Terms of Service and you are only allowed to send 2500 queries per day.

The generated output line is automatically loaded into QGIS after processing and has the total length and duration of the trip in its attribute-table.

To use the script, create a new one in the processing toolbox and copy the contents below into it. Then copy it into your “~/.qgis2/processing/rscripts” folder. I will also post the script in the QGIS scripts section here on this blog.

```r
# GGmap routing script by Martin Jung
# Homepage: http://conservationecology.wordpress.com/
##Vector processing=group
##x = string Copenhagen, Denmark
##y = string Berlin, Germany
##type= string driving
##output = output vector

# Install ggmap on first use (the package name has to be quoted)
if (!require(ggmap)) {
  print("ggmap not installed. Will install it now")
  install.packages("ggmap", dependencies = TRUE)
  library(ggmap)
}
library(rgdal)  # also attaches sp, which provides CRS, Lines and SpatialLines

# Get the route
r <- route(from = x, to = y, mode = type, structure = "route",
           output = "simple", alternatives = FALSE, messaging = FALSE)

cs <- CRS("+proj=longlat +datum=WGS84 +no_defs")  # WGS84 projection
l  <- Lines(list(Line(r[c("lon", "lat")])), ID = paste0(type, "_track"))
sl <- SpatialLines(list(l), cs)

# Attribute table with total length and duration of the trip
data <- data.frame(from = x, to = y, type = type,
                   length_km  = sum(r$km, na.rm = TRUE),
                   duration_h = sum(r$hours, na.rm = TRUE))
output <- SpatialLinesDataFrame(sl, data = data, match.ID = FALSE)
```

To do routing on existing shapefiles from within QGIS, users are advised to take a look at the pgRouting extension for PostGIS. See a nice tutorial for installation and configuration on Windows machines here.

# Macroecology playground (3) – Spatial autocorrelation

Hey, it has been over 2 months, so welcome to 2014 from my side. I am sorry for not posting more updates recently, but like everyone I was (and still am) under constant working pressure. This year will be quite interesting for me personally, as I am about to start my thesis project and will (besides other things) go to Africa for fieldwork. But for now I will try to catch your interest with a new macroecology playground post dealing with the important issue of spatial autocorrelation. See the other Macroecology playground posts here and here to find out what happened before.

Spatial autocorrelation is the issue that data points in geographical space are to some degree dependent on each other, i.e. their values are correlated because of spatial proximity. Most of the statistical tools we have available for analysis assume that all our data points are independent of each other, which is rarely the case in macroecology. Just imagine the steep slopes of a mountain region: the highest altitude values always occur near the peaks and decrease with distance from them. There is thus already a gradient inherent in the data, which we somehow have to account for if we are to investigate the effect of altitude alone (and not the effect of the proximity to nearby cells).

In our hypothetical example we want to explore how well the topographical average (average height per grid cell) can explain amphibian richness in South America and whether the residuals (model errors) of our model are spatially autocorrelated. I can’t share the data, but I believe the dear reader will get the idea of what we are trying to do.

```r
# Load libraries
library(raster)

# Load in your dataset. In my case I am loading both the topo and the
# richness layer from a raster stack called s.
amp  <- s$Amphibians.richness
topo <- s$Topographical.Average

summary(fit1 <- lm(getValues(amp) ~ getValues(topo)))
# Extract from the output:
# > Multiple R-squared: 0.1248, Adjusted R-squared: 0.1242
# > F-statistic: 217.7 on 1 and 1527 DF, p-value: < 2.2e-16

par(mfrow = c(2, 1))
plot(amp, col = rainbow(100, start = 0.2))
plot(s$Topographical.Average)
```

What did we do? As you can see, we fitted a simple linear regression model using the values from both the amphibian richness raster layer and the topographical average raster. The relation seems to be highly significant, and this simple model explains about 12.4% of the variation (adjusted R-squared). Here is the basic plot output for both response and predictor variable.

As you can see, high values of both layers seem to be spatially clustered. So the likelihood of violating the independence of data points in the linear regression model is very high. Let’s investigate the spatial autocorrelation by looking at Moran’s I, which is a measure of spatial autocorrelation (technically it is just a correlation coefficient like Pearson’s r, combined with spatial weights of the surroundings). So let’s investigate whether the residual values (the error in model fit) are spatially autocorrelated.

```r
library(ncf)  # For the correlogram

# Generate a residual raster from the model before.
# Start from NA so that cells which did not enter the model stay empty.
rval <- rep(NA_real_, ncell(amp))
rval[as.numeric(names(fit1$residuals))] <- fit1$residuals  # fill data cells with residuals
resid <- topo                     # create new raster
values(resid) <- rval; rm(rval)   # replace our values in this new raster
names(resid) <- "Residuals"

# Now calculate Moran's I of the new residual raster layer
x <- xFromCell(resid, 1:ncell(resid))  # take x coordinates
y <- yFromCell(resid, 1:ncell(resid))  # take y coordinates
z <- getValues(resid)                  # and the values of course

# Use the extracted coordinates and values, increase the distance in steps
# of 100 and don't forget to use latlon = TRUE (given that your rasters are
# in WGS84 geographic coordinates).
# This can take a while. It takes even longer if you try to estimate the
# significance of the spatial autocorrelation (resamp > 0).
system.time(co <- correlog(x, y, z, increment = 100, resamp = 0,
                           latlon = TRUE, na.rm = TRUE))

# Now show the result
plot(0, type = "n", col = "black", ylab = "Moran's I", xlab = "lag distance",
     xlim = c(0, 6500), ylim = c(-1, 1))
abline(h = 0, lty = "dotted")
lines(co$correlation ~ co$mean.of.class, col = "red", lwd = 2)
points(x = co$x.intercept, y = 0, pch = 19, col = "red")
```

Ideally Moran’s I should be as close to zero as possible. In the above plot you can see that at short distances (up to about 2000 distance units), and again at greater distances, the model residuals are positively autocorrelated (more similar than expected by chance, and thus correlated with proximity). The function **correlog** allows you to resample the dataset to test the significance of these patterns, but for now I will just assume that our model’s residuals are significantly spatially autocorrelated.
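For readers who want to see what Moran’s I actually computes, here is a minimal sketch in base R on toy data. It is not the method used by **correlog** internally: the function `morans_i`, the 5 x 5 grid, the made-up values and the simple binary neighbour weights are all my own assumptions for illustration.

```r
# Moran's I for a vector of values z and an n x n spatial weights matrix w
# (w[i, j] > 0 if cells i and j are neighbours, diagonal is zero).
morans_i <- function(z, w) {
  zc <- z - mean(z)   # centre the values
  n  <- length(z)
  (n / sum(w)) * sum(w * outer(zc, zc)) / sum(zc^2)
}

# Toy data: a 5 x 5 grid with a smooth west-east gradient (made-up values)
grid <- expand.grid(x = 1:5, y = 1:5)
z_gradient <- grid$x + 0.1 * grid$y

# Binary weights: cells that share an edge are neighbours
d <- as.matrix(dist(grid))
w <- (d > 0 & d < 1.2) * 1

set.seed(1)
i_gradient <- morans_i(z_gradient, w)          # values clustered in space
i_shuffled <- morans_i(sample(z_gradient), w)  # same values, random positions

cat("Moran's I, gradient:", i_gradient, "\n")
cat("Moran's I, shuffled:", i_shuffled, "\n")
```

The gradient gives a clearly positive Moran’s I because neighbouring cells have similar values, while shuffling the same values across the grid destroys that structure. A correlogram repeats this calculation for neighbours in successive distance classes.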

There are numerous techniques to deal with or investigate spatial autocorrelation. Here the interested reader is advised to look at Dormann et al. (2007) for inspiration. In our example we will fit a simultaneous spatial autoregressive model (SAR) and see if we can get at least part of the spatial autocorrelation out of the residual error. SARs can model the spatial error generating process and operate with weight matrices that specify the strength of interaction between neighbouring sites (Dormann et al., 2007). If you know that the spatial autocorrelation occurs in the response variable only, a so-called “lagged-response model” would be most appropriate; otherwise use a “mixed” SAR if it occurs in both response and predictors. However, Kissling and Carl (2008) investigated SAR models in detail and came to the conclusion that lagged and mixed SARs might not always give better results than ordinary least squares regressions and can generate bias. Instead they recommend fitting “spatial error” SAR models when dealing with species distribution data. These assume that the spatial autocorrelation occurs neither in the response nor in the predictors, but in the error term.
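For orientation, the three SAR variants mentioned above are commonly written as follows. This is a sketch in standard textbook notation, which is my own choice of symbols and not copied from the cited papers.

$$
\begin{aligned}
\text{spatial error:} \quad & y = X\beta + u, \qquad u = \lambda W u + \varepsilon \\
\text{lagged response:} \quad & y = \rho W y + X\beta + \varepsilon \\
\text{mixed:} \quad & y = \rho W y + X\beta + W X \gamma + \varepsilon
\end{aligned}
$$

Here $y$ is the response (amphibian richness), $X$ holds the predictors (topographical average), $W$ is the spatial weights matrix, $\varepsilon$ is an independent error term, and $\lambda$, $\rho$ and $\gamma$ are the spatial autoregression coefficients. In the spatial error model only the error $u$ depends on the neighbouring cells.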

So let’s build the spatial weights and fit a SAR:

```r
library(spdep)  # note: in recent versions errorsarlm() has moved to the spatialreg package

x <- xFromCell(amp, 1:ncell(amp))
y <- yFromCell(amp, 1:ncell(amp))
# Response and predictor values in one data frame for the model formula
val <- data.frame(z = getValues(amp), topo = getValues(topo))

# Get the neighbour list of interactions within 2000 distance units
nei <- dnearneigh(cbind(x, y), d1 = 0, d2 = 2000, longlat = TRUE)
nlw <- nb2listw(nei, style = "W", zero.policy = TRUE)
# You should calculate the interaction weights with the maximal distance in
# which autocorrelation occurs. But here we will just take the first
# x-intercept, where positive correlation turns into negative.

# Now fit the spatial error SAR.
# We use the z values and weights as input. NoData values are excluded and
# cells without neighbours get zero weights.
sar_e <- errorsarlm(z ~ topo, data = val, listw = nlw,
                    na.action = na.omit, zero.policy = TRUE)

# Now compare how much variation can be explained
summary(fit1)$adj.r.squared          # The r_squared of the normal regression
# > 0.124
summary(sar_e, Nagelkerke = TRUE)$NK # Nagelkerke's pseudo r_squared of the SAR
# > 0.504
# So the model with a spatial error term explains far more of the variation
# in amphibian richness than the ordinary regression.

# Finally do a likelihood ratio test
LR.sarlm(sar_e, fit1)
# > Likelihood ratio for spatial linear models
# >
# > data:
# > Likelihood ratio = 869.7864, df = 1, p-value < 2.2e-16
# > sample estimates:
# > Log likelihood of sar_e  Log likelihood of fit1
# >               -7090.903                -7525.796

# Not only are our two models significantly different, but the log likelihood
# of our SAR is also greater than that of the ordinary model, indicating a
# better fit.
```
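The likelihood ratio reported above can be checked by hand from the two log likelihoods. The following lines are a small sketch of that arithmetic in base R and use only the numbers from the output; the variable names are my own.

```r
ll_sar <- -7090.903  # log likelihood of sar_e, taken from the output
ll_ols <- -7525.796  # log likelihood of fit1, taken from the output

# Likelihood ratio statistic: twice the difference in log likelihood
lr_stat <- 2 * (ll_sar - ll_ols)

# The SAR has one additional parameter (the spatial coefficient), so the
# statistic is compared against a chi-squared distribution with df = 1
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

print(lr_stat)
print(p_value)
```

The statistic comes out at about 869.79, which matches the reported value up to rounding of the log likelihoods.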

The SAR is one of many methods to deal with spatial autocorrelation. I agree that the choice of the distance for the weights matrix is a bit arbitrary (it made sense to me), so you might want to investigate the occurrence of spatial autocorrelation a bit more before fitting a SAR. So have we dealt with the autocorrelation? Let’s just calculate Moran’s I values again for both the old residuals and the SAR residuals. Looks better, doesn’t it?

References:

- Dormann, C. F., McPherson, J. M., Araújo, M. B., Bivand, R., Bolliger, J., Carl, G., … & Wilson, R. (2007). Methods to account for spatial autocorrelation in the analysis of species distributional data: a review. *Ecography*, *30*(5), 609-628.
- Kissling, W. D., & Carl, G. (2008). Spatial autocorrelation and the selection of simultaneous autoregressive models. *Global Ecology and Biogeography*, *17*(1), 59-71.