Tag Archive | r-code

Playing with Landsat 8 metadata

The Landsat mission is one of the most successful remote-sensing programs and has been running since the early 1970s. The most recent addition to the flock of Landsat satellites, mission number 8, has been supplying tons of images to researchers, NGOs and governments for over two years now. Providing nearly 400 images daily (!), the mission has by now amassed an impressive dataset of over half a million individual scenes (N = 515243 as of 29/07/2015).

Landsat 8 scenes can be easily queried via a number of web interfaces, the oldest and most established being the USGS EarthExplorer, which also distributes other NASA remote-sensing products. ESA has started to mirror Landsat 8 data, and so has the great Libra website from Development Seed. Using the Landsat 8 before/after tool from remotepixel.ca you can even make on-the-fly comparisons of image scenes. You might ask how some of those services are able to show you the number of images and the estimated cloud cover. This information is saved in the scene-list metadata file, which contains the identity, name, acquisition date and much other information for all Landsat 8 scenes since the start of the mission. In addition, Landsat 8 has a cloud-cover estimate (sadly only L8, although as far as I know the USGS is working on a post-creation measure for the previous satellites), which you can readily explore on a global scale. Here is some example code showcasing how to peek into this huge, ever-growing archive.

# Download the metadata file
library(R.utils) # provides decompressFile()
l <- "http://landsat.usgs.gov/metadata_service/bulk_metadata_files/LANDSAT_8.csv.gz"
download.file(l, destfile = basename(l))
# Now decompress the gzipped csv
t <- decompressFile(basename(l), temporary = TRUE, overwrite = TRUE, remove = FALSE, ext = "gz", FUN = gzfile)

Now you can read in the resulting csv. For speed I would recommend using the “data.table” package!

# Load data.table
if(!require(data.table))install.packages("data.table");library(data.table)
# Use fread to read in the csv
system.time( zz <- fread(t, header = TRUE) ) # use '<-' so the result is kept outside system.time()
file.remove(t)

The metadata file contains quite a number of cool fields to explore. For instance, the "browseURL" column contains the full link to an online .jpg thumbnail, which is very useful for a quick look at the scene.

require('jpeg')
l = "http://earthexplorer.usgs.gov/browse/landsat_8/2015/164/071/LC81640712015201LGN00.jpg"
download.file(l,basename(l))
jpg <- readJPEG("LC81640712015201LGN00.jpg") # read the file
res <- dim(jpg)[1:2] # resolution: number of rows (height) and columns (width)
plot(1, 1, xlim = c(1, res[2]), ylim = c(1, res[1]), asp = 1, type = 'n',
     xaxs = 'i', yaxs = 'i', xaxt = 'n', yaxt = 'n', xlab = '', ylab = '', bty = 'n')
rasterImage(jpg, 1, 1, res[2], res[1])

L8 Thumbnail

The "cloudCoverFull" column contains the average cloud cover for each scene, which is interesting to explore, as the long-term average of measured cloud cover per region/country likely differs due to differences in altitude or precipitation. Here is a map showing the average cloud cover per individual scene since mission start:

Average global cloud cover in Landsat 8 data
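
A minimal sketch of how such a summary could be derived from the metadata table loaded above. Note that "cloudCoverFull" is taken from the text above, while the column names path, row, sceneCenterLatitude and sceneCenterLongitude are assumptions about the bulk metadata format, and the base-graphics map is just one of many plotting options:

# Sketch: mean cloud cover per WRS-2 path/row from the fread() result 'zz'
library(data.table)
cc <- zz[cloudCoverFull >= 0,                      # drop scenes without a cloud estimate
         .(meanCloud = mean(cloudCoverFull),
           lat = mean(sceneCenterLatitude),
           lon = mean(sceneCenterLongitude)),
         by = .(path, row)]

# Quick and dirty global map: colour each scene centre by its mean cloud cover
pal <- rev(heat.colors(10))
plot(cc$lon, cc$lat, pch = 15, cex = 0.3,
     col = pal[cut(cc$meanCloud, 10)],
     xlab = "Longitude", ylab = "Latitude",
     main = "Mean cloud cover per Landsat 8 scene")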

Clouds are a major source of annoyance for anyone who intends to measure vegetation cover or classify land cover. I might write another post later showcasing some examples of how to filter satellite data for clouds.


Assessing habitat specialization using IUCN data

For quite some time ecological models have tried to incorporate both continuous and discrete characteristics of species. Newbold et al. (2013) demonstrated that functional traits affect the response of tropical bird species towards land-use intensity: tropical forest specialist birds seem to decrease globally in probability of presence and abundance in more intensively used forests. This pattern extends to many taxonomic groups, and the worldwide decline of "specialist species" has been noted before by Clavel et al. (2011).

(a) Probabilities of presence of tropical bird species in differently disturbed forests and (b) ratios of abundance in lightly and intensively disturbed forests relative to undisturbed forests. Forest specialists are disproportionately affected in intensively used forests. Figure from Newbold et al. 2013, doi: http://dx.doi.org/10.1098/rspb.2012.2131

But how do you acquire such data on habitat specialization? Either you assemble your own exhaustive trait database, or you query information from one of the openly available data sources. One such source is the IUCN Red List, which not only has expert-validated data on a species' current threat status, but also on population size and on a species' habitat preferences. Here IUCN follows its own habitat classification scheme ( http://www.iucnredlist.org/technical-documents/classification-schemes/habitats-classification-scheme-ver3 ). The curious ecologist and conservationist should keep in mind, however, that not all species are currently assessed by IUCN.

There are already a lot of scripts available on the net from which you can get inspiration on how to query the IUCN Red List (Kay Cichini from the biobucket explored this already in 2012). Even better: someone actually compiled a whole R package called letsR full of web-scraping functions to access the IUCN Red List. Here is some example code for Perrin's Bushshrike, a tropical bird quite common in central Africa.

# Install package
install.packages("letsR")
library(letsR)

# Perrin's or Four-colored Bushshrike latin name
name <- 'Telophorus viridis'

# Query IUCN status
lets.iucn(name)

#> Species             Family         Status  Criteria  Population  Description_Year
#> Telophorus viridis  MALACONOTIDAE  LC                Stable      1817
#> Country
#> Angola, Congo, The Democratic Republic of the Congo, Gabon, Zambia

# Or you can query habitat information
lets.iucn.ha(name)

#>Species Forest Savanna Shrubland Grassland Wetlands Rocky areas Caves and Subterranean Habitats
#>Telophorus viridis      1       1         1         0        0           0                               0
#> Desert Marine Neritic Marine Oceanic Marine Deep Ocean Floor Marine Intertidal Marine Coastal/Supratidal
#>      0              0              0                       0                 0                         0
#>  Artificial/Terrestrial Artificial/Aquatic Introduced Vegetation Other Unknown
#>                      1                  0                     0     0       0

letsR also has other methods to work with the spatial data that IUCN provides ( http://www.iucnredlist.org/technical-documents/spatial-data ), so definitely take a look. It works by querying the IUCN Red List API for the species id (http://api.iucnredlist.org/go/Telophorus-viridis). Sadly, the habitat function only returns whether a species is known to occur in a given habitat, but not whether that habitat is of major importance to the species (for instance whether a species is a "forest specialist"). Telophorus viridis, for example, also occurs in savannah and occasionally in artificial habitats like gardens ( http://www.iucnredlist.org/details/classify/22707695/0 ).

So I programmed my own function to assess whether forest habitat is of major importance to a given species. It takes an IUCN species id as input and returns either "Forest-specialist" if forest habitat is of major importance to the species, "Forest-associated" if the species is merely known to occur in forest, or "Other Habitats" if the species does not occur in forests at all. The function works by querying the IUCN Red List and breaking up the HTML structure at given intervals that indicate a new habitat type.

Find the function on gist.github (strangely, WordPress doesn't embed gists as promised).
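
In case the gist is unavailable, here is a rough sketch of the idea (not the original function). The URL pattern and the "Major Importance" text marker are assumptions about the layout of the old Red List classify pages, so treat it as a starting point only:

# Sketch only: classify forest affinity by scraping the habitat classification page for a species id
library(rvest)

isForestSpecialist <- function(id){
  # assumed URL pattern of the habitat classification page
  url <- paste0("http://www.iucnredlist.org/details/classify/", id, "/0")
  page <- read_html(url)
  # split the page text into lines and keep those mentioning forest habitats
  rows <- unlist(strsplit(html_text(page), "\n"))
  forest <- grep("Forest", rows, value = TRUE)
  if (length(forest) == 0) return("Other Habitats")
  # assumption: habitats of major importance are flagged as "Major Importance: Yes"
  if (any(grepl("Major Importance:\\s*Yes", forest))) {
    return("Forest-specialist")
  }
  "Forest-associated"
}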

How does it work? You first enter the species' IUCN Red List id, which you can find in the URL after querying a given species name. Alternatively, you can download the whole IUCN classification table and match your species name against it 😉 Find it here. Then simply execute the function with the code below.

name = 'Telophorus viridis'
data <- read.csv('all.csv')
# This returns the species id
data$Red.List.Species.ID[which(data$Scientific.Name==name)]
#> 22707695

# Then simply run my function
isForestSpecialist(22707695)
#> 'Forest-specialist'

Bulk downloading and analysing MODIS data in R

Today we are going to work with bulk-downloaded data from MODIS. The MODIS satellites Terra and Aqua have been orbiting the earth since the year 2000 and provide a reliable and free-to-obtain source of remote-sensing data for ecologists and conservationists. Among the many MODIS products, indices of vegetation greenness such as NDVI or EVI have been used in countless ecological studies and are the first choice of most ecologists for relating field data to remote-sensing data. Here we are going to demonstrate three different ways to acquire and calculate the mean EVI for a given region and time period, using the MOD13Q1 product. We focus on an area a little bit south of Mount Kilimanjaro and the temporal span around May 2014.

(1)
The first handy R package we are going to use is MODISTools. It is able to download spatio-temporal data via a simple subset command. The nice advantage of *MODISTools* is that it only downloads the point of interest, as the subsetting and processing is done server-side. This makes it excellent for downloading only what you need and reducing the amount of downloaded data. A disadvantage is that it queries a perl script on the daac.ornl.gov server, which is often horribly slow and stretched to full capacity almost every second day.

# Using the MODISTools Package
library(MODISTools)
# MODISTools requires you to make a query data.frame
coord <- c(-3.223774, 37.424605) # Coordinates south of Mount Kilimanjaro
product <- "MOD13Q1"
bands <- c("250m_16_days_EVI","250m_16_days_pixel_reliability") # What to query. You can get the names via GetBands
savedir <- "tmp/" # You can save the downloaded File in a specific folder
pixel <- c(0,0) # Get the central pixel only (0,0) or a quadratic tile around it
period <- data.frame(lat=coord[1],long=coord[2],start.date=2013,end.date=2014,id=1)
# To download the pixels
MODISSubsets(LoadDat = period, Products = product, Bands = bands, Size = pixel,
             SaveDir = savedir, StartDate = T)
MODISSummaries(LoadDat = period, FileSep = ",", Product = "MOD13Q1", Bands = "250m_16_days_EVI",
               ValidRange = c(-2000,10000), NoDataFill = -3000, ScaleFactor = 0.0001,
               StartDate = TRUE, Yield = T, Interpolate = T, QualityScreen = TRUE,
               QualityThreshold = 0, QualityBand = "250m_16_days_pixel_reliability")
# Finally read the output
read.table("MODIS_Summary_MOD13Q1_2014-08-10.csv",header = T,sep = ",")

The mean EVI between the start of 2013 and today on the southern slope of Kilimanjaro was 0.403 (SD = 0.036). The yield (integral of the interpolated data between start and end date) is 0.05. MODISTools is very suitable if you want to get hundreds of individual point locations, but less suitable if you want to extract values for a whole area (e.g. a polygon shape), or if you are just annoyed by the frequent server breakdowns…

(2)
If you want to get whole MODIS tiles for your area of interest, you have another option available in R. The MODIS package is particularly suited for this job and even has options to incorporate ArcGIS for additional processing. It is also possible to use the MODIS Reprojection Tool or GDAL (our choice) as the underlying workhorse.

library(MODIS)
product <- "MOD13Q1" # same product as before
dates <- as.POSIXct( as.Date(c("01/05/2014","31/05/2014"), format = "%d/%m/%Y") )
dates2 <- transDate(dates[1], dates[2]) # Transform input dates from before
# The MODIS package allows you to select tiles interactively. We however select them manually here
h <- "21"
v <- "09"
runGdal(product = product, begin = dates2$beginDOY, end = dates2$endDOY, tileH = h, tileV = v)
# Per Default the data will be stored in
# ~homefolder/MODIS_ARC/PROCESSED
# After download you can stack the processed TIFS
vi <- preStack(path="~/MODIS_ARC/PROCESSED/MOD13Q1.005_20140810192530/", pattern="*_EVI.tif$")
s <- stack(vi)
s <- s*0.0001 # Rescale the downloaded Files with the scaling factor

# And extract the mean value for our point from before.
# First transform our coordinates from lat-long to the MODIS sinusoidal projection
sp <- SpatialPoints(coords = cbind(coord[2],coord[1]),proj4string = CRS("+proj=longlat +datum=WGS84 +ellps=WGS84"))
sp <- spTransform(sp,CRS(proj4string(s)))
extract(s,sp) # Extract the EVI values for the available two layers from the generated stack
#>  0.2432  | 0.3113

Whole MODIS EVI tile (250 m cell size) for East Africa
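
If, as noted under (1), you need values for a whole area rather than single pixels, you can summarise the downloaded tiles over a polygon. A minimal sketch, assuming you have a polygon shapefile of your study area (the file name is made up) and the EVI stack s from above:

library(raster)

# Hypothetical study-area polygon in lat-long (the file name is an assumption)
aoi <- shapefile("study_area.shp")
# Reproject the polygon to the sinusoidal projection of the EVI stack
aoi <- spTransform(aoi, CRS(proj4string(s)))
# Mean EVI per 16-day composite within the polygon
extract(s, aoi, fun = mean, na.rm = TRUE)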

 

(3)
If all the packages and tools so far did not work as expected, there is also the alternative of using a combination of R and Python to analyse your MODIS files, or of downloading your tiles manually. The awesome pyModis scripts allow you to download directly from a USGS server, which at least for me was almost always faster than the LP DAAC connection used by the two previous solutions. However, so far both the MODISTools and the MODIS package have processed the original MODIS tiles (which ship in HDF4 container format) for you. With this solution you have to access the HDF4 files yourself and extract the layers you are interested in.
Here is an example of how to download a whole HDF4 container using the modis_download.py script. Just run modis_download.py -h if you are unsure about the command-line parameters.

# Custom command. You could build your own using a loop and paste
tile <- "h21v09" # the MODIS tile covering our area of interest
f <- paste0("modis_download.py -r -p MOD13Q1.005 -t ", tile,
            " -e ", "2014-05-01", " -f ", "2014-05-30", " tmp/")
# Call the python script
system(f)
# Then go into your defined download folder (tmp/ in our case)

# Now we are dealing with the original hdf4 files.
# The MODIS package in R has some processing options to get a hdf4 layer into a raster object using gdal as a backbone

library(MODIS)
lay <- "MOD13Q1.A2014145.h21v09.005.2014162035037.hdf" # One of the raw 16days MODIS tiles we downloaded
mod <- getSds(lay,method="gdal") # Get the layer names and gdal links
# Load the layers from the hdf file. Directly apply scaling factor
ndvi <- raster(mod$SDS4gdal[1]) * 0.0001
evi <- raster(mod$SDS4gdal[2]) * 0.0001
reliability <- raster(mod$SDS4gdal[12])
s <- stack(ndvi,evi,reliability)

# Now extract the values at our point 'sp' from before
extract(s,sp)

There you go. Everything presented here was executed on a Linux (Debian) machine, and I have no clue whether it works for Windows or Mac users. Try it out. I hope everyone got some inspiration on how to process MODIS data in R 😉

Macroecology playground (3) – Spatial autocorrelation

Hey, it has been over two months, so welcome to 2014 from my side. I am sorry for not posting more updates recently, but like everyone I was (and still am) under constant working pressure. This year will be quite interesting for me personally, as I am about to start my thesis project and will (besides other things) go to Africa for fieldwork. But for now I will try to catch your interest with a new macroecology playground post dealing with the important issue of spatial autocorrelation. See the other macroecology playground posts here and here to know what happened in the past.


Spatial autocorrelation is the issue that data points in geographical space are somewhat dependent on each other, or that their values are correlated because of spatial proximity/distance. This is also known as the first law of geography (Google it). However, most of the statistical tools we have available assume that all our data points are independent of each other, which is rarely the case in macroecology. Just imagine the steep slope of a mountain region: literally all big values will occur near the peak of the mountain and decrease with distance from it. There is thus already a data-inherent gradient present which we somehow have to account for if we are to investigate the effect of altitude alone (and not the effect of proximity to nearby cells).

In our hypothetical example we want to explore how well the topographical average (average height per grid cell) can explain amphibian richness in South America, and whether the residuals (model errors) of our model are spatially autocorrelated. I can't share the data, but I believe the dear reader will get the idea of what we are trying to do.

# Load libraries
library(raster)

# Load in your dataset. In my case I am loading both the topographical average and the richness layer from a raster stack 's'.
amp <- s$Amphibians.richness
topo <- s$Topographical.Average
summary(fit1 <- lm(getValues(amp) ~ getValues(topo)))
# Extract from the output:
#> Multiple R-squared: 0.1248, Adjusted R-squared: 0.1242
#> F-statistic: 217.7 on 1 and 1527 DF, p-value: < 2.2e-16

par(mfrow=c(2,1))
plot(amp,col=rainbow(100,start=0.2))
plot(s$Topographical.Average)

What did we do? As you can see, we fitted a simple linear regression model using the values from both the amphibian richness raster layer and the topographical average raster. The relation seems to be highly significant, and this simple model can explain about 12.4% of the variation. Here is the basic plot output for both the response and the predictor variable.

Plot of response and predictor values

As you can see, high values of both layers seem to be spatially clustered, so a violation of the independence of data points in a linear regression model is quite likely. Let's investigate the spatial autocorrelation by looking at Moran's I, which is a measure of spatial autocorrelation (technically a correlation coefficient, similar to Pearson's r, calculated between the values of neighbouring cells within a certain distance window). So let's check whether the residual values (the error in model fit) are spatially autocorrelated.

library(ncf) # For the Correlogram

# Generate a residual raster from the model before
rval <- getValues(amp) # template vector of the right length
rval[as.numeric(names(fit1$residuals))] <- fit1$residuals # replace all data cells with their residual value
resid <- topo # copy an existing raster as template
values(resid) <- rval; rm(rval) # replace the values in this new raster
names(resid) <- "Residuals"

# Now calculate Moran's I of the new residual raster layer
x = xFromCell(resid,1:ncell(resid)) # take x coordinates
y = yFromCell(resid,1:ncell(resid)) # take y coordinates
z = getValues(resid) # and the values of course
# Now calculate Moran's I
# Use the extracted coordinates and values, increase the distance in steps of 100, and don't forget latlon=T (given that your rasters are in unprojected WGS84 lat-long)
system.time(co <- correlog(x,y,z,increment = 100,resamp = 0, latlon = T,na.rm=T)) # this can take a while.
# It takes even longer if you try to estimate significance of spatial autocorrelation

# Now show the result
plot(0,type="n",col="black",ylab="Moran's I",xlab="lag distance",xlim=c(0,6500),ylim=c(-1,1))
abline(h=0,lty="dotted")
lines(co$correlation~co$mean.of.class,col="red",lwd=2)
points(x=co$x.intercept,y=0,pch=19,col="red")

Moran's I of the model residuals

Ideally Moran's I should be as close to zero as possible. In the plot above you can see that at short distances (up to roughly 2000 distance units), and again at very large distances, the model residuals are positively autocorrelated (more similar than expected by chance alone, and thus correlated with proximity). The correlog function allows you to resample the dataset to investigate the significance of this pattern, but for now I will just assume that our model residuals are significantly spatially autocorrelated.
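
If you want to test this formally, here is a minimal sketch (the number of permutations is arbitrary, and this can run for a long time on large rasters):

# Permutation test of the correlogram: resamp > 0 returns p-values per distance class
co_test <- correlog(x, y, z, increment = 100, resamp = 100, latlon = TRUE, na.rm = TRUE)
# Distance classes with significant autocorrelation (at alpha = 0.05)
which(co_test$p < 0.05)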

There are numerous techniques to deal with or investigate spatial autocorrelation; the interested reader is advised to look at Dormann et al. (2007) for inspiration. In our example we will try to fit a simultaneous autoregressive model (SAR) and see whether we can partially remove the spatial autocorrelation from the residual error. SARs can model the spatial error-generating process and operate with weight matrices that specify the strength of interaction between neighbouring sites (Dormann et al., 2007). If you know that the spatial autocorrelation occurs in the response variable only, a so-called "lagged-response model" is most appropriate; use a "mixed" SAR if the error occurs in both response and predictors. However, Kissling and Carl (2008) investigated SAR models in detail and came to the conclusion that lagged and mixed SARs might not always give better results than ordinary least-squares regressions and can generate bias. Instead they recommend fitting "spatial error" SAR models when dealing with species distribution data, which assume that the spatial correlation occurs neither in the response nor in the predictors, but in the error term.

So let's build the spatial weights and fit a SAR:

library(spdep)

x = xFromCell(amp,1:ncell(amp))
y = yFromCell(amp,1:ncell(amp))
z = getValues(amp)
nei <- dnearneigh(cbind(x,y), d1 = 0, d2 = 2000, longlat = T) # Neighbourhood list of interactions within a distance of 2000
nlw <- nb2listw(nei,style="W",zero.policy=T)
# You should calculate the interaction weights with the maximal distance at which autocorrelation occurs.
# But here we just take the first x-intercept, where positive correlation turns negative.
# Now fit the spatial error SAR
val <- data.frame(z = z, topo = getValues(topo)) # assemble response and predictor values in one data.frame
sar_e <- errorsarlm(z ~ topo, data = val, listw = nlw, na.action = na.omit, zero.policy = T)
# NA cells are excluded and cells without neighbours get zero weights

# Now compare how much variation can be explained
summary(fit1)$adj.r.squared # The r-squared of the ordinary regression
> 0.124
summary(sar_e, Nagelkerke = T)$NK # Nagelkerke's pseudo r-squared of the SAR
> 0.504 # considerably higher, so accounting for the spatial structure markedly improves the explained variation in amphibian richness

# Finally do a likelihood ratio test
LR.sarlm(sar_e, fit1)
# Likelihood ratio for spatial linear models
> data:
> Likelihood ratio = 869.7864, df = 1, p-value < 2.2e-16
> sample estimates:
> Log likelihood of sar_e: -7090.903
> Log likelihood of fit1:  -7525.796

# Not only are the two models significantly different, but the log-likelihood of our SAR is also greater
# than that of the ordinary model, indicating a better fit.
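
For completeness, the lagged-response and mixed SAR variants mentioned above could be fitted with the same weights list. A sketch only (not run on the data shown here):

# Lagged-response SAR: autocorrelation assumed in the response only
sar_lag <- lagsarlm(z ~ topo, data = val, listw = nlw, na.action = na.omit, zero.policy = TRUE)
# "Mixed" SAR (spatial Durbin model): lagged response and lagged predictors
sar_mix <- lagsarlm(z ~ topo, data = val, listw = nlw, type = "mixed",
                    na.action = na.omit, zero.policy = TRUE)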

The SAR is one of many methods to deal with spatial autocorrelation. I agree that the choice of the weights-matrix distance is a bit arbitrary (it made sense for me), so you might want to investigate the occurrence of spatial correlation a bit more before fitting a SAR. So, have we dealt with the autocorrelation? Let's just calculate Moran's I again for both the old residuals and the SAR residuals (a sketch follows below the figure). Looks better, doesn't it?

Comparison of Moran's I for the linear model and the error-SAR residuals
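
A minimal sketch of that comparison, reusing the correlogram of the linear-model residuals (co) from above and assuming the SAR residuals can be matched back to the complete (non-NA) cells:

# Correlogram of the SAR residuals (only the cells that entered the model after na.omit)
ok <- which(complete.cases(val))
co_sar <- correlog(x[ok], y[ok], residuals(sar_e), increment = 100, resamp = 0,
                   latlon = TRUE, na.rm = TRUE)

# Plot both correlograms for comparison
plot(0, type = "n", ylab = "Moran's I", xlab = "lag distance", xlim = c(0, 6500), ylim = c(-1, 1))
abline(h = 0, lty = "dotted")
lines(co$correlation ~ co$mean.of.class, col = "red", lwd = 2)          # linear model residuals
lines(co_sar$correlation ~ co_sar$mean.of.class, col = "blue", lwd = 2) # SAR residuals
legend("topright", legend = c("lm residuals", "SAR residuals"), col = c("red", "blue"), lwd = 2)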

References:

  • Dormann, C. F., McPherson, J. M., Araújo, M. B., Bivand, R., Bolliger, J., Carl, G., … & Wilson, R. (2007). Methods to account for spatial autocorrelation in the analysis of species distributional data: a review. Ecography, 30(5), 609-628.
  • Kissling, W. D., & Carl, G. (2008). Spatial autocorrelation and the selection of simultaneous autoregressive models. Global Ecology and Biogeography, 17(1), 59-71.

Macroecology playground (1) – Bird species richness in a nutshell

Ahh, macroecology: the study of ecological patterns and processes at large scales. Questions like "what factors determine the distribution and diversity of all life on earth?" have troubled scientists since the times of A. v. Humboldt and Wallace. At the University of Copenhagen a whole research center has been dedicated to this specific field, and macroecological studies are more and more present in prestigious journals like Nature and Science. Previous studies at the center have found skewed distributions of bird richness with a specific bias towards the mountains (Jetz & Rahbek, 2002; Rahbek et al., 2007). In this blog post I am going to play around a bit with some data from Rahbek et al. (2007). The analysis and the graphs are by no means sufficient (and even violate many model assumptions like homoscedasticity, normality and data independence) and are therefore more of an exploratory nature 😉 The post will show you how to build a raster stack of geographical data and how to use the data in some very basic models.


It was recommended to me to use the freely available SAM software for the analysis, but although the program is really nice and fast, it isn't flexible enough for me, as you cannot modify specific model parameters or graphical outputs. And as a self-declared R junkie I refuse to work with "click-compute-result" tools 😉

So here is how the head of a SAM data file ("data.sam") looks (I won't share it, so please generate your own data).

As you can see, the .sam file is technically just a tab-separated table with the coordinates of a grid cell (1° grid cells on a latitude-longitude projection) and all response and predictor values for this cell. To get this data into R we are going to use the raster package to generate a so-called raster stack for our analysis. This is how I did it:

# Load libraries
library(raster)
# Create Data from SAM
data <- read.delim(file="data.sam",header=T,sep="\t",dec=".") # read in a data.frame
coordinates(data) <- ~Longitude+Latitude # Convert to a SpatialPointsDataframe
cs <- "+proj=longlat +datum=WGS84 +no_defs" # define the correct projection (long-lat)
gridded(data) <- T # Make a SpatialPixelsDataframe
proj4string(data)  <- CRS(cs) # set the defined CRS

# Create raster layer stack
s <- stack()
for(n in names(data)){
  d <- data.frame(coordinates(data), data[[n]]) # x, y and the values of column n
  ras <- rasterFromXYZ(xyz = d, digits = 10, crs = CRS(cs))
  s <- addLayer(s, ras)
  rm(d, n, ras)
}
# Now you can query and plot the raster layers from the stack
plot(s$Birds.richness,col=rainbow(100,start=0.1))

South American bird species richness. Grain size: 1°

You want to do some modeling or extract data? Here you go. First we make a subset of some of our predictors from the raster stack and then fit an ordinary least-squares multiple regression model to our data to see how much variance can be explained. Note that linear regression is not the proper technique for this kind of analysis (the degrees of freedom are too high due to spatial autocorrelation, plus the violations of assumptions mentioned before), but it is still useful for exploratory purposes.

# Extract some predictors from the raster Stack
predictors <- subset(s,c(7,8,10))
names(predictors)
>  "NDVI" "Topographical.Range" "Annual.Mean.Temperature"
# Now extract the data from both the bird richness layer and the predictors
birds <- getValues(s$Birds.richness)
val <- as.data.frame(getValues(predictors))

# Do the multiple regression
fit <- lm(birds~.,data=val)
summary(fit)
>                          Estimate Std. Error t value Pr(>|t|)
(Intercept)             215.675282  15.837493   13.62   <2e-16 ***
NDVI                    -34.541242   1.245769  -27.73   <2e-16 ***
Topographical.Range       0.056458   0.002452   23.03   <2e-16 ***
Annual.Mean.Temperature   0.940664   0.054747   17.18   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 81.86 on 1525 degrees of freedom
(1461 observations deleted due to missingness)
Multiple R-squared:  0.6931,    Adjusted R-squared:  0.6925
F-statistic:  1148 on 3 and 1525 DF,  p-value: < 2.2e-16

Ignore the p-values and just focus on the adjusted r² value. As you can see, we are able to explain nearly 70% of the variance with this simple model. So what do our residuals and the predicted values look like? For that we have to create analogous raster layers containing the predicted and the residual values. Then we plot all three raster layers using the spplot function from the package sp (automatically loaded with raster).

# Estimates prediction
rval <- getValues(s$Birds.richness) # Create new values
rval[as.numeric(names(fit$fitted.values))]<- predict(fit) # replace all data-cells with predicted values
pred <- predictors$NDVI # make a copy of an existing raster
values(pred) <-rval;rm(rval) #replace all values in this raster copy
names(pred) <- "Prediction"

# Residual Raster
rval <- getValues(s$Birds.richness) # Create new values
rval[as.numeric(names(fit$residuals))]<- fit$residuals # replace all data-cells with residual values
resid <-predictors$NDVI
values(resid) <-rval;rm(rval)
names(resid) <- "Residuals"

# Do the plot with spplot
ss <- stack(s$Birds.richness, pred, resid)
sp <- as(ss, 'SpatialGridDataFrame')
trellis.par.set(sp.theme())
spplot(sp)

Multiple linear regression model output

Looking at the residual plot you might notice that our simple model fails to explain all the variation at mountain altitudes (the Andes). Still, the predicted values look very similar to the observed richness. Bird species richness is highest at tropical mountain ranges, which is consistent with results from Africa (Jetz & Rahbek, 2002). The reasons for this pattern are not fully understood yet, but if I had to discuss it with a colleague I would probably bring up arguments like longer evolutionary time, higher habitat heterogeneity and greater numbers of climatic niches at mountain ranges. At this point you would then test for spatial autocorrelation using Moran's I, adjust your data accordingly and use more sophisticated methods like generalized additive models (GAMs) or spatial autoregressive models (SARs) that account for the spatial autocorrelation. See Rahbek et al. (2007) for the actual study.

References:

  • Jetz, W., & Rahbek, C. (2002). Geographic range size and determinants of avian species richness. Science, 297(5586), 1548-1551.
  • Rahbek, C., Gotelli, N. J., Colwell, R. K., Entsminger, G. L., Rangel, T. F. L., & Graves, G. R. (2007). Predicting continental-scale patterns of bird species richness with spatially explicit models. Proceedings of the Royal Society B: Biological Sciences, 274(1607), 165-174.

Port your R scripts to QGIS using SEXTANTE

R has grown a lot in the past years and is increasingly used as a GIS. For many spatial operations and even spatial statistics, certain R packages are simply the best choice to date. For instance, home-range analyses are kind of impossible to perform (at least for me) without looking at the adehabitat packages. You want to perform a spatial operation in a script and do not know how to code Python? Just use R. Francisco Rodriguez-Sanchez has posted very nice examples of how to perform GIS tasks completely within R.

However, R as a GIS lacks simplicity in matters of design, and it requires quite an effort to make a good-looking map. Therefore using QGIS for map design and output is always my favoured choice. But why use R and QGIS independently? Through the SEXTANTE toolbox for QGIS it is possible to execute fast and precise R scripts on your spatial data directly in QGIS.

What you need to do:

  • Download the current QGIS development version and R.
  • Enable SEXTANTE in QGIS and activate the R scripts in the SEXTANTE provider options window. Furthermore, specify an R-scripts folder where you store your scripts.
  • Also make sure that your scripts are logged (this setting is in the SEXTANTE options as well).
  • Execute one of the example R scripts to test whether the scripts are working.

If the above steps all turned out as expected, you can start formatting your own R scripts into so-called .rsx files (R SEXTANTE scripts).

Here is a little info-graphic on how to use R in a SEXTANTE context:

RSextanteCommands

So open your favorite text editor (or the new R-script dialog in QGIS) and create a new file. Save it into the rscripts folder inside your QGIS profile directory ("~/.qgis2/sextante/rscripts" on Linux-based systems; a similar structure exists under Windows inside your Documents folder). All .rsx scripts start by defining a group (where the script is saved in SEXTANTE) and continue with additional inputs specified by the user. All input data comes with two hashes in front. Furthermore, you need to specify whether the script output should show plots (add "##showplots" at the beginning of the script) and/or console output (start a command with ">").

After you have written your script, start up QGIS, open the SEXTANTE toolbox and have fun executing it. Many things are possible, but it isn't really easy to debug .rsx scripts in QGIS, as the output is limited and sometimes you just wonder why it isn't working.

To get you started, here is a basic script producing the nice-looking levelplot from the rasterVis R package:

##[Own Scripts] = group
##layer = raster
##showplots
library(rasterVis)
myPal <- terrain.colors(20)
levelplot(layer,col.regions=myPal,contour=T)

The script is stored in the "Own Scripts" group. It just requires a raster (ideally a DEM) as input.
You could extend the script by saving the output to a user-defined folder or by creating a plot only for a specific extent (for instance the current QGIS extent). The output looks like this for the county of Skåne in southern Sweden:

Right now this is just for show, but of course you could also generate, for example, contours of the raster DEM and save them in a GDAL/OGR-supported format after script execution.

Distance between two latitude-longitude points in R

Hey,

I just came across this function on the net and want to share it. It is an R function to roughly calculate the great-circle distance in kilometres between two points given as longitude and latitude values (the haversine formula). It might not be too precise due to the approximate estimate of the earth's radius (R).

# Calculate distance in kilometers between two points
earth.dist <- function(long1, lat1, long2, lat2){
  rad <- pi/180
  a1 <- lat1 * rad
  a2 <- long1 * rad
  b1 <- lat2 * rad
  b2 <- long2 * rad
  dlon <- b2 - a2
  dlat <- b1 - a1
  a <- (sin(dlat/2))^2 + cos(a1) * cos(b1) * (sin(dlon/2))^2
  c <- 2 * atan2(sqrt(a), sqrt(1 - a))
  R <- 6378.145 # earth radius in km
  d <- R * c
  return(d)
}
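
For example, with rough coordinates for Berlin (13.40 E, 52.52 N) and Paris (2.35 E, 48.86 N):

# Approximate great-circle distance between Berlin and Paris
earth.dist(13.40, 52.52, 2.35, 48.86)
#> roughly 880 km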