Tag Archive | maps in r

Playing with Landsat 8 metadata

The Landsat mission is one of the most successful remote-sensing programs and has been running since the early 1970s. The most recent addition to the flock of Landsat satellites, Landsat 8, has been supplying tons of images to researchers, NGOs and governments for over two years now. Providing nearly 400 images daily (!), it has amassed an impressive dataset of over half a million individual images by now (N = 515243 as of 29/07/2015).

Landsat 8 scenes can be easily queried via a number of web interfaces. The oldest and most successful is the USGS EarthExplorer, which also distributes other NASA remote-sensing products. ESA also started to mirror Landsat 8 data, and so did the great Libra website from developmentseed. Using the Landsat8 before/after tool from remotepixel.ca you can even make on-the-fly comparisons of imagery scenes.

You might ask how some of those services are able to show you the number of images and the estimated cloud cover. This information is stored in the scene-list metadata file. It contains the identity, name, acquisition date and much other information for all Landsat 8 scenes since the start of the mission. In addition, Landsat 8 comes with a cloudCover estimate, which you can readily explore on a global scale. Sadly this exists only for L8, but as far as I know the USGS is working on a post-creation measure for the previous satellites. Here is some example code showing how to peek into this huge, ever-growing archive.

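# Download the metadata file
library(R.utils) # provides decompressFile()
l = "http://landsat.usgs.gov/metadata_service/bulk_metadata_files/LANDSAT_8.csv.gz"
download.file(l, destfile = basename(l), mode = "wb") # binary mode keeps the .gz intact on Windows
# Now decompress into a temporary csv; t holds the path of the decompressed file
t = decompressFile(basename(l), temporary = TRUE, overwrite = TRUE, remove = FALSE, ext = "gz", FUN = gzfile)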

Now you can read in the resulting csv. For speed I would recommend using the “data.table” package!

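# Load data.table
library(data.table)
# Use fread to read in the csv (assign with <- inside system.time(); "=" would be read as an argument name)
system.time(zz <- fread(t, header = TRUE))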
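If you'd rather skip the separate decompression step, base R can also read the gzipped scene list directly through a gzfile() connection. This is a minimal sketch. It assumes the metadata file is a plain comma-separated table with a header row. read_scene_list() is a hypothetical helper, and the two-line file written to a temp file is made-up stand-in data, not the real LANDSAT_8.csv.gz. On the full half-million rows it will be noticeably slower than fread().

```r
# Read a gzip-compressed csv without unpacking it to disk first
read_scene_list <- function(path) {
  con <- gzfile(path, open = "rt")   # text-mode connection that decompresses on the fly
  on.exit(close(con))
  read.csv(con, header = TRUE, stringsAsFactors = FALSE)
}

# Made-up stand-in for the real metadata file: two scenes of the same path/row
tmp <- tempfile(fileext = ".csv.gz")
gz <- gzfile(tmp, open = "wt")
writeLines(c("sceneID,path,row,cloudCoverFull",
             "LC81640712015201LGN00,164,71,12.5",
             "LC81640712015185LGN00,164,71,80.1"), gz)
close(gz)

scenes <- read_scene_list(tmp)
str(scenes)
```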

The metadata file contains quite a number of cool fields to explore. For instance, the “browseURL” column contains the full link to an online .jpg thumbnail, which is very useful for a quick look at a scene.

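library(jpeg) # provides readJPEG()
l = "http://earthexplorer.usgs.gov/browse/landsat_8/2015/164/071/LC81640712015201LGN00.jpg"
download.file(l, destfile = basename(l), mode = "wb") # save the thumbnail locally
jpg = readJPEG(basename(l)) # read the file
res = dim(jpg)[1:2] # get the resolution (rows, columns)
plot(NA, xlim = c(0, res[2]), ylim = c(0, res[1]), asp = 1, axes = FALSE, xlab = "", ylab = ""); rasterImage(jpg, 0, 0, res[2], res[1])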
L8 Thumbnail
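Combined with the cloud estimates, the browseURL column makes it easy to grab the thumbnail of the clearest scene for a given location. Below is a minimal sketch in base R, which rests on two assumptions. First, the scene list has columns named path, row, cloudCoverFull and browseURL (check names(zz) on your copy). Second, negative cloud values (-1) flag scenes without a cloud estimate. least_cloudy_thumbnail() and the example rows are hypothetical.

```r
# Return the browseURL of the least cloudy scene for one WRS-2 path/row
least_cloudy_thumbnail <- function(scenes, p, r) {
  # which() drops NA comparisons; scenes flagged with negative cloud cover are skipped
  hits <- scenes[which(scenes$path == p & scenes$row == r & scenes$cloudCoverFull >= 0), ]
  if (nrow(hits) == 0) return(NA_character_)
  hits$browseURL[which.min(hits$cloudCoverFull)]
}

# Made-up example rows in the assumed column layout
scenes <- data.frame(
  path = c(164, 164, 164, 165),
  row = c(71, 71, 71, 71),
  cloudCoverFull = c(40.2, 3.1, -1, 0),
  browseURL = c("a.jpg", "b.jpg", "c.jpg", "d.jpg"),
  stringsAsFactors = FALSE
)
least_cloudy_thumbnail(scenes, 164, 71)  # "b.jpg"
```

The returned URL can go straight into the download.file() and readJPEG() lines above.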

The “cloudCoverFull” column contains the average cloud-cover for each scene, which is interesting to explore as the long-term average of measured cloudCover per region/country likely differs due to different altitude or precipitation levels. Here is a map showing the average cloud-cover per individual scene since mission start:

Average global cloud cover in Landsat 8 data
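You can compute a per-scene average like the one in the map by grouping the scene list by its WRS-2 path and row. The sketch below uses base R's aggregate() so it runs without extra packages. With data.table the same idea would be zz[cloudCoverFull >= 0, .(cc = mean(cloudCoverFull)), by = .(path, row)]. Several things here are assumptions, not taken from the post: the column names sceneCenterLatitude and sceneCenterLongitude, the -1 "no estimate" convention, and the toy rows.

```r
# Average cloud cover per WRS-2 path/row, plus the scene centre for mapping
mean_cloud_per_scene <- function(scenes) {
  ok <- scenes[which(scenes$cloudCoverFull >= 0), ]   # drop assumed "not calculated" flags (-1)
  aggregate(cbind(cloudCoverFull, sceneCenterLatitude, sceneCenterLongitude) ~ path + row,
            data = ok, FUN = mean)
}

# Toy scene list: three acquisitions of one path/row (one without estimate), one of another
scenes <- data.frame(path = c(164, 164, 164, 1), row = c(71, 71, 71, 10),
                     cloudCoverFull = c(10, 30, -1, 50),
                     sceneCenterLatitude = c(-13, -13, -13, 60),
                     sceneCenterLongitude = c(45, 45, 45, -20))
res <- mean_cloud_per_scene(scenes)
res
```

The result can then be drawn as a simple point map. For example, plot(res$sceneCenterLongitude, res$sceneCenterLatitude, col = gray(1 - res$cloudCoverFull / 100), pch = 15) shows cloudier scenes as darker squares.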

Clouds are a major source of annoyance for anyone who intends to measure vegetation cover or classify land cover. I might write another post later showing some examples of how to filter satellite data for clouds.

Macroecology playground (3) – Spatial autocorrelation

Hey, it has been over 2 months, so welcome to 2014 from my side. I am sorry for not posting more updates recently, but like everyone I was (and still am) under constant work pressure. This year will be quite interesting for me personally. I am about to start my thesis project and will, among other things, go to Africa for fieldwork. But for now I will try to catch your interest with a new macroecology playground post dealing with the important issue of spatial autocorrelation. See the other Macroecology playground posts here and here to catch up on what happened before.

Spatial autocorrelation means that data points in geographical space are not independent of each other: their values are correlated simply because of their spatial proximity. Most of the statistical tools available for analysis assume that all data points are independent, which is rarely the case in macroecology. Just imagine the steep slopes of a mountain region. The highest elevation values will always occur near the peaks and decrease with distance from them. There is thus already a gradient inherent in the data. We somehow have to account for it if we are to investigate the effect of altitude alone, and not the effect of proximity to nearby cells.

In our hypothetical example we want to explore how well the topographical average (average height per grid cell) can explain amphibian richness in South America. We also want to know whether the residuals (model errors) of our model are spatially autocorrelated. I can’t share the data, but I believe the dear reader will get the idea of what we are trying to do.

# Load libraries
library(raster) # for getValues(), xFromCell(), ncell() etc.

# Load in your dataset. In my case I am loading both the topography and richness layers from a raster stack (s).
amp <- s$Amphibians.richness
topo <- s$Topographical.Average
summary(fit1 <- lm(getValues(amp)~getValues(topo)))
# Extract from the output
> Multiple R-squared:  0.1248,    Adjusted R-squared:  0.1242
> F-statistic: 217.7 on 1 and 1527 DF,  p-value: < 2.2e-16


What did we do? We fitted a simple linear regression model using the cell values of both the amphibian richness raster layer and the topographical average raster. The relation seems to be highly significant, and this simple model explains about 12.4% of the variation (adjusted R² = 0.124). Here is the basic plot output for both the response and the predictor variable.

Plot of response and predictor values

As you can see, high values of both layers seem to be spatially clustered, so the independence assumption of the linear regression model is very likely violated. Let's investigate the spatial autocorrelation using Moran’s I. Technically, Moran’s I is a correlation coefficient, similar to Pearson’s r, computed between each value and the values of its neighbours, weighted by spatial proximity. Specifically, let's check whether the residual values (the error in model fit) are spatially autocorrelated.
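For readers who want to see what Moran's I actually computes, here is a small from-scratch sketch of I = (N / W) * sum_ij w_ij (z_i - mean)(z_j - mean) / sum_i (z_i - mean)^2, where the w_ij are spatial weights and W is their sum. It uses simple binary rook (4-neighbour) weights on a toy 10 x 10 grid rather than the distance classes that correlog() uses. Treat it as an illustration of the statistic, not a replacement for ncf. A smooth east-west gradient (think of the mountain slope) gives a strongly positive I, while a checkerboard gives exactly -1.

```r
# Binary rook (4-neighbour) weight matrix for an nrow x ncol grid, cells in column-major order
rook_weights <- function(nrow, ncol) {
  n <- nrow * ncol
  W <- matrix(0, n, n)
  idx <- function(r, c) (c - 1) * nrow + r
  for (r in seq_len(nrow)) for (c in seq_len(ncol)) {
    i <- idx(r, c)
    if (r > 1)    W[i, idx(r - 1, c)] <- 1
    if (r < nrow) W[i, idx(r + 1, c)] <- 1
    if (c > 1)    W[i, idx(r, c - 1)] <- 1
    if (c < ncol) W[i, idx(r, c + 1)] <- 1
  }
  W
}

# Global Moran's I for values z and weight matrix W
morans_i <- function(z, W) {
  d <- z - mean(z)                                  # deviations from the mean
  (length(z) / sum(W)) * sum(W * outer(d, d)) / sum(d^2)
}

grid <- expand.grid(row = 1:10, col = 1:10)  # row varies fastest, matching idx() above
W <- rook_weights(10, 10)
gradient <- grid$col                       # values rise steadily from west to east
checker  <- (grid$row + grid$col) %% 2     # alternating low/high cells
morans_i(gradient, W)  # 0.8888889 (= 8/9)
morans_i(checker, W)   # -1
```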

library(ncf) # For the Correlogram

# Generate a residual raster from the model before
rval <- rep(NA_real_, ncell(amp)) # empty vector with one entry per raster cell
rval[as.numeric(names(fit1$residuals))] <- fit1$residuals # residual names are the indices of the cells used in the fit
resid <- topo
values(resid) <- rval; rm(rval) # put the residuals into a copy of the topography raster
names(resid) <- "Residuals"

# Now calculate Moran's I of the new residual raster layer
x = xFromCell(resid,1:ncell(resid)) # take x coordinates
y = yFromCell(resid,1:ncell(resid)) # take y coordinates
z = getValues(resid) # and the values of course
# Now calculate Moran's I
# Use the extracted coordinates and values and increase the distance classes in steps of 100 (km when latlon = TRUE). Don't forget latlon = TRUE if your rasters are in unprojected WGS84 (longitude/latitude) coordinates
system.time(co <- correlog(x,y,z,increment = 100,resamp = 0, latlon = T,na.rm=T)) # this can take a while.
# It takes even longer if you try to estimate significance of spatial autocorrelation

# Now show the result
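plot(0,type="n",col="black",ylab="Moran's I",xlab="lag distance (km)",xlim=c(0,6500),ylim=c(-1,1))
lines(co$mean.of.class, co$correlation, type = "b") # Moran's I per distance class
abline(h = 0, lty = 2) # reference line: no autocorrelation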
Moran’s I of the model residuals

Ideally Moran’s I should be as close to zero as possible. In the above plot you can see that the model residuals are positively autocorrelated at short distances (up to roughly 2000 km) and at greater distances as well. That is, they are more similar than expected by chance alone, and thus correlated with proximity. The function correlog allows you to resample the dataset to investigate the significance of these patterns. For now I will simply assume that our model's residuals are significantly spatially autocorrelated.
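A side note on the residual raster built above: matching residuals back to cells via names(fit1$residuals) works, but lm() can do this bookkeeping itself. With na.action = na.exclude, residuals() returns one value per input cell, with NA wherever a cell was dropped. The vector can then be written straight back into a raster, e.g. with raster's setValues(). Below is a minimal sketch in which made-up toy vectors stand in for getValues(amp) and getValues(topo).

```r
# Toy stand-ins for getValues(amp) and getValues(topo), with NA cells (e.g. sea)
rich <- c(10, NA, 14, 20, NA, 25)
elev <- c(100, 150, NA, 300, 350, 500)

fit <- lm(rich ~ elev, na.action = na.exclude)  # na.exclude pads residuals with NA
res <- residuals(fit)   # one entry per input cell
length(res)             # 6, same as the number of cells
which(is.na(res))       # cells 2, 3 and 5 were dropped from the fit
```

With real rasters this becomes resid <- setValues(topo, residuals(fit)), with no index juggling needed.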

There are numerous techniques to deal with or investigate spatial autocorrelation; the interested reader is advised to look at Dormann et al. (2007) for inspiration. In our example we will fit a simultaneous autoregressive model (SAR) and see whether we can partially remove the spatial autocorrelation from the residual error. SARs model the spatial error-generating process and operate with weight matrices that specify the strength of interaction between neighbouring sites (Dormann et al., 2007). If you know that the spatial autocorrelation occurs in the response variable only, a so-called “lagged-response model” would be most appropriate. If it occurs in both response and predictors, use a “mixed” SAR instead. However, Kissling and Carl (2008) investigated SAR models in detail. They concluded that lagged and mixed SARs do not always give better results than ordinary least squares regression and can generate bias. Instead, they recommend fitting “spatial error” SAR models when dealing with species distribution data. These assume that the spatial autocorrelation occurs neither in the response nor in the predictors, but in the error term.
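To make the idea of a weight matrix concrete, here is a base-R sketch of distance-band weights on longitude/latitude points. Every point within d2 km counts as a neighbour, using great-circle distance (haversine formula on a sphere of radius 6371 km). Each row is then divided by its number of neighbours. In spirit this mirrors what dnearneigh(..., longlat = TRUE) followed by nb2listw(style = "W", zero.policy = TRUE) does in the SAR code. It is, however, my own simplified illustration (dense matrix, spherical Earth), not spdep's implementation, and gc_dist_km() and band_weights() are hypothetical helpers.

```r
# Great-circle distances (km) between all pairs of lon/lat points (decimal degrees),
# using the haversine formula on a spherical Earth with radius 6371 km
gc_dist_km <- function(lon, lat) {
  lon <- lon * pi / 180
  lat <- lat * pi / 180
  dlon <- outer(lon, lon, "-")
  dlat <- outer(lat, lat, "-")
  a <- sin(dlat / 2)^2 + outer(cos(lat), cos(lat)) * sin(dlon / 2)^2
  2 * 6371 * asin(pmin(1, sqrt(a)))   # pmin guards against rounding just above 1
}

# Row-standardised distance-band weights: neighbours are all other points within d2 km
band_weights <- function(lon, lat, d2 = 2000) {
  D <- gc_dist_km(lon, lat)
  B <- (D <= d2) * 1                 # binary neighbour matrix
  diag(B) <- 0                       # a point is not its own neighbour
  n_nb <- rowSums(B)
  B / ifelse(n_nb == 0, 1, n_nb)     # rows sum to 1; isolated points keep an all-zero row
}

# Toy points: three close together near the equator, one far away at 50 degrees east
lon <- c(0, 1, 50, 0)
lat <- c(0, 0, 0, 1)
W <- band_weights(lon, lat, d2 = 2000)
round(W, 2)
```

The all-zero row of the isolated point is exactly the situation that zero.policy = TRUE tolerates in spdep.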

So let's build the spatial weights and fit a SAR:


library(spdep) # dnearneigh(), nb2listw(); errorsarlm() has since moved to the spatialreg package
x = xFromCell(amp,1:ncell(amp))
y = yFromCell(amp,1:ncell(amp))
val <- data.frame(z = getValues(amp), topo = getValues(topo)) # response and predictor per cell
nei <- dnearneigh(cbind(x,y),d1=0,d2=2000,longlat=TRUE) # Neighbour list: all cells within 2000 km
nlw <- nb2listw(nei,style="W",zero.policy=TRUE) # row-standardised spatial weights
# You should calculate the interaction weights with the maximal distance in which autocorrelation occurs.
# But here we will just take the first x-intercept where positive correlation turns into negative.
# Now fit the spatial error SAR
sar_e <- errorsarlm(z~topo,data=val,listw=nlw,na.action=na.omit,zero.policy=TRUE)
# Cells with NA values are excluded (na.omit); zero.policy = TRUE allows cells without neighbours (all-zero weights)

# Now compare how much Variation can be explained
summary(fit1)$adj.r.squared # The r_squared of the normal regression
> 0.124
summary(sar_e,Nagelkerke=T)$NK # Nagelkerkes pseudo r_square of the SAR
> 0.504 # <-- for the SAR. Much of this gain comes from the modelled spatial error term, not from topography alone

# Finally do a likelihood ratio test
LR.sarlm(sar_e, fit1)
>    Likelihood ratio for spatial linear models
>Likelihood ratio = 869.7864, df = 1, p-value < 2.2e-16
>sample estimates:
>Log likelihood of sar_e  Log likelihood of fit1
>              -7090.903               -7525.796

# Not only are our two models significantly different, but the log likelihood of our SAR is also greater than the ordinary model
# indicating a better fit.
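The likelihood ratio reported above is easy to verify by hand. It is twice the difference of the two log-likelihoods, compared against a chi-squared distribution whose degrees of freedom equal the number of extra parameters. Here that is 1, presumably the spatial error parameter lambda of the SAR. A small check using the printed values:

```r
# Log-likelihoods as printed in the output above
ll_sar <- -7090.903
ll_ols <- -7525.796

lr <- 2 * (ll_sar - ll_ols)                    # likelihood ratio statistic
p  <- pchisq(lr, df = 1, lower.tail = FALSE)   # df = number of extra parameters in the SAR
lr  # 869.786, matching the printed 869.7864 up to rounding of the log-likelihoods
p   # far below 2.2e-16
```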

The SAR is one of many methods to deal with spatial autocorrelation. I agree that the choice of the weights matrix distance is a bit arbitrary (it made sense for me), so you might want to investigate the occurrence of spatial correlations a bit more before fitting a SAR. So have we dealt with the autocorrelation? Let's just calculate Moran’s I values again for both the old residuals and the SAR residuals. Looks better, doesn’t it?

Comparison of Moran’s I for the residuals of the linear model and the error SAR
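The weights distance in the SAR code was taken as the first x-intercept of the residual correlogram, i.e. the first lag distance where Moran's I drops from positive to zero or below. If you want to read that off programmatically instead of by eye, you could interpolate linearly between the two distance classes around the sign change. first_x_intercept() below is a hypothetical helper. With an ncf correlogram you would pass co$mean.of.class and co$correlation; the numbers in the example are made up.

```r
# First distance at which a correlogram crosses from positive to zero/negative,
# linearly interpolated between the two neighbouring distance classes
first_x_intercept <- function(dist, cor) {
  s <- sign(cor)
  k <- which(s[-length(s)] > 0 & s[-1] <= 0)[1]   # first positive -> non-positive step
  if (is.na(k)) return(NA_real_)                  # never crosses zero
  dist[k] + (0 - cor[k]) * (dist[k + 1] - dist[k]) / (cor[k + 1] - cor[k])
}

first_x_intercept(c(100, 200, 300, 400), c(0.5, 0.2, -0.2, -0.4))  # 250
```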


  • Dormann, C. F., McPherson, J. M., Araújo, M. B., Bivand, R., Bolliger, J., Carl, G., … & Wilson, R. (2007). Methods to account for spatial autocorrelation in the analysis of species distributional data: a review. Ecography, 30(5), 609-628.
  • Kissling, W. D., & Carl, G. (2008). Spatial autocorrelation and the selection of simultaneous autoregressive models. Global Ecology and Biogeography, 17(1), 59-71.

Port your R scripts to QGIS using SEXTANTE

has grown a lot in the past years and is increasingly used as a GIS. For many spatial operations, and even spatial statistics, certain R packages are simply the best choice to date. For instance, home-range analyses are nearly impossible to perform (at least for me) without the adehabitat packages. You want to perform a spatial operation in a script and do not know how to code Python? Just use R. Francisco Rodriguez-Sanchez has posted very nice examples of how to perform GIS tasks completely within R.

However, R as a GIS lacks simplicity in matters of design, and it takes quite an effort to make a good-looking map. Therefore QGIS is always my favoured choice for map design and output. But why use R and QGIS independently? Through the SEXTANTE toolbox for QGIS you can execute fast and precise R scripts on your spatial data directly in QGIS.

What you need to do:

  • Download the current QGIS development version and R.
  • Enable SEXTANTE in QGIS and activate the R scripts in the SEXTANTE options provider window. Furthermore, specify an R-scripts folder where you store your scripts.
  • Also make sure that your scripts are logged (it is in the SEXTANTE options as well)
  • Execute one of the Example R-scripts to test if the scripts are working.

If the above steps all turned out as expected you could start formatting your own r-scripts into a so-called .rsx file (R SEXTANTE script).

Here is a little infographic on how to use R in a SEXTANTE context:


So open your favorite text editor (or the new R-script dialog in QGIS) and create a new file. Save it into the rscripts folder inside your QGIS profile directory: “~/.qgis2/sextante/rscripts” on Linux-based systems, with a similar structure under Windows inside your Documents folder. All .rsx scripts start by defining a group (where the script is listed in SEXTANTE) and continue with the inputs specified by the user. All input definitions start with two hashes. Furthermore, you need to specify whether the script should show plots (add “##showplots” at the beginning of the script) and/or console output (start a command with “>”).

After writing your script, start QGIS, open the SEXTANTE toolbox and have fun executing it. Everything is possible, but .rsx scripts aren’t easy to debug in QGIS: the output is limited, and sometimes you just wonder why a script isn’t working.

To get you started here is the basic script to do the nice-looking levelplot from the rasterVis r-package:

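##[Own Scripts] = group
##layer = raster
##showplots
library(rasterVis) # provides levelplot() for raster objects
myPal <- terrain.colors(20)
print(levelplot(layer, col.regions = myPal)) # print() makes sure the lattice plot is drawn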

The script is stored in the “Own Scripts” group and just requires a raster (ideally a DEM) as input.
You could extend the script by saving the output to a user-defined folder, or by plotting only a specific extent (for instance the current QGIS extent). The output looks like this for the county of Skåne in southern Sweden:

Levelplot output

Right now this is just for show. Of course you could also, for example, generate contours of the raster DEM and save them in a GDAL-supported format after script execution.

Distribution maps in R

Today I’m gonna play a little bit with map features and show you how to make different basic distribution maps in R. Using R version 2.14.1, I will make a graphical distribution map of the dragonfly species in Bavaria. The data were extracted from the book “Libellen in Bayern” and converted into a presence-absence matrix. The whole of Bavaria was converted to a geographical grid (X, Y values), whose values came from available topographical maps. I am open to any suggestions or other packages which could present such data in a fashionable way!
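As I read it, the “Diversity” column used in the code below is simply the number of dragonfly species recorded per grid cell. If your data sit in a presence-absence matrix (one row per cell, one 0/1 column per species), that is just a row sum. This is a minimal sketch; the column prefix sp_ and the three toy cells are hypothetical.

```r
# Toy presence-absence table: grid coordinates plus one 0/1 column per species
pa <- data.frame(X = c(1, 2, 1), Y = c(1, 1, 2),
                 sp_a = c(1, 0, 1), sp_b = c(1, 1, 0), sp_c = c(0, 0, 1))

species_cols <- grep("^sp_", names(pa))                      # columns holding species records
grid <- data.frame(X = pa$X, Y = pa$Y,
                   Diversity = rowSums(pa[, species_cols]))  # species count per cell
grid
```

The resulting table has the "X","Y","Diversity" layout the mapping code expects. You can write it out with write.csv2() or turn it into a SpatialPointsDataFrame directly with coordinates(grid) <- c("X","Y").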

For basic great-looking maps you could use the packages sp and lattice. I also suggest using the package RColorBrewer, which provides very nice color ranges. See the comments in the R code for explanations.

library(sp);library(lattice);library(RColorBrewer)   # Loads all libraries (RColorBrewer for brewer.pal)
data <- read.csv2("grid_bayern.csv",
header=T,dec=",",sep=";",na.strings="NA") # Load the data
### The data has the following columns: "X","Y","Diversity"
coordinates(data) <- c("X","Y") # Apply X-Y Values as coordinates to form a SpatialPointsDataFrame.
## coloured points plot with legend in plotting area and scales
spplot(data, "Diversity",
cuts = 3, col.regions=brewer.pal(3, "Set1")[3:1],
legendEntries = c("small","average","high"))

## Bubble plot --> increasing bubble size for higher values
bubble(data, "Diversity", maxsize = 1.5,pch=19,
main = "Bavaria Dragonfly diversity", key.entries = c(1,5,10,25,50),scales=list(draw=F))

Diversity for Bavarian dragonflies

Bubble plot for Bavarian dragonflies

As some points seem to be missing, you could also build an interpolated graphic. For this we will need the packages maps, akima and fields. The code below loads the dataset, defines our X and Y axis ranges and interpolates the diversity values onto a regular grid. Please note that this is just a simple default interpolation, not kriging: akima's interp() interpolates linearly on a triangulation of the points, so the result doesn’t have to be right. For proper kriging you need to look at some variograms and adjust your model to build correct interpolated values.

library(akima);library(fields);library(RColorBrewer) ## Load all libraries
data <- read.csv2("grid_bavaria.csv",header=T,
dec=",",sep=";",na.strings="NA") ## load the data
rx=range(data$X);ry=range(data$Y) ## define the ranges of the plots

int.scp <- interp(data$X, data$Y, data$Diversity, duplicate="strip") ## Interpolate onto a regular grid

# Build the image plot with the interpolated values
image.plot(int.scp, xlim=rx, ylim=ry,
col=brewer.pal(10, "Spectral")[10:1],nlevel=10,main="Spatial Diversity")
#contour(int.scp,add=TRUE) # You could also show the contour lines with this command

Interpolated dragonfly diversity for bavaria
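As a first step towards kriging, here is a bare-bones empirical semivariogram in base R. For each lag class it averages half the squared difference between all pairs of points whose distance falls in that class. empirical_variogram() is a hypothetical helper using planar distances, which is fine for projected grid coordinates like the Bavarian X/Y values. For real work, gstat's variogram() also handles model fitting. The toy data are a straight line of points with a linear trend, so the semivariance grows as h²/2.

```r
# Empirical semivariogram: mean of (z_i - z_j)^2 / 2 over point pairs, per lag class
empirical_variogram <- function(x, y, z, breaks) {
  D <- as.matrix(dist(cbind(x, y)))      # planar distances between all points
  G <- outer(z, z, "-")^2 / 2            # semivariance contribution of each pair
  keep <- upper.tri(D)                   # use every pair once
  bin <- cut(D[keep], breaks = breaks)   # lag class of each pair
  data.frame(class = levels(bin),
             lag = as.vector(tapply(D[keep], bin, mean)),
             gamma = as.vector(tapply(G[keep], bin, mean)),
             n = as.vector(table(bin)))
}

# Toy data: five points on a line with a linear trend in z
v <- empirical_variogram(x = 1:5, y = rep(0, 5), z = 1:5,
                         breaks = c(0.5, 1.5, 2.5, 3.5, 4.5))
v  # gamma: 0.5, 2, 4.5, 8 for lags 1 to 4
```

A semivariance that keeps rising without levelling off, as here, points to a trend that should be removed before kriging.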
