Tag Archive | GIS

Playing with Landsat 8 metadata

The Landsat mission is one the most successful remote-sensing programs and has been running since the early 1970s. The most recent addition to the flock of Landsat satellites – Mission Nr. 8 has been supplying tons of images to researchers, NGO’s and governments for over two years now.  Providing nearly 400 images daily (!) this has amassed to an impressive dataset of over half a million individual images by now (N = 515243 by 29/07/2015).

Landsat 8 scenes can be easily queried via a number of web-interfaces, the oldest and most successful being the USGS earth-explorer which also distributes other NASA remote-sensing products.  ESA also started to mirror Landsat 8 data and so did the great Libra website from developmentseed.  Using the Landsat8 before/after tool from remotepixel.ca tool you can even make on the fly comparisons of imagery scenes. You might ask how some of those services are able to show you the number of images and the estimated cloud-cover. This information is saved in the scenes-list metadata file, which contains the identity, name, acquisition date and many other information from all Landsat 8 scenes since the start of the mission. In addition Landsat 8 also has a cloudCover estimate (and sadly only L8, but the USGS is working on a post-creation measure for the previous satellites as far as I know), which you can readily explore on a global scale. Here is some example code showcasing how to peek into this huge ever-growing archive.

# Download the metadata file
l = "http://landsat.usgs.gov/metadata_service/bulk_metadata_files/LANDSAT_8.csv.gz"
download.file(l,destfile = basename(l))
# Now decompress
t = decompressFile(basename(l),temporary = T,overwrite=T,remove=F,ext="gz",FUN=gzfile)

Now you can read in the resulting csv. For speed I would recommend using the “data.table” package!

# Load data.table
# Use fread to read in the csv
system.time( zz = fread(t,header = T) )

The metadata file contains quite a number of cool fields to explore. For instance the “browseURL” columns contains the full link to an online .jpg thumbnail. Very useful to have a quick look at the scene.

l = "http://earthexplorer.usgs.gov/browse/landsat_8/2015/164/071/LC81640712015201LGN00.jpg"
jpg = readJPEG("LC81640712015201LGN00.jpg") # read the file
res = dim(jpg)[1:2] # get the resolution
L8 Thumbnail

L8 Thumbnail

The “cloudCoverFull” column contains the average cloud-cover for each scene, which is interesting to explore as the long-term average of measured cloudCover per region/country likely differs due to different altitude or precipitation levels. Here is a map showing the average cloud-cover per individual scene since mission start:

    Average global cloud cover in Landsat 8 data

Average global cloud cover in Landsat 8 data

Clouds are a major source of annoyance for anyone who intends to measure vegetation cover or classify land-cover. Might write another post later showcasing some examples on how to filter satellite data for clouds.

Neutral Landscape model generation with QGIS

There are many interesting things to calculate in relation to landscape ecology and its statistical metrics. However many (if not the majority) of the published toolsets are not reproducible, their algorithm code not published or open-source. Obviously this makes the easy implementation of underlying algorithms even harder for independent developers (scientists) if you don’t have the time to reproduce their work (not to mention the danger of making stupid mistakes, we are all human).

I recently found this new article in Methods in Ecology and Evolution by Etherington et al., who didn’t really present any novel techniques or methods, but instead provided a new python library that is capable of calculating Neutral Landscape Models (NLMs). NLMs are often used as nullmodel counterpart to real remote-sensing derived maps (land-cover or altitude) to test the effect of landscape structure or heterogeneity on a species (-community). Many NLM algorithms are based on cluster techniques, cellular automata or calculating randomly distributed numbers in a given 2d space. There have been critical and considerate voices stating that existing NLMS are often misused and better null-models are needed for specific hypothesis, such as a species perception of landscape structures. Nevertheless NLMs are still actively used and new papers published with it.


Figure by Etherington et al. showing a number of different NLMs


The new library, called NLMpy, is open source and published under the MIT licence. Thus I can easily use and integrate into QGIS and its processing framework. Their NLMpy library only depends on numpy and scipy and thus doesn’t add any other dependency to your python setup, if you already are able to run LecoS in your QGIS installation. The NLM functions are visible in the new LecoS 1.9.6 version, but only if you have NLMpy installed and it is available in your python path. Otherwise they won’t show up! Please don’t ask me here how to install additional python libraries on your machine, but rather consult google or some of the Q&A sites. I installed it following the instructions on this page.

Midpoint displacement algorithm in QGIS

Midpoint displacement algorithm in QGIS

NLM with randomly clustered landcover patches

NLM with randomly clustered landcover patches

After you have installed it and upgraded your LecoS version within QGIS, you should be able to spot a new processing group and a number of new algorithms. Here are some screenshots that show the new algorithms and two NLMs that I calculated. The first one is based on a Midpoint displacement algorithm and could be for instance used to test against an altitude raster layer (need to reclassify to real altitude values first). The second one is aimed at emulating a random classified land-cover map. Here I first calculated a proportional NLM using a random cluster nearest-neighbour algorithm. Second I used the libraries reclassify function (“Classify proportional Raster”) to convert the proportional values (range 0-1) into a relative number of landcover classes with exactly 6 different land-cover classes. Both null model look rather realistic, don’t they 😉

This is a quick and dirty implementation, so there could occur some errors. You should use a meter-based projection as extent (such as UTM) as negative values (common in degree-based projections like WGS84 latitude-longitude) sometimes result in strange error messages. You also have to change the CRS of the generated result to the one of your project manually, otherwise you likely won’t see the result. Instead of the number of rows and columns as in the original implementation, the functions in LecoS are based on a selected extent and the desired output cellsize.

For more complex modelling tasks I would suggest that you use the library directly. To give you a good start Etherington et al. also appended some example code and data in their article´s supporting information. Furthermore a few days ago they even had a spatial MEE blog post with some youtube video demonstrations how to use their library. So it shouldn’t be that hard even for python beginners. Or you could just use the processing interface within LecoS.

In any way, if you use the library in your research, I guess the authors would really appreciate it if you could cite them 🙂


In addition I also temporarily removed LecoS ability to calculate the mean patch distance metric due to some unknown errors in the calculation. I’m kinda stuck here and anyone who can spot the (maybe obvious) bug gets a virtual hug from me!


Happy new year!

Bulk downloading and analysing MODIS data in R

Today we are gonna work with bulk downloaded data from MODIS. The MODIS satellites Terra and Aqua have been floating around the earth since the year 2000 and provide a reliable and free-to-obtain source of remote sensing data for ecologists and conservationists. Among the many MODIS products, Indices of Vegetation Greenness such as NDVI or EVI have been used in countless ecological studies and are the first choice of most ecologists for relating field data to remote sensing data. Here we gonna demonstrate 3 different ways how to acquire and calculate the mean EVI for a given Region and time-period. For this we are using the MOD13Q1 product. We focus on the area a little bit south of Mount Kilimanjaro and the temporal span around May 2014.

The first handy R-package we gonna use is MODISTools. It is able to download spatial-temporal data via a simple subset command. The nice advantage of using *MODISTools* is that it only downloads the point of interest as the subsetting and processing is done server-wise. This makes it excellent to download only what you need and reduce the amount of download data. A disadvantage is that it queries a perl script on the daac.ornl.gov server, which is often horrible slow and stretched to full capacity almost every second day.

# Using the MODISTools Package
# MODISTools requires you to make a query data.frame
coord <- c(-3.223774, 37.424605) # Coordinates south of Mount Kilimanjaro
product <- "MOD13Q1"
bands <- c("250m_16_days_EVI","250m_16_days_pixel_reliability") # What to query. You can get the names via GetBands
savedir <- "tmp/" # You can save the downloaded File in a specific folder
pixel <- c(0,0) # Get the central pixel only (0,0) or a quadratic tile around it
period <- data.frame(lat=coord[1],long=coord[2],start.date=2013,end.date=2014,id=1)
# To download the pixels
MODISSubsets(LoadDat = period,Products = product,Bands = bands,Size = pixel,SaveDir = "",StartDate = T)
MODISSummaries(LoadDat = period,FileSep = ",", Product = "MOD13Q1", Bands = "250m_16_days_EVI",ValidRange = c(-2000,10000), NoDataFill = -3000, ScaleFactor = 0.0001,StartDate = TRUE,Yield = T,Interpolate = T, QualityScreen = TRUE, QualityThreshold = 0,QualityBand = "250m_16_days_pixel_reliability")
# Finally read the output
read.table("MODIS_Summary_MOD13Q1_2014-08-10.csv",header = T,sep = ",")

The Mean EVI between the year 2013 up to today on the southern slope of Kilimanjaro was .403 (SD=.036). The Yield (integral of interpolated data between start and end date) is 0.05. MODISTools is a very much suitable for you if you want to get hundreds of individual point locations, but less suitable if you want to extract for instance values for a whole area (eg. a polygon shape) or are just annoyed by the frequent server breakdowns…

If you want to get whole MODIS tiles for your area of interest you have another option in R available. The MODIS package is particularly suited for this job in R and even has options to incorporate ArcGis as path for additional processing. It is also possible to use the MODIS Reprojection Tool or gdal (our choice) as underlying workhorse.

dates <- as.POSIXct( as.Date(c("01/05/2014","31/05/2014"),format = "%d/%m/%Y") )
dates2 <- transDate(dates[1],dates[2]) # Transform input dates from before
# The MODIS package allows you select tiles interactively.  We however select them manually here
h = "21"
v = "09"
runGdal(product=product,begin=dates2$beginDOY,end = dates2$endDOY,tileH = h,tileV = v,)
# Per Default the data will be stored in
# After download you can stack the processed TIFS
vi <- preStack(path="~/MODIS_ARC/PROCESSED/MOD13Q1.005_20140810192530/", pattern="*_EVI.tif$")
s <- stack(vi)
s <- s*0.0001 # Rescale the downloaded Files with the scaling factor

# And extract the mean value for our point from before.
# First Transform our coordinates from lat-long to to the MODIS sinus Projection
sp <- SpatialPoints(coords = cbind(coord[2],coord[1]),proj4string = CRS("+proj=longlat +datum=WGS84 +ellps=WGS84"))
sp <- spTransform(sp,CRS(proj4string(s)))
extract(s,sp) # Extract the EVI values for the available two layers from the generated stack
#>  0.2432  | 0.3113

MODIS EVI Tile for East Africa

Whole MODIS EVI Tile (250m cs) for East Africa


If all the packages and tools so far did not work as expected, there is also an alternative to use a combination of R and Python to analyse your MODIS files or download your tiles manually. The awesome pyModis scripts allow you to download directly from a USGS server, which at least for me was almost always faster than the LP-DAAC connection the two other solutions before used. However up so far both the MODISTools and MODIS package have processed the original MODIS tiles (which ship in hdf4 container format) for you. Using this solution you have to access the hdf4 files yourself and extract the layers you are interested in.
Here is an example how to download a whole hdf4 container using the modis_download.py script Just query modis_download.py -h if you are unsure about the command line parameters.

# Custom command. You could build your own using a loop and paste
f <- paste0("modis_download.py -r -p MOD13Q1.005 -t ",tile," -e ","2014-05-01"," -f ","2014-05-30"," tmp/","h21v09")
# Call the python script
# Then go into your defined download folder (tmp/ in our case)

# Now we are dealing with the original hdf4 files.
# The MODIS package in R has some processing options to get a hdf4 layer into a raster object using gdal as a backbone

lay <- "MOD13Q1.A2014145.h21v09.005.2014162035037.hdf" # One of the raw 16days MODIS tiles we downloaded
mod <- getSds(lay,method="gdal") # Get the layer names and gdal links
# Load the layers from the hdf file. Directly apply scaling factor
ndvi <- raster(mod$SDS4gdal[1]) * 0.0001
evi <- raster(mod$SDS4gdal[2]) * 0.0001
reliability <- raster(mod$SDS4gdal[12])
s <- stack(ndvi,evi,reliability)

#Now extract the coordinates values

There you go. Everything presented here was executed on a Linux Debian machine and I have no clue if it works for you Windows or MAC users. Try it out. Hope everyone got some inspiration how to process MODIS data in R 😉

Annual Forest Loss – A comparison between global forest loss datasets

The start of this year was marked by the publication of two new global datasets for environmental analysis. My impression is that both of those datasets will be of increasing importance in ecological analysis in the future (even though their value for conservation biology has been actively criticized, see Tropek et al. 2014). Thus there is a need to assess the accuracy of their forest loss detection over time and if they are consistent.

The first dataset is the already famous Global Forest Map published by Hansen et al. (2013) in Science end of last year. The temporal span of their dataset goes back from the year 2000 up to the year 2012 and by using only Landsat data in a temporal time-series analysis they got a pretty decent high-resolution land-cover product. Although the resolution of the Hansen dataset is great (30m global average coming from Landsat) Hansen et al. decided to only publish the year 2000 baseline with the forest cover. They provide us with aggregated loss, gain and loss per year layers though, but nevertheless the user has no option to reproduce a similar product for the year 2012.

The other dataset is the combined published result from a 4 year long monitoring by the japanese satellite ALOS-PALSAR. They decided to release a global forest cover map at a 50m spatial resolution, which in contrast to Hansen can be acquired for the whole time-frame of the ALOS-PALSAR mission. It thus has a temporal coverage of the whole globe from the year 2007 until 2010. The data can be acquired on their homepage after getting an account. The ALOS PALSAR data has a nice temporal span and can be downloaded for multiple years, thus in theory allowing to make temporal comparisons and predictions about future land-use trends. However I am a bit concerned about the accuracy of their classifications as I have found multiple errors already in the area I am working in.

Classification Errors with the ALOS PALSAR dataset. Suddenly there are huge waterbodies in the Savanna near Kilimanjaro

Classification Errors with the ALOS PALSAR dataset. Suddenly there are huge waterbodies (blue) in the Savanna near Kilimanjaro

Because I am interested in using the ALOS PALSAR dataset in my analysis (how often do you get a nice spatial-temporal dataset of forest cover) I made a comparison between the forest loss detected in my area of interest for both datasets. It should be noted that is a comparison between different satellite sensors as well and not only by classification algorithms. So we are not comparing products from the same data source.

So what is the plan for our comparison:

  • We downloaded the whole ALOS PALSAR layers for all years covered of the area around Kilimanjaro in northern Tanzania (N00, E035). We then extracted only the forest cover (Value == 1) and calculate the difference between years to acquire the forest loss for the year 2008,2009 and 2010 respectively.
  • From the Google Engine app we downloaded the “loss per year” dataset and cropped it to our area of interest. Furthermore we are only interested in the aggregated Forest loss in the years 2008, 2009 and 2010 which we have available in the ALOS PALSAR dataset. We furthermore resampled the Hansen dataset up to 50m to match up with the ALOS PALSAR resolution.

The Result:

I haven’t found a fancy way to display this simple comparison, so here comes just the result table. As predicted (if you look at it visually),the ALOS PALSAR algorithm overshots the amount of forest loss a lot.

year 2007-2008 2008-2009 2009-2010
Hansen Forest Loss cells 262 304 529
ALOS PALSAR Forest Loss cells 26995 24970 16297
Equal cells in both 17 30 131


So which one is right? I personally trust Hansens data a lot more. Especially because I found them to be pretty consistent in my area of study. For me the ALOS PALSAR data is not useable yet until the authors have figured out ways to improve their classification. It can be concluded that users should not forget that those Forest Cover products are ultimately just the result of a big un-supervised algorithm who doesn’t discriminate between right and wrong. Without validation and careful consideration of the observer you might end up having wrong results.

Global Forest Change data now available for Download

The previously reported Global Forest data from Hansen et al. (2013) is now finally up for download.

Access the data here, but beware of the size of the individual granules as they easily be some gigabytes. The time for some awesome analysis and probably a bunch of papers has come…


Google Maps routing for QGIS

Just a little post with something not totally related with Ecology, but nevertheless quite useful for the daily work with GIS. Some time ago i got hand on a nice Garmin GPS device (Etrex 30) and i am constantly playing around with the options and opportunities for it. Especially in the interaction framework of fieldwork and desktop-based GIS processing. Now i was at the lucky situation to have some time available to go birding in the area of Vestamager southwest of Copenhagen, Denmark. My intention was to bicycle there and use the GPS for orientation (although i perfectly know the route 🙂 ).

Thus i wrote this simple little R processing script (see a general introduction how to create R-scripts for processing here), which uses the route(…) function of the ggmap package to generate line layers from a point x to point y. Note, that by using this function you are agreeing to the Google Maps API Terms of Service and you are only allowed to send 2500 queries per day.

Generate the route from x to y and specify bicycle as output.

Generate the route from x to y and specify bicycle as output.

The generated output line is automatically loaded into QGIS after processing and has the total length and duration of the trip in its attribute-table.

Queried route for Copenhagen with googlemaps displayed on an OSM background in QGIS

Queried route for Copenhagen with googlemaps displayed on an OSM background in QGIS

To use the script, create a new one in the processing toolbox and copy the contents below into it. Then copy it into your “~/.qgis2/processing/rscripts” folder. I will also post the script in the QGIS scripts section here on this blog.

# GGmap routing script by Martin Jung
# Homepage: https://conservationecology.wordpress.com/
##Vector processing=group
##x = string Copenhagen, Denmark
##y = string Berlin, Germany
##type= string driving
##output = output vector
if (!require(ggmap)){print("ggmap not installed. Will install it now");install.packages(ggmap, dependencies = TRUE)}

r <- route(from=x,to=y,mode=type,structure="route",output="simple",alternatives=F,messaging=F) # get the route
cs <- CRS("+proj=longlat +datum=WGS84 +no_defs") # WGS84 projection

l <- Lines(list(Line(r[c("lon","lat")])),ID=paste0(type,"_track"))
sl <- SpatialLines(list(l),cs)

data <- data.frame(from=x,to=y,type=type,length_km=sum(r$km,na.rm=T),duration_h=sum(r$hours,na.rm=T))
output <- SpatialLinesDataFrame(sl,data=data,match.ID=F)

To use existing shapefiles from within QGIS, users are advised to take a look at the pgrouting extension for Postgis. See a nice tutorial for installation and configuration on windows machines here.

Interesting Paper and Data: Global Forest Loss (Hansen et al. 2013)

Just some moments ago (on the 15th of November 2013, so not yet in Europe 😀 ) a new very interesting paper came out in Science presenting a new High-resolution dataset of global forest loss in the last decade. It is written by the authors of already existing widely used datasets, such as GLCF and CARPE, in corporation with Google.

Hansen et al. (2013), High-Resolution Global Maps of 21st-Century Forest Cover Change

“In this study, Earth observation satellite data were used to map global forest loss (2.3 million square kilometers) and gain (0.8 million square kilometers) from 2000 to 2012 at a spatial resolution of 30 meters. The tropics were the only climate domain to exhibit a trend, with forest loss increasing by 2101 square kilometers per year.”

Not surprisingly only the tropical rain-forest was found to posses a negative trend in the last decade. Most likely because of shifting agriculture, clash-and-burn practices and large scale deforestations due to monoculture cultivation of Palmoil, Soy and Jatropha. Quite impressively they used over  654,178 Landsat images in a time-series analysis to in characterize forest extent and change in the time from 2000 to 2012. Google apparently helped out with the computation (cloud-computing). And the best thing is: You can view the data

Google Viewer of Hansens et al. 2013 forest loss dataset

Google Viewer of Hansens et al. 2013 forest loss dataset

online via Google.

Online view of Global Forest Change

At the bottom you can find a Data download link, which makes the whole thing very interesting for follow-up research! Check it out!

EDIT: Apparently download is not yet available for the public. Will keep you updated!

Sussex Research Hive

Supporting the research community at the University of Sussex

Small Pond Science

Research, teaching, and mentorship in the sciences

Landscape Ecology 2.0

intersecting landscape ecology, open science, and R


The Research Blog of IIASA

Jörg Steinkamps Blog

Mainly things about R, Linux and vegetation modeling

Amy Whitehead's Research

the ecological musings of a conservation biologist

Michael McCarthy's Research

School of BioSciences, The University of Melbourne

The Rostrum

science, statistics, policy and more


Environmental Change - Understand, Predict, Adapt

Dynamic Ecology

Multa novit vulpes


METeorological Visualisation Utilities using R for Science and Teaching

A Birder´s Blog

"Everybody loves what they know"


A new metric to quantify biodiversity response to fragmentation

Trust Me, I'm a Geographer

Using Technology to Explore Our World

Duncan Golicher's weblog

Research, scripts and life in Chiapas