4.1. Introduction

In this tutorial, you will use GDAL to pre-process data from different sources to use in PCRaster-Python. GDAL is a software library for creating and manipulating geospatial data.

PCRaster operations work on raster datasets which have the same grid size and have the same spatial extent. GIS-data from different sources need to be clipped to the same region and resampled to the same grid size for usage in PCRaster models. Furthermore, when combining maps with a different map projections, maps have to be projected to a common map projection.

Besides GDAL and PCRaster-Python, you will use QGIS to visualize results. Both GDAL an QGIS needs to be installed on your computer, in addition to PCRaster-python. If you have QGIS on Windows, this comes with the OSGeo4W Shell. This is a set of command-line tools with all the required GDAL commands. Installation instructions for PCRaster-Python are available at http://www.pcraster.eu/, QGIS is available at http://www.qgis.org/

Instructions in this tutorial are focussed on using OSGeo4W on Windows. When using linux, add .py to the following commands: gdal_merge, gdal_edit and gdal_calc.

An extended manual of the different utilies in GDAL is available at: http://www.gdal.org/gdal_utilities.html. You will need this manual for the assignments.

For this tutorial, you need several datasets. These datasets are available in a .zip-file on the Blackboard community of this course. Download and extract the .zip-file associated with this tutorial.

4.2. Merge, Crop en Project Elevation Model

4.2.1. Merge DEM tiles

In this section, you will use DEM tiles to create a single elevation model for a catchment in the Alps. The data used comes from the Alos World 3D DEM dataset. This is an elevation model at 30 m resolution that is available for the entire world. This dataset can be downloaded as indicidual tiles that together cover the world.

Open an OSGeo4W shell (or another terminal window with access to GDAL). In the shell, browse to the DEM/Tiles Folder and display the files in this folder:

cd C:\Exercises\dataprocessing\ <Enter>
dir DemTiles <Enter> [on Windows]
ls DemTiles <Enter> [on linux]

Adjust the folder location after cd to where the data is located. This will show a list of files in the folder. The folder contains the following files: N046E008_AVE_DSM.tif, N046E007_AVE_DSM.tif, N045E008_AVE_DSM.tif, and N045E007_AVE_DSM.tif. These files are already a selection of the original dataset to save space.

Open QGIS and start a new project. Add all four files to the project, these tiles are next to each other. In this tutorial, you will make a model for the Anza Catchment in Italy. The contour of this catchment is available in the file anzacatchment.geojson. Add this file to QGIS. You will see that the catchment is on the intersection of the four tiles.

Using the mouse-pointer and coordinte box at the bottom of the screen in QGIS, you can lookup coordinates on the map.


Question: Tiled datasets have systematic filenames, which makes it easy to retreive the right data for the right location. The filenames contain a latitude (N…) and longitude (E…). What coordinates do the nummers of the filename represent?

  1. center of the tile

  2. lower-left hand corner

  3. upper-left hand corner

  4. lower-right hand corner

  5. upper-right hand corner

Correct answers: b

Feedback: The filename of the tiles indicate the lower-left hand corner of the tiles. Since adjecent tile-names increase with 1, all tiles have a size of 1x1 degree.


Switch back to the command prompt and merge the DEM-files using GDAL with the following command (on a single line). This command has to be executed from the dataprocessing folder: :

gdal_merge -of GTiff -o dem-merged.tif DemTiles/N046E008_AVE_DSM.tif
DemTiles/N046E007_AVE_DSM.tif DemTiles/N045E008_AVE_DSM.tif
DemTiles/N045E007_AVE_DSM.tif <Enter>

In this command, -of GTiff defines the output format of the merged DEM dataset. In this case GeoTiff, a widely used GIS dataformat. -o dem_merged.tif states the location and name of the output file. The last part of the command is list of filenames to be merged.

Now, add the merged file to QGIS to see the result of the merge operation. Adjust the visualisation to your liking.

4.2.2. Define target coordinate system

The next step is to crop the DEM to the modelling area (the catchment). Furthermore, the provided DEM is stored in a global coordinate system, WGS84. For modelling of smaller areas, it is more convenient to work in a local coordinate system. Both cropping and projecting can be done with one command.

First, we must determine which coordinate system to use for the final maps. For this tutorial, we will be using a UTM projection. This projection is available everywhere on Earth, but with different projection zones for different locations on Earth. To determine the right zone for our catchment, add the file UTM-zones-EPSG-codes.geojson to QGIS, and adjust the layer style so the polygons are transparent.

The UTM-Zones dataset shows all available UTM projections in the world on the map. If you zoom out to show a large part of Europe, you see there are adjacent strips on the map. The best projection to use is the one that contains the research area. For large areas that cover multiple UTM zones, other coordinate systems are better suited.

Use the identify tool in QGIS to determine the UTM zone. The UTM zone consist of a number and a letter (N or S) indicating if you are on the northern or southern hemisphere. Additionally, the UTM-zone dataset contains the associated EPSG identifier, which we use later to project the data.


Question: In which UTM-zone is the Anza Catchment located?

  1. 31N

  2. 32N

  3. EPSG:32632

  4. EPSG:32631

Correct answers: b

Feedback: The Anza catchment on the Italian-Suisse border is located in UTM zone 32, and is on the northern hemisphere, hence 32N. The EPSG identifier of this map-projection in EPSG:32632


4.2.3. Define target extent

Next, we need to convert the extent, or bounding box coordinates, of our catchment to the UTM coordinate system. So, we need to convert the lower-left and upper-right coordinate to the new coordinate system. In QGIS, you can see the coordinates of the mouse pointer in the Coordinate-box at the bottom of the screen. For the Anza Catchment, the lower-left and upper-right coordinates are approximately: 7.855, 45.895 and 8.265, 46.045, respectively. These coordinates are in the WGS84 reference system.

We can convert these coordinates to UTM coordinates using the program cs2cs, which is part of the GDAL command line tools:

cs2cs +proj=latlong +datum=WGS84 +to +init=EPSG:32632 <enter>

This command start a cs2cs session, in which you can enter coordinate-pairs. In this case (as defined by the above command), coordinates will be projected from Latitude-Longitude format (WGS84) to UTM ZONE 32N coordinates. Continue by entering the coordinates:

7.855 45.895 <enter>
8.265 46.045 <enter>

Close the cs2cs program by pressing CTRL-C simultaneous. The program has outputted two sets of coordinates. Note these two sets of coordinates. These consist of three values, but we will only need the first two (x and y). The last is elevation, which we do not use. The UTM coordinates are in meters. At a catchment-scale using no decimals is accurate enough.


Question: What is the y-coordinate in UTM Zone 32N for the coordinates 8.265, 46.045?

  1. 411173

  2. 5083019

  3. 443133

  4. 5099310

Correct answers: d

Feedback: If you got other results, check command and numbers.


4.2.4. Project and crop DEM

Now we know the coordinate system and bounding box of the target dataset. Next, we will project the merged DEM to the new coordinate system, and crop the DEM to our research area. We do this by using the GDAL command gdalwarp.

Analyse the following command:

gdalwarp -of gtiff -t_srs EPSG:32632 -te .. .. .. ..
-tr 90 90 -tap -r average dem-merged.tif dem-cropped-utm.tif <enter>

In this command, -of gtiff defines the output format (as before, we use GeoTiff), -t_srs EPSG:32632 defines the target coordinate system. -te .. .. .. .. defines the extent coordinates of the final dataset. Here you have to enter the coordinates of the bounding box as defined before with cs2cs. In the GDAL manual for gdalwap (http://www.gdal.org/gdalwarp.html), determine how to input the coordinates for the ‘-te’ argument.


Question: What values do you have to insert after -te .. .. .. .. in the gdalwarp command?

  1. 7.855 45.895 8.265 46.045

  2. 7.855 8.265 46.045 45.895

  3. 411172 5083019 443134 5099310

  4. 411172 443134 5099310 5083019

Correct answers: c

Feedback: By default, coordinates have to be entered in the target coordinate system, so with UTM coordinates. The syntax for the -te argument is ‘-te xmin ymin xmax ymax’, So these are the x- and y-cordinates of the lower-left hand corner followed by the coordinates of the upper-right hand corner.


The argument -tr 90 90 set a fixed cell size to 90x90 m. The argument -tap forces the output extent to a multiple of the grid size, this is important for PCRaster later as multiple input files need to be exactly aligned and of the same size. -r average tells gdal to average values during interpolation. This smooths the results a bit, but prevents artefacts in the final DEM.

Finalize the command above with the coordinates. Add the file dem-cropped-utm.tif to QGIS, and remove the larger DEM and DEM tiles if they are still present. If all went fine, the catchment boundary should fall just within the new DEM.

4.2.5. Convert DEM to PCRaster format

To use the DEM in PCRaster, it must be converted to the PCRaster file format. As stated in the introduction lecture on PCRaster, the way values are stored in a PCRaster map is very important. When converting to this format, the type of the data has to be defined explicitly by setting the so-called valuescale.

Enter and analyse the following command that converts the cropped DEM to the PCRaster format:

gdal_translate -ot Float32 -of PCRaster -mo PCRASTER_VALUESCALE=VS_SCALAR
dem-cropped-utm.tif dem.map <enter>

In this command, three things are important. We set -of (Output Format) to PCRaster, but the combination of -ot and -mo is different for different types of data. The following table shows which combinations of -ot and -mo you need for the various types of data that can be used in PCRaster. Since elevation data is scalar data, we use Float32 and VS_SCALAR.

Type of Data

-ot

-mo PCRASTER_VALUESCALE=…

-r

Boolean

Byte

VS_BOOLEAN

near

Nominal

Int32

VS_NOMINAL

near

Ordinal

Int32

VS_ORDINAL

near

Scalar

Float32

VS_SCALAR

average

The new pcraster map is accessible from PCRaster-Python, and we can view the map using aguila. Create a testdem.py file in the dataprocessing folder were dem.map is located, and add the following lines:

from pcraster import *
dem = readmap("dem.map")
slopemap = slope(dem)
aguila(slopemap)

If required use commands like import os and os.chdir(r"C:\Exercises\dataprocessing"). Run this file to display the slopemap of the catchment. In the following sections of this tutorial, you will create more input maps for the same research area to run a forest fire model.

4.3. NoData and Clone Maps

4.3.1. Remove values outside catchment

The DEM is a rectangle that is a little larger than the catchment, and there are still values outside the catchment. Using gdal, we can use the polygon of the catchment to wipe out all values outside the catchment. We use the command gdal_rasterize and gdal_edit to update the DEM:

gdal_rasterize -burn -32768 -i -at anzacatchment.geojson dem-cropped-utm.tif <enter>
gdal_edit -a_nodata -32768 dem-cropped-utm.tif

The gdal_rasterize uses the catchment polygon (anzacatchment.geojson) to edit the DEM. A value of -32768 is written to the map outside the polygon. The gdal_edit command sets this value as the ‘nodata’ value, indicating this part of the raster file contains no data.


Question: What is the purpose of the -i argument in the gdal_rasterize command?

  1. To edit the values outside the catchment polygon.

  2. To edit the values inside the catchment polygon.

  3. To remove the previously set nodata value

  4. To set the nodata value to -32768

Correct answers: a

Feedback: The option -i stands for invert. By default, gdal_rasterize edits the values inside the polygon. By using -i values outside the catchment polygon are overwritten with the new value of -32768.


Open the cropped dem again in QGIS, and remove the previous version. Run the testdem.py script again to see the result on the slopemap.

4.3.2. Create a clonemap

It is considered good practice to have a clonemap of your catchment for PCRaster scripts. Such map is a Boolean map with True-values inside your catchment and NoData-values outside your catchment.

In this tutorial, we first create a map with all zeros. We do this by multiplying the DEM with value 0. By using the DEM as a starting point, we ensure the new map is exactly of the same size as the DEM. This is essential of PCRaster modelling. First close any open maps in QGIS, then execute the following command:

gdal_calc --type Byte -A dem-cropped-utm.tif --outfile=clonemap.tif
--calc "A*0" --NoDataValue=0 <enter>

Then, similarly as in the previous section, we burn a value of 1 inside the polygon of the catchment:

gdal_rasterize -burn 1 -at anzacatchment.geojson clonemap.tif

For visualisation purposes, gdal can pre-compute statistics of the dataset. This step is required to correctly visualize the created clonemap.tif in QGIS:

gdal_edit -stats clonemap.tif

Finally, we need to convert this raster file to a PCRaster format, using the command gdal_translate. Like before, we need to select the right parameters for -ot and -mo.


Question: For the clone map, which are the right parameter values for -ot and -mo in the gdal_translate command?

  1. -ot Byte, -mo PCRASTER_VALUESCALE=VS_BOOLEAN

  2. -ot Boolean, -mo PCRASTER_VALUESCALE=VS_NOMINAL

  3. -ot Byte, -mo PCRASTER_VALUESCALE=VS_NOMINAL

  4. -ot Boolean, -mo PCRASTER_VALUESCALE=VS_BOOLEAN

Correct answers: a

Feedback: A clonemap in PCRaster is stored as a Boolean map. In PCRaster, the output type (-ot) of such map is Byte, and the corresponding PCRASTER_VALUESCALE is VS_BOOLEAN. Note, that the output type Boolean does not exist! See the table earlier in this document.


Complete the following command by replacing … with your answer from the previous question:

gdal_translate -ot ... -of PCRaster -mo ... clonemap.tif clone.map

Adjust the content of the script testdem.py to read and display clone.map.


Question: In the clone.map, as shown in aquila. What are the values inside the catchment, and what are the values outside the catchment?

  1. inside: true, outside: false.

  2. inside: 1, outside: 0.

  3. inside: true, outside: no data.

  4. inside: 1, outside: no data.

Correct answers: c

Feedback: The clonemap has no data outside the catchment and the value ‘true’ inside the catchment.


4.4. Forest Fire Model

In the chapter ‘Dynamic modelling’, you have used a simple forest-fire model on an arbitrary map. In this section, you will use the data from the previous sections and additional data to model forest fire in a real catchment.

4.4.1. Global land cover data

In this section, we will convert two datasets to the same extent and projection as the dem we created earlier. The procedure is the same, but we have to deal with different data types again.

The two datasets we will use are classified land cover data from the Global Land Cover (GLC30) dataset, and tree cover fraction data from the USGS Global Land Cover datasets. Although these datasets have a similar name, they are maintained by different instituted (see the section at the end of this chapter with information on the used data).

First, we will project and crop the available tiles to our catchment. We use the same extent and coordinate system as earlier for the DEM:

gdalwarp -of gtiff -t_srs EPSG:32632 -te 411172 5083019 443134 5099310
-tr 90 90 -tap -r near GlobalLandCover/n32_45_2010lc030.tif landcover-crop.tif <enter>

gdalwarp -of gtiff -t_srs EPSG:32632 -te 411172 5083019 443134 5099310
-tr 90 90 -tap -r average LandCover/50N_000E_treecover2010_v3.tif treecover-crop.tif <enter>

Open the tiles landcover-crop.tif and treecover-crop.tif in QGIS.


Question: What are the data types of landcover-crop.tif and treecover-crop.tif?

  1. landcover: Nominal, treecover: Ordinal

  2. landcover: Scalar, treecover: Scalar

  3. landcover: Scalar, treecover: Ordinal

  4. landcover: Nominal, treecover: Scalar

Correct answers: d

Feedback: The landcover dataset contains classified data (e.g. ‘Forest’ or ‘Grassland’), represented by a number. Such scale is a Nominal Scale. The treecover dataset contains fractional values (between 0 to 1) of the fraction of trees. Such continuous scale is stored as Scalar values.


Based on the datatypes of the different datasets, complete the following commands to convert the landcover and treecover datasets to PCRaster maps:

gdal_translate -ot ... -of PCRaster -mo PCRASTER_VALUESCALE=...
landcover-crop.tif landcover.map <enter>

gdal_translate -ot ... -of PCRaster -mo PCRASTER_VALUESCALE=...
treecover-crop.tif treecover.map <enter>

4.4.2. Biomass and land use

There are a few updated aspects of the forest fire model, because we have access to the landcover and treecover data. We can now use a spatially variable probability for a cell catching fire, based on the land cover. For example, forest gets a higher probability of catching fire than shrubs. Secondly, the treecover dataset is used as a proxy for biomass, which diminishes when a cell is on fire. The lower the biomass, the higher the probability that fire stops in that cell.

The updated model is available in the python script fire.py. Several report functions are commented out in this script to prevent an overload of model output. Feel free to change parts of the model. Comments in the dynamic section with the text ‘Updated’ indicate parts of the model that are new compared the model you used earlier

4.4.3. Fire start location

Have a look at the initial section the fire.py script. From all the .map files required by the self.readmap commands, we already created all expect firestart.map. This map is a Boolean map with only one True cell, which indicates the starting position of the fire. The file firestart.shp in the folder FireStart contains a vector point with a fictive starting location of our forest fire.

This point dataset can be converted to a raster with similar commands as before. We convert the DEM to all zeros using gdal_calc, then burn in a value of 1 at the location of the point:

gdal_calc --type Byte -A dem-cropped-utm.tif --outfile=firestart.tif --calc
"A*0" --NoDataValue=0 <enter>
gdal_rasterize -burn 1 -at FireStart/firestart.shp firestart.tif <enter>
gdal_edit -stats firestart.tif <enter>
gdal_translate -ot Byte -of PCRaster -mo PCRASTER_VALUESCALE=VS_BOOLEAN
firestart.tif firestart.map <enter>

4.4.4. Land use to fire probability

The final file we need is the landcover-fireprob.txt file, used in the lookupscalar command. This file is lookup-table that converts the landcover map to probabilities of catching fire. The landcover dataset contains the following values, which associated land use types:

nr

Land use

10

Cultivated land

20

Forest

30

Grassland

40

Shrubland

50

Wetlands

60

Water bodies

70

Tundra

80

Artificial surface

90

Bare land

100

Permanent snow/ice

To show which classes are present in the study area, open the landcover map in aquila:

aguila landcover.map <enter>

Create a text-file with the name landcover-fireprob.txt using a text editor (e.g. notepad on Windows or leafpad or gedit on Linux). In this file, create a lookuptable to convert all the landcover classes to probabilities of catching fire. Below in an example / start, complete this text file yourself. Several class get a value of 0 (like snow and water) and more forested land types get a higher value (approximately 0.2 at the most).

10  0.05
20  0.2
30  0.05

If your model is finished, run the script. The model creates timeseries, which can be visualized using aguila from the command-line. Note that the OS4GeoW does not by default include the pcaster command, so you may need to run these commands from a command-line terminal which does support pcraster. For example:

aguila --timesteps=[1,200,1] fire <enter>
aguila --timesteps=[1,200,1] tree <enter>

4.5. Common errors

This list shows a few common GDAL-errors, and how to solve them.

ERROR 1: Output dataset data.tif exists

Cause: the file is already there. Remove the file first with the command del data.tif. If that does not work, the file is likely still open in a software package like QGIS. If that is the case, close the file first and try again.

ERROR 4: data.tif: No such file or directory

Cause: the file you are trying to read does not exist. Probably a typo in the name, or you are not in the right folder.

4.6. Data sources

The following data sources have been used in this tutorial. These are all global datasets at 30 m resolution which are free to use to create your own model. In most cases, you should register for an account, and the data comes in small tiles.