3.4. Functions calculating statistics over realizations

3.4.1. Mean and variance, static data

In the folder framework, open real.py. It is the script created in the previous exercise drawing realizations from stochastic variables. PCRaster includes functions to calculate statistics over these realizations. This is done in the postmcloop. These functions read the data written to disk in the initial or the dynamic, and calculate the statistics over these data, writing the results to disk.

The function mcaveragevariance calculates the mean and variance of realizations, on a cell-by-cell basis.

Save real.py under the new name statistics.py. Add the following lines to the postmcloop:

names=['b']
sampleNumbers=self.sampleNumbers()
print sampleNumbers
timeSteps=[0]
mcaveragevariance(names,sampleNumbers,timeSteps)

The mcaveragevariance function requires three inputs:

  • names is a list of filenames (given as strings) of the map variables for which the statistics are needed. As these are read from disk, you need to have stored these in the initial or the dynamic using report. Here we use a list of length 1, containing one variable.
  • sampleNumbers is a list of Monte Carlo samples over which statistics are calculated. Here we create it with the python function range. It is also printed to check it.
  • timesteps is a list of time steps for which the statistics are created. As b represents a static variable, we provide a list containing one zero element. This tells mcaveragevariance that it gets static data.

After adding the lines, change the number of Monte Carlo samples to use to 100, for more precision. Save the script and run. Type dir in the main folder framework. The mcaveragevariance function has created the maps b-ave.map and b-var.map, containing the mean and variance. Display these maps.

Now, change names to calculate the statistics of vegetation height also. Save and run the script. Type dir in the main folder framework to see what it has stored. Display the mean and variance maps of vegetation height.


Question: How well does your Monte Carlo script calculate the original statistics of the stochastic variables ?

  1. For the variable b, the variance is approximately between 50% and 150% of the provided value.
  2. Idem, between 10% and 190%.
  3. Idem, between 1% and 1000%.
  4. Idem, between 99% and 101%.

3.4.2. Percentiles, static data

The calculation of percentile data is almost just as easy.

Open statistics.py and add the following two lines to the postmcloop, below the other statements:

percentiles=[0.025,0.05,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,0.95,0.975]
mcpercentiles(names,percentiles,sampleNumbers,timeSteps)

Save the script and run. The functions percentiles works similar to mcaveragevariance, with the exception that it needs a list providing the percentiles that need to be calculated. In the framework folder, type dir to see what the script has stored.

Create a probability density distribution plot of the vegetation height and the variable b. Click on the map to retreive the plot for different cell locations.


Question: What is the 25% percentile value of b? What is the probability that b exceeds 1.0?

  1. The 25% percentile value is about -0.05, the exceedance probability for 1.0 is about 0.2 (values depend on cell location).
  2. The 25% percentile value is about -0.5, the exceedance probability for 1.0 is about 0.2 (values depend on cell location).
  3. The 25% percentile value is about -0.01, the exceedance probability for 1.0 is about 0.2 (values depend on cell location).
  4. The 25% percentile value is about -0.005, the exceedance probability for 1.0 is about 2.0 (values depend on cell location).

3.4.3. Dynamic data

A very similar approach can be used when calculating statistics over dynamic data.

Open statistics.py. Let’s first create a more interesting dynamic variable. At the top of the dynamic, type:

c = normal(1)
self.report(c,'c')

At the bottom of the dynamic, type:

self.d=self.d+c+1
self.report(self.d,'d')

Initialize self.d in the initial by adding:

self.d=scalar(0)

Also, increase the number of timesteps to 10. Save statistics.py and run the script. Type dir in the folder 1 to see what is stored.

In the main folder framework animate a visualisation of 12 realizations of c and d. Use the Aguila options --scenarios, --timesteps and --multi.


Question: What happens with the uncertainty in c and d over time?

  1. The uncertainty in c and d increases with time.
  2. The uncertainty in c and d decreases with time.
  3. The uncertainty in c is time-independent, while the uncertainty in d increases with time.
  4. The uncertainty in c increases with time, while the uncertainty in d is time-independent.

To calculate the sample statistics, add the following block of code at the bottom of the postmcloop:

names=['c','d']
sampleNumbers=range(1,self.nrSamples()+1)
timeSteps=self.timeSteps()
print timeSteps
mcaveragevariance(names,sampleNumbers,timeSteps)
mcpercentiles(names,percentiles,sampleNumbers,timeSteps)

Note that this is very similar to the code needed for static data. We only need to change the names and the timeSteps. Note that self.timeSteps() generates a list of the timesteps in the model. It is printed here.

Save the script and run. It will take some time to run as it needs to calculate statistics for each time step.

Type dir c* to see what is stored by the script. Also try dir d*.

Create probability distribution plots of c and d. Also, plot time series and animate.


Question: Set the map view of d to retrieve the exceedance probability of a threshold value of 5. What happens with this probability over time?

  1. It increases over time up to a value of about 0.01 at the last time step.
  2. It increases over time up to a value of about 0.1 at the last time step.
  3. It increases over time up to a value of about 1 at the last time step.
  4. It increases over time up to a value of about 0.5 at the last time step.