Tutorial: Graphical User Interface for Simulations of Chronic Oil Pollution in the German Bight
Description of the problem
Fig. 1: Graphical User Interface
Here we go! Proceed directly to the application.
Introduction Bayesian Network
This tutorial describes the features of a graphical user interface (GUI) that allows interactive exploration of key results from a large ensemble of oil spill simulations. The simulations are based on multi-decadal, model-based reconstructions of the North Sea hydrodynamic regime, which were taken from the coastDat database.
The GUI is based on a probabilistic representation of a passive tracer drift climatology. Although no oil fate modelling was employed, recording the travel time in each simulation makes it possible to re-weight simulated coastal pollution offline according to an assumed half-life of the oil. Conditional probabilities within a network of causally linked variables were calibrated from a very large number of detailed simulations of hypothetical oil spills, covering the whole spectrum of weather conditions that occurred within a period of about five decades.
Bayesian Network (BN) technology was applied so that the user can interactively study the dependence structure of the key variables involved. In interactive mode the user can explore to what extent conditioning on, for instance, a particular season or an oil property (its half-life) affects the probability distributions of all other variables represented in the network. The BN underlying the GUI refers to probability tables pre-calculated from the comprehensive set of detailed oil spill simulations. This indirect approach is very fast because it does not require direct access to the original data.
Fig. 2 below presents the structure of the BN that underlies the above GUI. Each node of the BN represents a variable with a corresponding probability distribution; arrows between nodes describe how variables interact. The core of our implementation is based on SMILE (Structural Modeling, Inference, and Learning Engine), a reasoning engine for graphical probabilistic models contributed to the community by the Decision Systems Laboratory, University of Pittsburgh.
Fig. 2: Structure of the Bayesian Network that underlies the GUI.
In the graphical user interface, the generic presentation of variables was replaced by more customized panels. All arrows representing the logical context were removed from the display. Nodes for source and receptor regions were combined into one geographic map in the center of the GUI. The BN contains two probabilities for receptor regions being hit: one for passive tracer particles and one for particles with an assumed finite half-life. Both are displayed in the same panel of the GUI, depending on whether or not the user has specified a half-life. Prior assumptions on the distribution of source strengths are represented by a drop-down list below the geographical map. Panels on the right-hand side of the GUI represent meteorological forcing and the season of the year. Each variable is described in more detail in the following sections.
Variables in the Graphical User Interface (Basic State)
Source and Receptor Regions
The geographical map in the GUI displays the German Bight, which is the area of interest in this study. For better orientation, you may click on '+' at the panel's right-hand side to see a general map of the whole North Sea region. The coordinates of the cursor's present position are shown at the bottom of the panel.
In the study, hypothetical oil spills are assumed to occur within nine different source regions located along the main shipping routes in the German Bight. In the map these source regions are represented by orange-coloured boxes. Labels S1-S9 pop up when the cursor is positioned over any of the regions. For each source region, the probability of an oil spill being located in it is shown. Prior probabilities in the initial state of the GUI (or the underlying BN) are uniform, i.e. about 11% (= 100%/9) for each region.
For the assessment of coastal pollution we distinguish between five receptor regions covering the German North Sea coast. Green boxes represent these regions, labelled T1-T5 (again, labels become visible when the cursor is positioned over a particular region). The line below the map specifies the percentage of released material that on average would reach the German coast as a whole. By default the system assumes a persistent pollutant (infinite half-life), which results in an expected pollution rate of 54% when releases are uniformly distributed among all source regions. Percentages shown for individual receptor regions always add up to 100%, i.e. they describe the relative allocation of stranded material among the five coastal areas.
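The difference between the overall pollution rate and the relative allocation shown in the green boxes can be illustrated with a small sketch. The per-region fractions below are hypothetical and were chosen only so that they sum to the 54% quoted above; they are not results of the simulations.

```python
# Hypothetical fractions of all released material that reach each receptor
# region (illustrative numbers only, not taken from the simulations).
stranded_fraction = {"T1": 0.05, "T2": 0.08, "T3": 0.09, "T4": 0.14, "T5": 0.18}


def coastal_summary(fractions):
    """Return (total percentage stranded, relative allocation in percent)."""
    total = sum(fractions.values())
    # Relative allocation: share of the stranded material per region.
    allocation = {region: 100.0 * f / total for region, f in fractions.items()}
    return 100.0 * total, allocation


total_pct, allocation = coastal_summary(stranded_fraction)
print(f"{total_pct:.0f}% of the released material reaches the coast")
for region, pct in allocation.items():
    print(f"{region}: {pct:.0f}% of the stranded material")
```

The first number corresponds to the line below the map, the second set of numbers to the percentages in the green boxes.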
Drift Time
The upper left panel displays the distribution of drift times between source and receptor regions. The histogram classifies values up to 60 days (the maximum length of the drift simulations) into 6 categories of different widths. Travel times are always analysed for the subset of trajectories that reach the receptor regions the user is interested in (cf. the percentage specified below the geographic map). In the basic state of the system, with no restrictions entered, this is the whole German North Sea coast.
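A histogram with classes of unequal width can be sketched as follows. The class limits used here are hypothetical; the text only states that there are six classes of different widths covering drift times up to 60 days.

```python
from bisect import bisect_left

# Hypothetical upper class limits in days (assumption, not the GUI's values).
UPPER_LIMITS = [5, 10, 20, 30, 45, 60]


def drift_time_histogram(drift_days):
    """Return the percentage of arriving trajectories in each drift-time class."""
    counts = [0] * len(UPPER_LIMITS)
    for t in drift_days:
        if 0 <= t <= UPPER_LIMITS[-1]:
            # A value equal to a class limit is counted in the class it closes.
            counts[bisect_left(UPPER_LIMITS, t)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [100.0 * c / total for c in counts]


# Drift times (days) of trajectories that reached the coast; 61 is ignored
# because it exceeds the 60-day simulation length.
example = drift_time_histogram([1, 5, 6, 60, 61])
print(example)
```

Only trajectories that actually reach the selected receptor region contribute a drift time, which is why the percentages are normalised by the number of arrivals rather than by the number of releases.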
Half-life of the Pollutant
The ensemble of drift simulations used for calibration of the BN did not take into account any depletion by evaporation or chemical/biological processes. The existence of "Drift Time" in the Bayesian network, however, allows the passive tracer simulations to be blended a posteriori with an assumed half-life of the pollutant. Basically, this is done by properly weighting simulations with different drift times. The GUI offers the user 6 choices for specifying the half-life. The default used in the basic state is the value 'infinite', which corresponds to disregarding all depletion processes.
It should be mentioned that in the GUI half-life is treated differently from all other variables. Originally, in the BN, half-life is a random variable with a corresponding probability distribution. Any evidence on travel times and pollution rates would affect the estimated distribution of half-life. In the GUI, however, the situation of having no direct evidence on a pollutant's half-life was replaced by the choice of evidence 'infinite' as a default. This simplifies interpretation of results substantially.
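The re-weighting idea can be sketched as follows, assuming simple exponential decay so that a particle arriving after one half-life carries half of its original mass. The function names and the example drift times are hypothetical, and the actual weighting encoded in the BN's probability tables may differ in detail.

```python
import math


def survival_weight(drift_days, half_life_days):
    """Fraction of the released material left after drifting for drift_days."""
    if math.isinf(half_life_days):
        return 1.0  # persistent pollutant: no depletion at all
    return 0.5 ** (drift_days / half_life_days)


def expected_stranding(arrival_times, n_released, half_life_days=math.inf):
    """Fraction of the released material expected to reach the coast.

    arrival_times holds the drift times (days) of the particles that arrived;
    n_released is the total number of particles released.
    """
    arrived = sum(survival_weight(t, half_life_days) for t in arrival_times)
    return arrived / n_released


# Hypothetical example: 4 particles released, 2 of them reach the coast.
print(expected_stranding([20, 40], 4))      # passive tracer
print(expected_stranding([20, 40], 4, 20))  # half-life of 20 days
```

Because slow trajectories are down-weighted more strongly than fast ones, a finite half-life lowers the overall pollution rate and also shifts the drift-time distribution towards shorter travel times.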
Season
The annual cycle is represented by four states of the variable season: spring (March-May), summer (June-August), autumn (September-November) and winter (December-February). Due to the identical lengths of the four seasons, the prior probability distribution for the variable season is uniform. Once the user has entered evidence on any variable in the network, however, an altered conditional probability distribution of season may indicate a clustering of the corresponding events at a specific time of the year. As an example, one might assume that pollution was observed in some specific coastal area. Even assuming that the German coast as a whole is affected would produce a seasonal signal. Unfortunately, however, the GUI does not at present offer the option of selecting several (or even all) receptor regions in combination.
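The season classes defined above can be written down as a simple lookup. The sketch below also shows how a seasonal distribution would be tabulated from a list of release months; the function name and the input lists are hypothetical.

```python
from collections import Counter

# Month number -> season, following the definition given in the text.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}
SEASONS = ("spring", "summer", "autumn", "winter")


def season_distribution(release_months):
    """Return the percentage of events per season for a list of month numbers."""
    counts = Counter(SEASON_BY_MONTH[m] for m in release_months)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {s: 100.0 * counts[s] / total for s in SEASONS}


# One event per calendar month gives the uniform prior.
print(season_distribution(range(1, 13)))
# A hypothetical subset of events that cluster in summer.
print(season_distribution([6, 7, 7, 8, 1, 4, 6, 10]))
```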
Dominant Wind Direction
This variable summarizes the wind conditions that prevailed during a drift simulation. Weighting factors were assigned to wind directions according to the lengths of the time intervals during which the respective winds prevailed. The data were filtered in two ways: a) only the three most frequent directions were taken into account, and b) the evaluation was restricted to the first three weeks of a given simulation.
Ever-changing wind conditions during a 21 (or even 60) day period cannot be fully characterized by just a few numbers, which makes conditioning pollution on certain winds difficult. Selecting a wind direction means concentrating on simulations during which this wind direction prevailed for a reasonable length of time. Other wind directions will have occurred as well and may even have been more decisive for the overall drift behaviour. It is important to note, however, that the remaining part of the BN is unaffected by this fuzzy representation of wind conditions, as calibration of the BN was always based on the full complexity of the hydrodynamic simulations.
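A minimal sketch of this kind of summary is given below. It assumes hourly wind directions already assigned to direction sectors and normalises the weights over the three retained directions; both the input format and the normalisation are assumptions made for illustration.

```python
from collections import Counter


def dominant_wind_directions(hourly_directions, days=21, keep=3):
    """Return the `keep` most frequent directions with duration-based weights.

    Only the first `days` days of the simulation are evaluated.
    """
    window = hourly_directions[: days * 24]
    top = Counter(window).most_common(keep)
    total = sum(n for _, n in top)
    if total == 0:
        return {}
    # Weight of a direction = its share of the time covered by the top directions.
    return {direction: n / total for direction, n in top}


# Hypothetical 60-day record: southerly winds occur only after the third week
# and therefore do not enter the summary at all.
record = ["W"] * 240 + ["SW"] * 120 + ["N"] * 96 + ["E"] * 48 + ["S"] * 936
print(dominant_wind_directions(record))
```

The example makes the limitation discussed in this section visible: winds that occur late in a simulation, or that rank fourth or lower in frequency, are not reflected in the summary, although they may still have influenced the drift path.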
Dominant Wind Speed
Calibration of wind speed was done in analogy with calibration of wind direction. All limitations discussed in the previous paragraph apply here as well.
Conditioning by Entering Evidence
Initially (see previous section) all probability tables in the GUI describe marginal distributions that were obtained from the full set of drift simulations. Assume now that we are interested in either the effects of oil releases at one particular location or the risk exposure of one particular receptor region. Biological impact studies may need information for specific seasons. To give a last example, the seriousness of coastal pollution will depend on the released oil's depletion rate. In all cases (except the assumption of a half-life, see above) we would like to confine the analysis to a subset of drift simulations that satisfy certain constraints. For the BN this means entering evidence on one or several variables represented in the GUI.
To enter evidence on source or receptor regions, it is sufficient to click on the region of interest. The region will turn red and the probabilities will disappear from the display, as allocation among several regions is no longer an issue. For source regions the red colour means that all oil slicks being analysed now originate from this source region (at present it is not possible to combine several source regions). For receptor regions the selection means that the analysis is now confined to oil particles that arrive in this region (again, regions cannot be combined). It is possible to simultaneously select one source and one receptor region. To see the effects of the evidence entered, one must first trigger the propagation of information throughout the network by clicking on the button 'Calculate' below the panels.
For source regions there is also the option of changing the prior probability distribution. The default assumption of a uniform prior can be changed in a drop-down list below the geographical map. The alternative probability distribution was estimated from German aerial surveillance data (2000-2005). This prototypical implementation of an observed prior is biased, however, because it does not take into account aerial surveillance data from other littoral states in the area.
To enter evidence for any variable other than source and receptor region, just click on the corresponding panel. You may then select any state from a drop-down list that opens. In the histogram your choice will be shown in the form of a red 100%-bar.
Each drop-down list allows also for the retraction of evidence from the respective node ('Reset'). A reset of source and receptor regions is possible via a drop-down list that opens after positioning the cursor over the geographical map and clicking the right mouse button.
Note that each time evidence is entered or retracted, the probability distributions must be re-calculated by clicking on the button 'Calculate'. The only exception is the retraction of all evidence by clicking on either the 'Reset' or the 'Initial State' button.
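Conceptually, entering evidence is equivalent to confining the analysis to the matching subset of simulations and re-tabulating all other variables. The sketch below does this directly on a handful of hypothetical records. The GUI itself does not access the original data but propagates evidence through pre-calculated probability tables, so this is an illustration of the idea, not of the implementation.

```python
from collections import Counter

# Hypothetical records, one per simulated particle that reached the coast.
FIELDS = ("source", "receptor", "season")
records = [dict(zip(FIELDS, row)) for row in [
    ("S4", "T3", "summer"), ("S4", "T3", "summer"), ("S4", "T3", "winter"),
    ("S1", "T3", "winter"), ("S1", "T5", "winter"), ("S4", "T5", "autumn"),
    ("S7", "T5", "winter"), ("S7", "T3", "spring"),
]]


def condition(records, evidence):
    """Keep only the records that agree with every piece of evidence."""
    return [r for r in records if all(r[k] == v for k, v in evidence.items())]


def marginal(records, variable):
    """Relative frequency of each state of `variable` in the given records."""
    counts = Counter(r[variable] for r in records)
    total = sum(counts.values())
    return {state: n / total for state, n in counts.items()} if total else {}


subset = condition(records, {"receptor": "T3"})
print(marginal(subset, "source"))   # sources given pollution in T3
print(marginal(subset, "season"))   # seasons given pollution in T3
print(marginal(condition(records, {"receptor": "T3", "source": "S1"}), "season"))
```

Retracting evidence corresponds to dropping the respective key from the evidence dictionary and tabulating again.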
General Buttons in the GUI
Below the geographic map there are several buttons, the functions of which are described in the following.
Calculate:
Clicking on this button triggers the re-calculation of all marginal probability distributions. This process of information propagation is needed each time evidence was entered or retracted.
Initial State:
Click on this button to return to the unconditional basic state of the network. Any evidence you may have entered will be removed. Note that evidence removal for individual variables is possible via drop-down lists that occur when clicking on the corresponding panels (in the geographic map you must use the right mouse button for this purpose).
Reset:
Same function as 'Initial State'. In addition, however, the history of all examples you saved will be deleted.
Save:
Click on this button to store a given state of the network. Note that clicking on 'Reset' will delete all states you previously stored.
Buttons << , < , > , >>:
These buttons allow for navigating through the set of screenshots previously stored. Clicking on '<' or '>' lets you step one figure back or forth; clicking on '<<' or '>>' will bring you to the very first or the very last state you stored.
Help:
The button 'Help' is linked to this tutorial.
Initial State (Fig. 3)
The initial state of the GUI consists of unconditional distributions for all variables involved (Fig. 3). These distributions were obtained from the full set of simulated hypothetical oil spills, which were assumed to have occurred every 28 hours within the years 1958-2004. The parameter 'half-life' is set to infinite, which implies passive tracer simulations. The probability distribution for source regions is uniform, i.e. no information on traffic density, for instance, is included.
Fig. 3: Initial State
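To get a feeling for the size of the ensemble, the release schedule can be enumerated as in the sketch below. The exact start and end times (midnight on 1 January 1958 up to, but excluding, 1 January 2005) are assumptions; the text only states the 28-hour interval and the years covered.

```python
from datetime import datetime, timedelta


def release_times(start, end, step_hours=28):
    """Yield release times from `start` (inclusive) to `end` (exclusive)."""
    t = start
    while t < end:
        yield t
        t += timedelta(hours=step_hours)


# Assumed bounds of the release period (see lead-in).
times = list(release_times(datetime(1958, 1, 1), datetime(2005, 1, 1)))
print(len(times), "release times")
print(sorted({t.hour for t in times}), "distinct hours of the day")
```

With a 28-hour step the time of day shifts by 4 hours from one release to the next, so the releases are spread over different times of the day.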
According to Fig. 3, on average about 54% of a persistent pollutant released somewhere along the shipping routes is expected to reach the German coast (within the period of 60 days covered by the underlying drift simulations, with no degradation being taken into account). Percentages in the green boxes add up to 100% and describe the relative allocation of the stranded material to different receptor regions. It can be seen that the northern part of the German coast would be most affected. This result is consistent with the prevalence of westerly winds according to the wind rose shown on the right-hand side.
It should be emphasized that all three panels on the right hand side of the display (season and wind conditions) are unconditional. As an example consider the uniform distribution of seasons. A priori each season is equally probable due to the seasons' identical lengths. One must not conclude from the uniform distribution in Fig. 3 that coastal pollution does not contain a seasonal signal. To see this, however, one must confine the analysis to situations in which the German coast was affected. This will be the first step of our example analysis.
Selection of Receptor T3 (Fig. 4)
Suppose that we are mainly interested in receptor region T3. We click into the corresponding green box and afterwards use the button 'Calculate' for updating probabilities. A screenshot of the resulting state of the GUI is shown in Fig. 4.
Fig. 4: Selection of Receptor Region T3.
The selected receptor region turned red. As we confined the analysis to material that arrives in region T3, percentages for the allocation to different receptor regions became meaningless and disappeared from the display. The line below the map informs us, however, that a total of 9% of the released material is expected to reach T3. The wind rose has changed towards more northerly winds, which appears reasonable considering the orientation of the selected coastal area.
The distribution of season is now no longer uniform. Instead the analysis reveals that pollution of the coastal area T3 is more probable in summer than in winter (for uniformly distributed sources and depletion effects being disregarded).
Possibly the most important piece of information available from Fig. 4, however, is the conditional probability distribution of source regions. The uniform prior distribution is now replaced by a distribution that reflects both the distances between source regions and T3 and directional preferences. The most probable source (for passive tracers) is region S4. Given that pollution was observed in region T3 and that no prior knowledge about the origin of this pollution exists, our estimated probability of the pollution originating from this source region would be 22%. In a next step we will check the effects of a finite half-life of the oil spilled.
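The update of the source distribution follows Bayes' rule: the posterior probability of a source region is proportional to its prior probability times the probability that material released there reaches T3. The sketch below uses hypothetical per-source probabilities, chosen only so that the totals agree with the 9% and 22% quoted in the text; the individual values are not results of the simulations.

```python
def posterior_sources(prior, likelihood):
    """Bayes' rule: P(source | evidence) from P(source) and P(evidence | source)."""
    joint = {s: prior[s] * likelihood[s] for s in prior}
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("evidence has zero probability under this prior")
    return {s: p / evidence for s, p in joint.items()}


sources = [f"S{i}" for i in range(1, 10)]
uniform_prior = {s: 1 / 9 for s in sources}

# Hypothetical P(material reaches T3 | released in S); illustrative only.
reach_t3 = dict(zip(sources, [0.02, 0.04, 0.08, 0.18, 0.15, 0.12, 0.10, 0.07, 0.05]))

posterior = posterior_sources(uniform_prior, reach_t3)
for s in sources:
    print(f"{s}: {100 * posterior[s]:.0f}%")
```

Replacing `uniform_prior` by a non-uniform distribution, such as one estimated from surveillance data, changes the posterior accordingly, which is what the drop-down list for the source prior does in the GUI.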
Selection of Half-life 20 days (Fig. 5)
We may anticipate that travel times from the most probable source region S4 will be shorter than travel times from other source regions. This hypothesis may be validated by clicking on different source regions and checking travel time distributions displayed after the re-calculation of probability tables (not shown). Accordingly, for substances with a short half-life (strong depletion), the probability of S4 being the source region must be expected to further increase.
Fig. 5 shows the GUI after a half-life of 20 days was selected and probabilities were re-calculated.
Fig. 5: Selection of half-life 20 days.
Unsurprisingly, the probability of S4 being the responsible source region has increased to 31%. For a half-life of only 5 days, a value of as much as 48% is obtained (not shown). From the distribution of travel times it can be seen that, with depletion included, particles must move fast to have a chance of reaching receptor region T3. Seasonal differences between summer and winter have decreased only marginally.
Next let us further confine the analysis to only one specific sector of the shipping route. We choose a rather extreme example by selecting the westernmost source region S1. The resulting display is shown in Fig. 6.
Selection of Source Region S1 (Fig. 6)
The choice of S1 has a major impact on the conditional probability distributions. Due to the large distance to be covered, travel times clearly tend to be longer. Winds are more likely to blow from the west, and wind speeds are higher. The wind conditions required explain the change in the seasonal probability distribution. Given an oil spill in S1 and pollution observed in T3, winter is now the most probable season.
Fig. 6: Selection of source region S1.
The final step in our example analysis is to assume a more effective oil depletion.
Selection of Half-life 5 days (Fig. 7)
A shorter half-life forces the distribution of travel times between S1 and T3 to shift towards much smaller values. Stronger winds blowing more from the west are a prerequisite for this. This effect can be recognized by comparing the two wind-related conditional probabilities in Fig. 6 and Fig. 7. It has already been discussed, however, that the representation of ever-changing wind conditions by a few parameters is difficult and may be questioned. This may be the reason why the distribution of season, as a proxy for prevailing weather conditions, changed more substantially. In Fig. 7 winter and autumn are now clearly the most probable seasons (together 64%).
Fig. 7: Selection of half-life 5 days.
One final remark seems appropriate. Fig. 7 deals with relatively rare events. The large sample of simulations under atmospheric conditions that occurred over several decades, however, provides a sound basis for a proper analysis of such less frequent situations as well. This is one of the benefits of a database like coastDat.