Average rainfall 2001-2016, global tropics

Framework key concepts

By Thomas Gumbricht

Contents

  • Introduction
  • Processes and compositions
  • Processes
  • Compositions
  • Composition datasets
  • File naming convention
  • Hierarchical folder structure
  • json coded parameterization
  • json project definition
  • Defining the Framework superuser
  • period
  • json process definition
  • Common parameters
  • Specific parameters
  • srcpath and dstpath
  • <srccomp> and <dstcomp>
  • Other tags

Introduction

Karttur’s GeoImagine Framework is built with object oriented classes and methods. The object oriented concept in programming means that all items belong to a class. Items belonging to a certain class have both properties (or attributes) and methods (processes or functions) associated with that class. In the Framework the most important objects are functions (called processes) and spatial data collections (called compositions). The parameterization of processes and compositions is encoded in JavaScript Object Notation (json) files. This post first summarizes the concepts of processes and compositions and then explains how this is translated to json structured codes.

Processes and compositions

All functionalities of the Framework are encoded in object oriented processes. Most processes operate on spatial data, but not all; processes are also used for building and managing the database and setting up the processes themselves.

Most processes, however, do use spatial data, either as input (source data, abbreviated src), as output (destination data, abbreviated dst), or both. All data, regardless of type, belong to a composition.

Processes

There are hundreds of different processes defined within the Framework. You can get a list of all available processes from the top menu item sub processes.

Processes can be regarded as high level Geographic Information System (GIS) functions. The more basic processes are in fact nothing more than interfaces to standard GIS functions. Other processes represent sequences of standard GIS functions. And then there are functions, including for modeling and machine learning, that cannot be found in standard GIS software packages. Compared to an ordinary GIS software package (e.g. ArcGIS, SAGA, QGIS, GRASS), Karttur’s GeoImagine Framework is much more demanding to learn and operate at the start. But once you understand the Framework and have gained knowledge about the Python packages you use, you can add any spatial process you can think of. Another advantage of the Framework is that you can combine any number of processes and then run them for data over any region, or the entire Earth, in one go.

Compositions

A composition can be a single file, like the map of global countries, or thousands of files, like the red reflectance of tiles from the Moderate Resolution Imaging Spectroradiometer (MODIS) satellite sensor. When asking the Framework to run processes that involve data, the composition(s) to use must be stated. The processing will be done for all datasets of those compositions that fall within the defined spatial and temporal domain (this will become clear further down).

All compositions have an id that is composed of two parts, a thematic part and a content part, separated by an underscore:

theme_content

Neither the theme nor the content is allowed to contain any underscore. Each composition id is linked to a scale factor (scalefac), an offset add factor (offsetadd), a pre-defined numeric type (celltype, e.g. Byte, Int16 or UInt16), a nodata value (cellnull) and a data unit (dataunit), and must have a scale measure (n[ominal], o[rdinal], i[nterval] or r[atio]).
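The id rule can be checked mechanically. The following is a minimal Python sketch, not the Framework's own code: the function name split_compid and the example id rainfall_monthly are hypothetical and only illustrate the "exactly one underscore" rule.

```python
def split_compid(compid: str) -> tuple[str, str]:
    """Split a composition id of the form theme_content into (theme, content)."""
    parts = compid.split("_")
    # Exactly one underscore is allowed, so the split must give two non-empty parts.
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"invalid composition id: {compid!r}")
    return parts[0], parts[1]


# Hypothetical example id, used only for illustration.
theme, content = split_compid("rainfall_monthly")
```

An id such as rainfall_monthly_v2 is rejected because the content part would itself contain an underscore.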

A composition can contain different products as long as the scalefac, offsetadd, celltype, cellnull, dataunit and measure are identical. This means that two versions of the same data, for example derived with a slightly different algorithmic definition, can belong to the same composition. Also reflectance bands from different sensors can share the same composition, as can rainfall data from different sources. But normally data from different origins or sensors are also represented as different compositions. All compositions are identified by six (6) different attributes (these properties then define the folder hierarchy and dataset file naming - explained further down):

  • source
  • product
  • theme
  • layerid
  • prefix
  • suffix

The source is the origin of the dataset from which the composition is derived or extracted (e.g. the satellite platform, or the organization or individual behind the data). product is usually a coded product identifier or the producer. theme is a thematic identifier (the “theme” part of the composition id, or compid, introduced above). layerid is a content identifier; prefix is usually identical to layerid but can be set differently. suffix is a looser part that can be freely set but usually represents a version identifier.

Compositions as such do not relate to any spatial extent or temporal validity.

Composition datasets

While the composition as such does not entail any spatial data, all spatial data must belong to one, and only one, composition. All datasets that belong to a specific composition follow an absolutely strict naming convention. This means that all data that are either imported to, or produced as part of, the Framework are either fully or semi-automatically named. Alongside the strict file naming, the hierarchical folder structure is also strictly defined.

File naming convention

All data file names are composed of five (5) parts, of which three relate to the composition and the remaining two to location and timestamp. In the file name, the parts are separated by underscores (“_”), and the parts are not allowed to contain any underscore themselves:

  1. prefix of layer identifier (composition prefix)
  2. product or producer (composition product)
  3. location
  4. timestamp
  5. suffix (composition suffix)

All files in the Framework thus have the following general format:

prefix_product_location_timestamp_suffix.extension

Within each part, a hyphen (“-”) is used for separating different codes or labels. A hyphen in the timestamp part, for example, denotes that the data in the file represent aggregated data for the period between two dates.
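The naming convention can be expressed as a pair of helper functions. This is a minimal Python sketch under stated assumptions: build_filename and parse_filename are hypothetical names, not Framework functions, and the example values (3b43, global, v7) are invented for illustration.

```python
from pathlib import Path

# The five file name parts, in the order given by the convention.
FILENAME_PARTS = ("prefix", "product", "location", "timestamp", "suffix")


def build_filename(prefix, product, location, timestamp, suffix, extension="tif"):
    """Join the five parts with underscores and append the file extension."""
    values = (prefix, product, location, timestamp, suffix)
    for name, value in zip(FILENAME_PARTS, values):
        # No part may be empty or contain an underscore of its own.
        if not value or "_" in value:
            raise ValueError(f"{name} must be non-empty and free of underscores: {value!r}")
    return "_".join(values) + "." + extension


def parse_filename(filename):
    """Split a file name back into its five named parts."""
    parts = Path(filename).stem.split("_")
    if len(parts) != len(FILENAME_PARTS):
        raise ValueError(f"expected {len(FILENAME_PARTS)} parts: {filename!r}")
    return dict(zip(FILENAME_PARTS, parts))


# Hypothetical values; the hyphen in the timestamp marks an aggregated period.
name = build_filename("rainfall", "3b43", "global", "2001-2016", "v7")
```

Because underscores are reserved as separators, parsing is a plain split and never ambiguous.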

Hierarchical folder structure

All the data files are organized in a hierarchical folder structure with the following levels:

  • system
  • source (composition source)
  • division (tiles, region or mosaic)
  • theme (composition theme)
  • location
  • timestamp

The two lowest hierarchical levels, location and timestamp, are identical to the location and timestamp in the file name. The theme can be anything like “rainfall”, “elevation”, “landform” or another thematic identifier. division can only take three different values: “tiles”, “region” or “mosaic”. The user cannot set which of the three to use; that is linked to each process and coded in the database (put differently, the division is a fixed attribute of the process object). The source level identifies the source or origin of the data and is equal to the composition source, for instance “TRMM” for TRMM rainfall data. The top system level relates to the different projection and tiling systems for representing data that are included in the Framework (e.g. ancillary, modis, sentinel, ease2t, ease2s, ease2n, mgrs), plus one system (system) for some default data.
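The six levels map directly onto a path. The sketch below is illustrative Python, not Framework code: the function name dataset_folder, the root argument (standing in for the storage volume) and the example values are all assumptions.

```python
from pathlib import PurePosixPath

# The only values the division level can take.
DIVISIONS = ("tiles", "region", "mosaic")


def dataset_folder(root, system, source, division, theme, location, timestamp):
    """Build the folder path system/source/division/theme/location/timestamp under root."""
    if division not in DIVISIONS:
        raise ValueError(f"division must be one of {DIVISIONS}: {division!r}")
    return PurePosixPath(root, system, source, division, theme, location, timestamp)


# Hypothetical root and values, for illustration only.
folder = dataset_folder("/volume", "ancillary", "TRMM", "region", "rainfall", "global", "2001-2016")
```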

json coded parameterization

All Framework processes are initiated and parameterized using json coded instructions. All processes require some basic (common) instructions, and then also process specific instructions.

json project definition

Framework processes are only accessible from within a project. A project is defined by a user (userid), an id (projectid), a geographical region (tractid), that can be further subdivided at two more levels (siteid and plotid), and a projection system (system). All variables given must be predefined in the Framework database. The Framework spatial definitions are explained in further detail in this post.

In addition, a temporal period must be specified; for non-spatial data and for spatial data that can be regarded as static on human time scales (e.g. topography) the timestep is set to static, as in the example below. Also the postgres database to use must be defined.

{
  "postgresdb": {
    "db": "geoimagine"
  },
  "userproject": {
    "userid": "karttur",
    "projectid": "karttur-northlandease2n",
    "tractid": "karttur-northlandease2n",
    "siteid": "*",
    "plotid": "*",
    "system": "ease2n"
  },
  "period": {
    "timestep": "static"
  },
  "process": [
    {
      "processid": "processid"
    }
  ]
}

In the above example, the userid is set to the system default superuser karttur. All project ids (projectid) are forced to be 2-part names separated by a hyphen. In the example, the second part (northlandease2n) denotes that the geographic region is the land mass of the northern hemisphere and that the system projection is ease2n (Equal-Area Scalable Earth (EASE) Grids, version 2, for the northern hemisphere). The system is thus also ease2n.

All the processes defined (under the process list clause “[]”) will be run for the geographical region (tractid) and temporal period (static) defined for the userproject, given that the identified user has these rights.
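A project file of this shape can be read and sanity-checked with Python's standard json module. The sketch below is an illustration under stated assumptions: load_project is a hypothetical helper, not part of the Framework, and it only checks the 2-part projectid rule. Note that strict JSON parsers such as json.loads reject a trailing comma after the last key in an object.

```python
import json

# The project definition from the example, as a JSON string.
PROJECT_JSON = """
{
  "postgresdb": {"db": "geoimagine"},
  "userproject": {
    "userid": "karttur",
    "projectid": "karttur-northlandease2n",
    "tractid": "karttur-northlandease2n",
    "siteid": "*",
    "plotid": "*",
    "system": "ease2n"
  },
  "period": {"timestep": "static"},
  "process": [{"processid": "processid"}]
}
"""


def load_project(text):
    """Parse a project definition and check the 2-part projectid rule."""
    project = json.loads(text)
    parts = project["userproject"]["projectid"].split("-")
    if len(parts) != 2 or not all(parts):
        raise ValueError("projectid must be two parts separated by one hyphen")
    return project


project = load_project(PROJECT_JSON)
# Every entry in the process list runs for the same userproject and period.
processids = [process["processid"] for process in project["process"]]
```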

Defining the Framework superuser

The Framework superuser is by default called karttur. The user karttur is also by default given a project called karttur that includes a tractid also called karttur. karttur is thus the name of a user, a project and a global region (or tract). Thus defined, the user karttur has the rights to perform all processes available within the Framework.

The combination of user, project and tractid must be predefined in the postgres database. You can define any number of users, projects and tracts, but they must be linked. When you set up the complete Framework you can also change the name of the user, project and tractid for your superuser. That is done when you set up the links to the database in one of the following posts and then run the database installation as described in yet another post. To change the name of the superuser, and the associated project and tract, you need to edit the json files used for setting up the Framework database.

period

The period tag is used for defining the temporal setting of the processes. The tag is set up as its own process called periodicity. Some processes are not associated with any temporal resolution, including the setting up of processes itself. The period tag is thus not strictly needed for all processes.

All the period parameters that are expected have a default value. If no parameters are given, the periodicity process expects a single static file with no date. If only startyear is given, startmonth and startday are both set to 1 (i.e. to 1 January). If only endyear is given, the endmonth is set to 12 and endday to 31 (i.e. to 31 December).
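The default rules can be sketched in a few lines of Python. This is an illustration only, not the periodicity process itself: resolve_period is a hypothetical name, and defaulting endday to the last day of the given endmonth (rather than always 31) is an assumption added so that a user-supplied endmonth never yields an invalid date.

```python
import calendar
from datetime import date


def resolve_period(period):
    """Return (start, end) dates; each is None when the corresponding year is absent."""
    start = end = None
    if "startyear" in period:
        # startmonth and startday both default to 1, i.e. 1 January.
        start = date(period["startyear"], period.get("startmonth", 1), period.get("startday", 1))
    if "endyear" in period:
        endmonth = period.get("endmonth", 12)
        # Assumption: default endday is the last day of endmonth (31 for December).
        lastday = calendar.monthrange(period["endyear"], endmonth)[1]
        end = date(period["endyear"], endmonth, period.get("endday", lastday))
    return start, end


static = resolve_period({})  # no parameters: a static dataset with no date
start, end = resolve_period({"startyear": 2001, "endyear": 2016})
```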

Apart from setting the correct start and end dates, the temporal interval must also be set. The coding of the intervals largely follows the Python Data Analysis Library, pandas. pandas should be part of your Anaconda installation. If not, installation instructions are available from the pandas home page.

In Karttur’s GeoImagine Framework the attribute timestep is used for defining the temporal interval, and loosely corresponds to the offset aliases defined in pandas. The table below summarises the timestep or offset aliases that are used in the Framework:

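| Karttur | Pandas | string code | date code | Description |
| --- | --- | --- | --- | --- |
| D | D | “yyyymmdd” | yyyymmdd | daily |
| M | MS | “yyyymm” | yyyymm01 | monthly |
| Q | Q | “yyyyq” | yyyymm01 | quarterly |
| A | AS | “yyyy” | yyyy0101 | annual |
| XD | | “yyyydoy” | yyyymmdd | day interval |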

While the pandas offset alias “M” represents month end frequency, in Karttur’s coding it means a monthly aggregated value. In the file naming convention, monthly data is represented as “yyyymm”. In the database the datetime object for any “M” is represented by the first day of that month. Strictly speaking this corresponds to the pandas offset alias “MS”. Quarterly (“Q”) and annual (“A”) timesteps are constructed in a similar manner. Karttur does not use the pandas weekly offset alias (“W”). Instead Karttur includes a day-interval timestep that can be set to any interval. “7D” would then correspond to a weekly timestep.
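The relation between a timestep, its string code and its date code can be sketched with the standard library alone. The functions string_code and date_code below are hypothetical illustrations of the table, not Framework code; anchoring quarterly data on the first day of the quarter is an assumption based on "constructed in a similar manner".

```python
from datetime import date


def is_day_interval(timestep):
    """True for day-interval timesteps such as '7D' or '16D'."""
    return timestep.endswith("D") and timestep[:-1].isdigit()


def string_code(d, timestep):
    """Return the file name string code for date d under the given timestep."""
    if timestep == "D":
        return f"{d.year:04d}{d.month:02d}{d.day:02d}"      # yyyymmdd
    if timestep == "M":
        return f"{d.year:04d}{d.month:02d}"                 # yyyymm
    if timestep == "Q":
        return f"{d.year:04d}{(d.month - 1) // 3 + 1}"      # yyyyq
    if timestep == "A":
        return f"{d.year:04d}"                              # yyyy
    if is_day_interval(timestep):
        return f"{d.year:04d}{d.timetuple().tm_yday:03d}"   # yyyydoy
    raise ValueError(f"unknown timestep: {timestep!r}")


def date_code(d, timestep):
    """Return the date that represents d's period in the database."""
    if timestep == "M":
        return d.replace(day=1)                             # first day of the month
    if timestep == "Q":
        # Assumption: first day of the quarter.
        return date(d.year, 3 * ((d.month - 1) // 3) + 1, 1)
    if timestep == "A":
        return date(d.year, 1, 1)
    if timestep == "D" or is_day_interval(timestep):
        return d
    raise ValueError(f"unknown timestep: {timestep!r}")


sample = date(2001, 5, 17)
monthly = (string_code(sample, "M"), date_code(sample, "M"))
```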

In the GeoImagine Framework, data representing statistical or aggregated conditions over a timespan are given derived temporal codings. These derived codes do not have any datetime code, only a string code.

| Karttur | string code | Description |
| --- | --- | --- |
| timespan-D | “yyyymmdd-yyyymmdd” | aggregate daily data |
| timespan-M | “yyyymm-yyyymm” | aggregate monthly data |
| timespan-Q | “yyyyq-yyyyq” | aggregate quarterly data |
| timespan-A | “yyyy-yyyy” | aggregate annual data |
| timespan-XD | “yyyydoy-yyyydoy” | aggregate day interval data |
| seasonal-D | “yyyy-yyyy@Ddoy” | daily seasonal average |
| seasonal-M | “yyyy-yyyy@Mmm” | monthly seasonal average |
| seasonal-Q | “yyyy-yyyy@MQq” | quarterly seasonal average |
| seasonal-XD | “yyyy-yyyy@XDdoy” | day interval seasonal average |

See the periodicity process page for more details.

json process definition

Once the userproject and period are defined, any number of processes relating to the geographical region (that can also be None) and temporal period (that can also be None) can be joined together under the process list clause (“[]”). Each process typically requires a specific set of parameters.

Processes can be both spatial and non-spatial (e.g. including the definition of the processes themselves), or for instance start with text data as input and generate a spatial layer as output (or vice-versa).

Common parameters

All processes have some parameters in common:

  • processid [text]
  • verbose [integer]
  • overwrite [boolean]
  • delete [boolean]
  • acceptmissing [boolean]
  • dryrun [boolean]

  "process": [
    {
      "processid": "processid",
      "verbose": 2,
      "overwrite": false,
      "delete": false,
      "acceptmissing": true,
      "dryrun": false
    }
  ]
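The types listed for the common parameters can be checked before a process is run. The following Python sketch is hypothetical (check_common_parameters is not a Framework function) and makes one assumption: parameters that are absent are skipped rather than defaulted, since the default values are not stated here.

```python
# Expected Python type for each common parameter, following the list above.
COMMON_PARAMETER_TYPES = {
    "processid": str,
    "verbose": int,
    "overwrite": bool,
    "delete": bool,
    "acceptmissing": bool,
    "dryrun": bool,
}


def check_common_parameters(process):
    """Return a list of type problems found among the common parameters."""
    problems = []
    for name, expected in COMMON_PARAMETER_TYPES.items():
        if name not in process:
            continue  # assumption: missing parameters fall back to defaults
        value = process[name]
        # bool is a subclass of int in Python, so reject True/False for integers.
        wrong_int = expected is int and isinstance(value, bool)
        if wrong_int or not isinstance(value, expected):
            problems.append(f"{name}: expected {expected.__name__}, got {value!r}")
    return problems


# The parsed form of the json fragment above (false/true become False/True).
example = {
    "processid": "processid",
    "verbose": 2,
    "overwrite": False,
    "delete": False,
    "acceptmissing": True,
    "dryrun": False,
}
problems = check_common_parameters(example)
```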

Specific parameters

You need to check each process specifically to get information on the parameters to set. Some parameters are defaulted and need not be stated, whereas other parameters are mandatory. If you forget to set mandatory parameters, the process will halt and report the missing parameter.

srcpath and dstpath

If the process requires source (src) data stored in files as input, the tag srcpath is required for identifying the volume and the filetype. Similarly, if the process produces data stored in files, the tag dstpath is required. For most processes the filetype attribute is not required. By default all spatial raster data is represented as GeoTIFF files (.tif), and all spatial vector data as ESRI shapefiles (.shp).

<srccomp> and <dstcomp>

<srccomp> must be given if the process expects source data on file. The source composition must state the composition parts as attributes:

  • source
  • product
  • folder
  • band
  • prefix
  • suffix

Not all processes that produce destination data on file require that the <dstcomp> is given. Some processes include hardcoded definitions of the destination composition, the file name and the hierarchical folder structure. Time series analyses typically produce destination compositions with the folder hierarchy and the file name derived from the source composition, with no possibility for user changes. Other processes do allow some restricted user changes, whereas the majority of processes require that the user gives the full definitions of the destination composition(s). Apart from the components listed above, destination compositions must also include attributes for:

  • scalefac
  • offsetadd
  • celltype
  • cellnull
  • dataunit
  • measure

If the Framework detects any conflicts with existing compositions, the process will not proceed until this is corrected.
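Such a conflict check amounts to comparing the six fixed attributes of an existing composition with those of the requested destination composition. The sketch below is a hypothetical Python illustration, not the Framework's implementation; find_conflicts and the example values are invented.

```python
# Attributes that must be identical for all data sharing a composition.
FIXED_ATTRIBUTES = ("scalefac", "offsetadd", "celltype", "cellnull", "dataunit", "measure")


def find_conflicts(existing, candidate):
    """Return the names of the fixed attributes on which the two definitions differ."""
    return [name for name in FIXED_ATTRIBUTES if existing.get(name) != candidate.get(name)]


# Hypothetical definitions: the candidate differs only in its numeric type.
existing = {"scalefac": 1, "offsetadd": 0, "celltype": "Int16",
            "cellnull": -32768, "dataunit": "mm", "measure": "r"}
candidate = dict(existing, celltype="UInt16")
conflicts = find_conflicts(existing, candidate)
```

An empty result means the destination data can join the existing composition; any listed attribute has to be corrected first.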

Other tags

A few other child tags under <process> are used in specific processes, including:

  • <node> (in e.g. addsubproc for defining process parameters)
  • <stats> (in e.g. trendtcancillary for defining statistical measure to produce)
  • <comps> (in e.g. createscaling for defining scaling when exporting data)

The page for each individual process lists all the tags and attributes required.