Average rainfall 2001-2016, global tropics

Map: Average rainfall 2001-2016, global tropics

Karttur's GeoImagine Framework:
Setup processes (setup_processes)
Part 3 Database backup and restore

Thomas Gumbricht bio photo By Thomas Gumbricht

This post outlines different alternatives for creating Karttur’s GeoImagine Framework database in postgres, and filling it with default data, using postgres itself. The postgres commands outlined in this post, together with the prepared files outlined in the last section (below), are thus an alternative to the setup_db commands to create the postgres database. If you want to create a customized postgres database, then you need to revert to the original definitions in setup_db.

Introduction

The backbone keeping all the data and processes together in Karttur’s GeoImagine Framework is a PostgreSQL database. If you build the Framework from scratch, the database is first created in the python package setup_db and then subsequently filled with data as you install processes and different data systems. Most processes that you run also add records related to the data layers that are created.

Some of the setup processes used for analysing data and then inserting the results into the database take a long time - including for instance the analysis of spatial relations between different geographic regions and the tiling system of the different projection systems. Instead of running these processes you can insert the database records using prepared files that can be used by PostgreSQL itself for creating and updating the Framework database.

Prerequisites

You must have installed Karttur’s GeoImagine Framework. If you need to Install PostgreSQL (or postgres for short) this is outlined in the post Install postgreSQL and postGIS.

PostgreSQL database

The most advanced alternative for backup and restoring both the definitions (sometimes called schemas, but schema also denotes a subset of tables within a database) and data of a postgres database, is to use the command pairs of pg_dump and pg_restore or psql. The second alternative is to write the results of an ordinary SELECT query to a comma-separated values (csv) file. The content of the csv file can then be imported to the db either by an ordinary sql command INSERT adding line by line, or COPY to read and add the whole file in one go. The latter alternative can not be used for defining the database, only for adding data to predefined schemas and tables.

pg_dump

pg_dump is a versatile utility for PostgreSQL database backup. Digestible instructions are available at for example stackoverflow and SimpleBackups. This post on stackoverflow outlines the differences between restoring with pg_restore and psql, and how to parameterise pg_dump for either alternative.

In Karttur’s GeoImagine Framework, there are three different processes that binds to pg_dump:

pg_dump has a range of options; the Framework processes binding to pg_dump, however, only allow for setting the options for Format (-F or --format), Data only (-a or --data-only) and Schema only (-s or --schema-only_). Data only and Schema only (or definition only) are each others inverse, and can thus not be used together. In the Framework Schema only is invoked by setting the parameter definitiononly to true, Data only by setting the parameter dataonly to true. If both are set to false, the complete definition (schema) and data are backed up. The default format is c, that generates custom dump files that can be used for restoring the database by calling pg_restore, if you prefer to use psql for restoring, then you should use the Format p (plain) setting.

To run directly from the python package, the operating system link to pg_dump (and pg_restore/psql) must be set in order for the system command to be executed in runtime. If you have trouble in linking the path to pg_dump in your system setup, you can use the parameter cmdpath to set an explicit path to pg_dump (and pg_restore). For my postgres installation using Homebrew I need to set "cmdpath":"/usr/local/bin".

pg_restore

There are three corresponding Framework processes binding to pg_restore:

The three restoration processes actually calls the same internal process under the hood of the Framework, the only difference is that the three different processes look for different backup (dump) files. The default Format for all three processes is set to c, and thus the restoration process is defaulted to pg_restore.

Export table data as csv

In the Framework, all postgres database tables can be exported using a standard SELECT query and then writing the results to a csv file. There are three different Framework processes for exporting table data as csv files:

In comparison with the pg_dump processes, the export functions can only save data; and data is always saved for each individual table. Thus the processes ExportSchemaCsv and ExportDatabaseCsv loop over and process individual tables. These two processes can only save the complete data from a table. With the process ExportTableCsv the columns and records to save can be set by listing the columns and apply a WHERE statement.

Import table data from csv files

Corresponding to the processes for exporting table data as csv, the Framework has different processes for importing csv data to the database. There are two sets of processes, using ordinary database INSERT and using COPY that reads the complete csv in one go. INSERT is better to use when filling up a table that already contains some data:

The postgres COPY command is faster, but the table can not contain any of the records listed in the csv, COPY is thus best suited for filling a new table or replacing (by setting the parameter overwrite to true) all records in an existing table:

Setup_processes linked commands

The package setup_processes is prepared for both dumping (and exporting) and restoring (importing) the complete Framework database. In the __main__ section of setup_process_main you can either backup or restore the complete Framework database by removing the comment sign. The default path for both commands is directly under the package setup_processes itself. When you cloned (copied) the database from Karttur’s GitHub repo, the default database backup is included in several formats (oulined in detail below).

If you want to make a fresh backup the database, remove the comment sign (“#”) for #BackupDatabase(prodDB). To restore the complete default database, uncomment RestorepDatabase(prodDB).

    # BackupDatabase creates a backup of the entire db in different formats
    #BackupDatabase(prodDB)

    # RestoreDatabase restores of the entire using psql
    #RestorepDatabase(prodDB)

BackupDatabase

The function BackupDatabase links to backup_karttur_db_YYYYMMDD.txt. Inspect its content by toggling the Hide/Show button and click on the linked json file name.

Running the json command file (db-dump-all_v090.json) on 4th December 2021 generated the backup structure under the Hide/Show button.

Restoring from prepared dump and csv files

The Framework GitHub repo package setup_processes contains a sub-folder called dbdump, under which you find all the dump (sql) and text (csv) files required to create the Framework database (listed above under the Hide/Show button). The Framework database can have any name, but the default name is geoimagine, and that is also the name of the single sub-folder under dbdump. The complete database, all schemas and tables with data, are hierarchically organised under geoimagine. Thus the .sql files that can be called by either pg_restore or psql to restore the complete database are directly under the geoimagine folder:

  • c-geoimagine_yyyy-mm-dd.sql
  • c-geoimagine-data_yyyy-mm-dd.sql
  • c-geoimagine-definition_yyyy-mm-dd.sql
  • p-geoimagine_yyyy-mm-dd.sql
  • p-geoimagine-data_yyyy-mm-dd.sql
  • p-geoimagine-definition_yyyy-mm-dd.sql

The first letter of the files indicates whether the sql is formatted as [c]ustom (for pg-restore) or [p]lain (for psql). Both alternatives were tested in November 2021, for both MacOS and Windows. With varying results.

MacOS

While psql could be run parameterised with any of the three [p] sql files, pg-restore did not function with the combined sql (c-geoimagine_yyyy-mm-dd.sql), but the database could be restored by sequentially running c-geoimagine-definition_yyyy-mm-dd.sql followed by c-geoimagine-data_yyyy-mm-dd.sql. The complete sequence of creating (or restoring) the Framework database using postgres itself thus becomes:

  • 1a. Create a new postgres database (“geoimg02”):
  • $ psql geoimg02
  • 1b. If you need to setup a user (role) for the database (“karttur” in this example):
  • $ CREATE USER karttur WITH LOGIN PASSWORD 'quoted password' SUPERUSER CREATEDB CREATEROLE;
  • 1c. quite the psql session before running the restoration commands:
  • $ \q
  • 2a Restore the Framework db to “geoimg02” using psql:
  • $ psql -U karttur -d geoimg03 < /path/to/dbdump/geoimagine/p-geoimagine_yyyy-mm-dd.sql
  • or
  • 2b Run the definition and data adding in two separate steps with pg-restore:
  • $ psql -U karttur -d geoimg02 < /path/to/dbdump/geoimagine/c-geoimagine-definition_yyyy-mm-dd.sql
  • $ psql -U karttur -d geoimg02 < /path/to/dbdump/geoimagine/c-geoimagine-data_yyyy-mm-dd.sql

Windows

For Windows we did not manage to restore the database using the terminal commands. But restoring using psql via the free version of tableplus and p-geoimagine_yyyy-mm-dd.sql worked flawlessly.

RestoreDatabase using setup_processes

You can also use the function RestoreDatabase from the __main__ section of setup_process_main. However, before you remove the comment sign, make sure that you define the kid of restore command you want to run. In order not to overwrite any existing records you need to carefully read the options for using psql, pg-restore INSERT and COPY in the manual pages of your version of PostgreSQL.

You can bypass any conflicts by setting up a completely new database, in which case you will not overwrite any existing data. And then just run the framework with the novel database. The default command files lined in the setup_processes has both dataonly and definitiononly set to true, which is not allowed and causes an error report. That means that to actually use the RestoreDatabase function you must edit the command file before it can be used.

Link to restore_karttur_db_YYYYMMDD.txt. Inspect its content by toggling the Hide/Show button and click on the linked json file name.

Next step

The next step is adding default regions.