Change in rainfall 2001-2016, global tropics

Map: Change in rainfall 2001-2016, global tropics

Connect Python and PostgreSQL using psycopg2

Introduction

This post on how to Install psycopg2, create connection to Postgres and createdb is not needed if you clone or download Karttur´s GeoImagine Framework from the GitHub.com. This post, however, contains the details on how to create a more secure database connection. And it also covers both general and detailed instructions that are useful for other setups with psycopg2 as well as when updates are required.

In earlier posts I described how to install Eclipse IDE for Python development after installing Anaconda Python as the Python interpreter, and then I installed PostgreSQL and PostGIS. This post describes how to connect Python with the Postgres server and create a new database in Postgres using Python.

Environments and packages

The Anaconda Python distribution contains a lot of Python packages (a package is a self contained collection of python modules [.py files] that performs given tasks). When working with Python you can find packages for almost all your needs, either that can be used out of the box, or after some modification. There is of course a package for connecting Python to Postgres server - psycopg2.

Psycopg2 as a package in a virtual environments

If you followed the post on Conda virtual environments you should have a virtual environment for Python (e.g. geoimagine001) setup on your local machine. And it should include psycopg2 as this package was added to the list of default packages to install with any new virtual environment.

Add psycopg2 to a virtual environment

if you did setup a virtual python environment as described in the previous section, you can just skip this section.

You can install new packages into your environment in the usual way that conda packages are installed. Make sure that the environment into which you want to install a package (psycopg2 in this case) is the active environment:

$ conda activate geoimagine001

The terminal prompt should now point at your environment. and you can enter the install command:

(geoimagine001) … $ conda install -c anaconda psycopg2

or tell conda under which environment to install the packages:

(base) …$ conda install –name geoimagine001 -c anaconda psycopg2

Once the installation is finished you should see the installed packages under the site-packages path. On my machine that is:

/Applications/anaconda3/envs/geoimagine001/lib/python3.6/site-packages/psycopg2

Add psycopg2 to the Anaconda base

This section describes how to add psycopg2 to your conda base environment. If you have a virtual environment as describes above, you can also skip this section.

If you installed Anaconda and set up Eclipse as described in the earlier posts, the Python distribution that Eclipse uses is under:

‘path’/anaconda3/lib/python3.7

where ‘path’ is the path you choose for installing Anaconda (if you use another python version then 3.7, the version number is different). Your Python path contains a folder called site-packages. The packages in that folder are available for direct use in the Eclipse IDE.

In my Anaconda installation, psycopg2 was not installed under site-packages, but included as an .egg file - a kind of package installer. To install psycopg2, with Anaconda set up as described in the earlier post, all you have to do to add psycopg2 to your site-packages is to execute the Terminal command:

$ conda install psycopg2

To check if psycopg2 is in place, list the package content in the Terminal:

$ ls ‘path’/anaconda2/lib/python2.7/site-packages/psycopg2

You can also open the folder in Finder. Copy the full path to the site-packages folder, and when in Finder, simultaneously press the keys for ‘command(cmd)’+’Shift’+’G’, in the Go to the Folder that opens, just paste the full path and click Go.

Security

If you are only going to use your Postgres database as localhost (on your own machine), security is less important. But if you want to protect your data you must set some level of security. The solution I use is primarily for macOS and UNIX/Linux systems, and is not very advanced. I use a combination of storing my password in my home directory (~) combined with a simple encryption.

Create a file in your home directory (~) called .netrc that defines your credentials. An earlier post describes how to use the Terminal for creating and editing files in detail. In the Terminal go to your home directory:

$ cd ~

Then start the Terminal text editor pico for editing/creating the file:

$ pico .netrc

Enter the two lines below (but with your role/user and password), one for the default user (if you installed Postgres with Homebrew the default user is the same as your user on the local machine - ‘yourUser’), and one for the production user (‘prodUser’) if you followed my suggestions in the previous post. If you only have the default user, enter the same login and password in both lines.

machine localhost0   login yourUser   password yourPassword
machine localhost1   login prodUser   password prodPassword

Exit pico (ctrl-X) and save the file (click Y when asked to save). You probably have to change the read and write permissions for .netrc, which you do by executing the following Terminal command:

$ chmod og-rw .netrc

With this solution your credentials will only be explicitly written out in a hidden file.

Set Postgres connection in Python (Eclipse)

Start Eclipse, and you should come back to the workbench that you created in a previous post. Repeat the steps outlined in that post to create a new PyDev project, with a PyDev package and a sub-package.

Create a project (call it what you want):

File : New : Project : PyDev project

Create a new PyDev Package and call it ‘geoimagine’:

File : New : PyDev Package

Create a new sub-package and call it ‘db_setup’:

File : New : PyDev Package

In the dialog window you should see the main package (‘geoimagine’) already filled (geoimagine). Use Python syntax and add a dot (.) followed by the name of the sub-package (‘db_setup’) (geoimagine.db_setup).

PyDev Class module

In the sub-package create a Python module called ‘db_setup_class’ (set type to Class in the Template) window that appears after you click click Finish:

File : New : PyDev Module

In the db_setup_class.py module enter (or copy and paste) the following code (replace the default Class that was created with the module):

import psycopg2
from base64 import b64decode

class PGsession:
    """Connect to postgres server"""   
    def __init__(self, query):
        """Connect to selected database"""
        query['pswd'] = b64decode(query['pswd'])
        conn_string = "host='localhost' dbname='%(db)s' user='%(user)s' password='%(pswd)s'" %query
        self.conn = psycopg2.connect(conn_string)
        self.cursor = self.conn.cursor()

The import psycopg2 you installed above, whereas the package base64 is a Python core package. You will use it to send your password in encrypted form. When you call the module class PGsession it expects a variabe called ‘query’. This variable should be in the form of a dictionary, with pairs of keys and values:

query = {key0:value0, key1:value1 ...}

You are going to use the query dictionary for sending the user and (encrypted) password to PGsession. But you must first create the Main PyDev module from which you will do that.

Connect and create a new DB

Create a second PyDev module, called db_setup_main in the db_setup package. Below are two versions, but the second version is just in case your system can not handle the default .netrc file using the netrc package (the second version instead includes an explicit file reader, but you should not need to worry about that).

from db_setup_class import PGsession
from base64 import b64encode
import netrc

def SetUpProdDb(prodDB):
    '''
    Create production database(db)
    '''
    HOST = 'localhost1'
    secrets = netrc.netrc()
    username, account, password = secrets.authenticators( HOST )
    #encrypt the password
    pswd = b64encode(password)
    #create a query dictionary for connecting to the Postgres server
    query = {'db':'postgres','user':username,'pswd':pswd}
    #Connect to the Postgres Server
    iniSession = PGsession(query)
    #Set the name of your production database(db)
    prodDbD = {'dbname':prodDB}
    #Select the current (cluster) db
    iniSession.cursor.execute("SELECT current_database()")
    #Get the results from the SELECT statement
    record = iniSession.cursor.fetchone()
    #Print Current (cluster) db
    print 'Current database',record[0]
    #Select the logged in user
    iniSession.cursor.execute("SELECT user")
    #Get the results from the SELECT statement
    record = iniSession.cursor.fetchone()
    #Print Current user
    print 'User',record[0]
    #Select all databases in the cluster db
    iniSession.cursor.execute("SELECT datname FROM pg_database;")
    #Get the results from the SELECT statement
    records = iniSession.cursor.fetchall()
    #Convert the retrieved records to a simple list
    dbL = [item[0] for item in records]
    #Print the list of all databases in the cluster
    print 'Databases', dbL
    #Check if your required production db exists
    if not prodDbD['dbname'] in dbL:
        #Your production db does not exists
        printstr = 'Creating database: %s' %( prodDbD['dbname'])
        print printstr
        #Import the psycopg extension that allows you to create a new db
        from psycopg2.extensions import ISOLATION_LEVEL_AUTOCOMMIT
        #Invoke the connection with the extension
        iniSession.conn.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT)
        #Create your production db
        iniSession.cursor.execute("CREATE DATABASE %(dbname)s;" %prodDbD)

if __name__ == "__main__":
    prodDB = 'YourProdDB'
    SetUpProdDb(prodDB)

The alternative version (with an explicit reader for the .netrc) is only a backup if the above version does not work properly. Click the button to see the code.

Run your package

With the main module (db_setup_main.py) open in the Eclipse main pane, kick off your code from the Eclipse menu:

Run : Run

If everything worked out, the Console pane should return:

Current database postgres
User YourUser
Databases ['postgres', 'template1', 'template0']
Creating database: 'prodDB'

If you run it again, the database ‘prodDB’ will be listed in ‘Databases []’ in the Console, and not be created again.

You can look at the details for ‘prodDB’ in pgAdmin (installing and setting up pgAdmin is described in this post). Connect pgAdmin to your Postgres server and expand the Browser menu by clicking on the small ‘+’ signs:

Servers : postgres : Databases : ‘prodDB’

If you click ‘prodDB’ (you might also need to click the tab SQL in the right pane of pgAdmin) you will see how ‘prodDB’ was defined:

CREATE DATABASE 'prodDB'
    WITH
    OWNER = 'youruser'
    ENCODING = 'UTF8'
    LC_COLLATE = 'en_US.UTF-8'
    LC_CTYPE = 'en_US.UTF-8'
    TABLESPACE = pg_default
    CONNECTION LIMIT = -1;

Encoding, collation, and Ctype are set to UTF8 (the most universal set of characters), and tablespace is the to pg_default. The ‘-1’ for connection limit means unlimited. You can set all this parameters in the SQL for CREATEDB if you want, but I prefer the most universal, which is also the postgres default.

Resources

PostgreSQL

psycopg2