Creating a Geospatial database on Amazon RDS

Last year, Amazon added Postgres support to their cloud relational database offering, RDS. The good folks at Amazon were kind enough to include support for some popular extensions, in particular PostGIS, which adds geospatial abilities to Postgres, so queries like “find me 100 users nearest to London” are simple and efficient to perform.

In this post I’ll go through setting up an RDS instance with Postgres + PostGIS, and importing some sample data to show that it all works. Here’s RDS serving up the country boundary and populated places for the UK:

UK towns

Note that everything I’ll do here is covered in Amazon’s Free Tier, so you can try this out at no cost.

AWS basics

For this post, I’m going to assume some familiarity with AWS, in particular that you already have an EC2 instance running, that you can use to connect to the RDS instance. If you do not have an EC2 instance running yet, set one up, choosing a region local to you.

Creating an RDS instance

Creating an RDS instance is pretty straightforward:

  • Go to the RDS console select ‘Launch a DB Instance and select Postgres
    Select Postgres
  • You’ll be asked whether to go with a Free or Production system, I went with Free
  • Configure your instance as follows (this will keep you within the Free Tier)
    Select Postgres
  • In the additional config, choose a name for your database, and then place it in the same VPC and availability zone as the EC2 you created earlier. I opted to make my instance not Publicly Accessible. By placing the RDS instance in the same VPC as your EC2 instance you will be able to access it from there.
  • Finally configure your backups (I kept the defaults) and launch the instance.

Your instance will now launch.

Connecting to your RDS instance

Go to the RDS console to view your instance (if you can’t see it double check you are in the right AWS region). If you select your instance, you’ll be told the Endpoint for the database, which will look something like mydb.0123456789abcd.eu-west-1.rds.amazonaws.com:5432, note this down.

Login to your EC2 box and verify that you can talk to your RDS instance, by invoking:

telnet mydb.0123456789abcd.eu-west-1.rds.amazonaws.com 5432

If all is well you should see something like the following

Trying 172.0.0.1...
Connected to mydb.0123456789abcd.eu-west-1.rds.amazonaws.com.
Escape character is '^]'.

If you can’t connect, double check that the Security Group for the RDS instance allows connections from your EC2 instance on port 5432.

Using your EC2 box as a proxy

Great, so now you can talk to the RDS instance from your EC2 box, but not from anywhere else, in particular your local machine. To enable access from you local machine, you can set up an SSH tunnel. Invoke the following in a terminal on the local machine:

ssh my.ec2.instance.amazonaws.com -L 5432:mydb.0123456789abcd.eu-west-1.rds.amazonaws.com:5432

Now you can just use localhost:5432 on your local machine to connect directly to the Postgres RDS instance. Whether you use this tunnel or the EC2 machine for the rest of the setup is up to you.

Installing PostGIS

To actually connect to Postgres you’ll need to install psql, on Ubuntu this is simply sudo apt-get install postgresql-client-9.1.

Then connect using the following command:

psql --host mydb.0123456789abcd.eu-west-1.rds.amazonaws.com --port 5432 --username user --dbname mydb

Or if using the SSH tunnel:

psql --host localhost --port 5432 --username user --dbname mydb

When prompted, enter your password, and you should be in. Installing PostGIS is a breeze, just type this into the psql prompt:

CREATE EXTENSION postgis;
CREATE EXTENSION postgis_topology;

Adding data to the database

For our source of data we’ll use the boundaries of world countries from Natural Earth. However you can use pretty much any shape file, so you can pick another one of the datasets from Natural Earth, like rivers or place names.

To get the data into Postgres, we’ll use the shp2pgsql tool. If you already have PostGIS installed on your EC2 box or local machine you’ll have this already, otherwsie you’ll need to install it, on Ubuntu use sudo apt-get install postgis.

Then to download the data, convert it and populate the database use:

curl -O http://www.nacis.org/naturalearth/10m/cultural/ne_10m_admin_0_countries.zip
unzip ne_10m_admin_0_countries.zip
shp2pgsql -s 900913 ne_10m_admin_0_countries.shp countries mydb > countries.sql
psql --host mydb.0123456789abcd.eu-west-1.rds.amazonaws.com --port 5432 --username user --dbname mydb --file countries.sql 

Be sure you to use the correct database name in the shp2pgsql command, rather than mydb

To verify that the import worked, enter this into the psql prompt

SELECT ST_AsGeoJson(the_geom) from countries LIMIT 1;

You should get back JSON describing the shape of a country:

{"type":"MultiPolygon","coordinates":[[[[-69.9969376289999,12.577582098],[-69.9363907539999,12.5317243510001], ...

Visualizing data

A neat way to visualize data is using a program like Q-GIS. With this installed you can easily connect to the RDS database directly from the program and visually see what is there.

When the dataset that we use above is imported, it looks like this:

World

The performance is quite slow, compared to using a local database, so this is more for sanity checks then anything else.

Overall, I found the whole setup pretty painless, definitely simpler than setting up a local Postgres database on my Mac. So far, I haven’t taxed the system much, so I can’t talk much about performance. If anyone is running Postgres on RDS in production and can talk to this, I’d love to hear from you.