Last year, Amazon added Postgres support to their cloud relational database offering, RDS. The good folks at Amazon were kind enough to include support for some popular extensions, in particular PostGIS, which adds geospatial abilities to Postgres, so queries like “find me 100 users nearest to London” are simple and efficient to perform.
In this post I’ll go through setting up an RDS instance with Postgres + PostGIS, and importing some sample data to show that it all works. Here’s RDS serving up the country boundary and populated places for the UK:
Note that everything I’ll do here is covered in Amazon’s Free Tier, so you can try this out at no cost.
For this post, I’m going to assume some familiarity with AWS, in particular that you already have an EC2 instance running, that you can use to connect to the RDS instance. If you do not have an EC2 instance running yet, set one up, choosing a region local to you.
Creating an RDS instance is pretty straightforward:
Your instance will now launch.
Go to the RDS console to view your instance (if you can’t see it double check you are in the right AWS region). If you select your instance, you’ll be told the Endpoint
for the database, which will look something like mydb.0123456789abcd.eu-west-1.rds.amazonaws.com:5432
, note this down.
Login to your EC2 box and verify that you can talk to your RDS instance, by invoking:
telnet mydb.0123456789abcd.eu-west-1.rds.amazonaws.com 5432
If all is well you should see something like the following
Trying 172.0.0.1...
Connected to mydb.0123456789abcd.eu-west-1.rds.amazonaws.com.
Escape character is '^]'.
If you can’t connect, double check that the Security Group for the RDS instance allows connections from your EC2 instance on port 5432.
Great, so now you can talk to the RDS instance from your EC2 box, but not from anywhere else, in particular your local machine. To enable access from you local machine, you can set up an SSH tunnel. Invoke the following in a terminal on the local machine:
ssh my.ec2.instance.amazonaws.com -L 5432:mydb.0123456789abcd.eu-west-1.rds.amazonaws.com:5432
Now you can just use localhost:5432
on your local machine to connect directly to the Postgres RDS instance. Whether you use this tunnel or the EC2 machine for the rest of the setup is up to you.
To actually connect to Postgres you’ll need to install psql
, on Ubuntu this is simply sudo apt-get install postgresql-client-9.1
.
Then connect using the following command:
psql --host mydb.0123456789abcd.eu-west-1.rds.amazonaws.com --port 5432 --username user --dbname mydb
Or if using the SSH tunnel:
psql --host localhost --port 5432 --username user --dbname mydb
When prompted, enter your password, and you should be in. Installing PostGIS is a breeze, just type this into the psql
prompt:
CREATE EXTENSION postgis;
CREATE EXTENSION postgis_topology;
For our source of data we’ll use the boundaries of world countries from Natural Earth. However you can use pretty much any shape file, so you can pick another one of the datasets from Natural Earth, like rivers or place names.
To get the data into Postgres, we’ll use the shp2pgsql tool. If you already have PostGIS installed on your EC2 box or local machine you’ll have this already, otherwsie you’ll need to install it, on Ubuntu use sudo apt-get install postgis
.
Then to download the data, convert it and populate the database use:
curl -O http://www.nacis.org/naturalearth/10m/cultural/ne_10m_admin_0_countries.zip
unzip ne_10m_admin_0_countries.zip
shp2pgsql -s 900913 ne_10m_admin_0_countries.shp countries mydb > countries.sql
psql --host mydb.0123456789abcd.eu-west-1.rds.amazonaws.com --port 5432 --username user --dbname mydb --file countries.sql
Be sure you to use the correct database name in the shp2pgsql
command, rather than mydb
To verify that the import worked, enter this into the psql
prompt
SELECT ST_AsGeoJson(the_geom) from countries LIMIT 1;
You should get back JSON describing the shape of a country:
{"type":"MultiPolygon","coordinates":[[[[-69.9969376289999,12.577582098],[-69.9363907539999,12.5317243510001], ...
A neat way to visualize data is using a program like Q-GIS. With this installed you can easily connect to the RDS database directly from the program and visually see what is there.
When the dataset that we use above is imported, it looks like this:
The performance is quite slow, compared to using a local database, so this is more for sanity checks then anything else.
Overall, I found the whole setup pretty painless, definitely simpler than setting up a local Postgres database on my Mac. So far, I haven’t taxed the system much, so I can’t talk much about performance. If anyone is running Postgres on RDS in production and can talk to this, I’d love to hear from you.