"Moving objects" or "spatio-temporal database" is a database, which stores data with temporal and spatial aspects. One example is keeping a database of how real estate changes through time, another a bit more interactive is, say, tracking location and movement of vehicles through time and keeping the history in database.
What we want to be able to answer is such questions as:
- where have the tracked objects been over time,
- check whether an tracked object has passed through a checkpoint,
- check if two tracked objects have crossed paths,
- if two tracked objects have crossed paths, where and at what time has this happened?
To achieve this, we will combine two PostgreSQL's features: builtin range data types and operators, and spatial data types and operators provided by PostGIS extension.
For the example here, we will use GPX track exports of some of my runs. First we need to import GPX data into a table we create:
CREATE TABLE gpx ( id integer, filename text, lat numeric, lon numeric, ts timestamptz);
I created a small Python script converter.py, which reads GPX XMLs in current dir and outputs CSV with coordinates out stdout. This is then directly usable in psql's \COPY command:
\COPY gpx FROM PROGRAM 'python converter.py' WITH CSV; COPY 17549 Time: 617,839 ms
Modeling spatial aspects
Postgres supports spatial data types via PostGIS extension. I will not go into full detail about PostGIS here, for now it suffices to know it works by adding a geometry column to our table and that it allows us to index this column, which enables fast lookups. In my example we will model the distance the object covered with line geometry. Besides linestrings (lines), there are also points and polygons.
Modeling temporal aspects
Postgres supports ranged types, and among them are also time ranges. daterange is a range between two dates and tstzrange is a range between two timestamptz values. Ranges in Postgres are canonically represented in included-excluded notation, meaning the start of the range is included and the end of the range is excluded but marks the boundary and all values less than that are included. Example:
SELECT '[2016-01-01, 2016-06-30]'::daterange; daterange --------------------------- [2016-01-01,2016-07-01) (1 row)
With this, we created a date range for the first half of this year, that is between first of January inclusive and last of June inclusive. We can see Postgres normalizes this into 1st January inclusive and 1st July exclusive.
A table for tracking objects
For tracking objects, lets create a new table. Table needs to have a few fields: a serial for identifying each segment, an object identifier to know which object the time and location belongs to, and, of course, time period and corresponding location of the object:
CREATE TABLE tracks ( id serial, object_id integer, timeslice tstzrange);
You will notice the location is missing! The geometry columns are added via a special function AddGeometryColumn. We need to provide a number of parameters. Table name and column name are obvious. Next we need to provide spatial reference ID, which uniquely identifies a number of parameters required for correctly interpreting spatial data - here we use ID 4326, which denotes the WGS84 world geodetic system. Geometry type we'll be using is LINESTRING and the last parameter is number of dimensions, which is two:
SELECT AddGeometryColumn('tracks', 'geom', 4326, 'LINESTRING', 2);
Now we want to convert GPX data into tracks. Tracks will have temporal intervals in field timeslice and lines in field geom.
With following query we can convert numerical values in table gpx to real point data type. We use ST_MakePoint to create a point and ST_SetSRID, to set spatial reference system on geometry so PostGIS knows how to interpret it:
--- example SELECT id, ts, ST_SetSRID(ST_MakePoint(lon, lat), 4326) pt FROM gpx;
Having points, we can use "lead" window function to pair current timestamp and point with timestamp and point in the next row:
--- example SELECT id, ts, lead(ts) OVER (PARTITION BY id ORDER BY ts ASC) next_ts, pt, lead(pt) OVER (PARTITION BY id ORDER BY ts ASC) next_pt FROM points;
Having such two timestamps and points in the same row, all we have to do now is join them into ranged types. Here we must be careful to exclude the last row, since the next_pt and next_ts are both NULL and will not generate correct geometry and time range:
--- example SELECT id AS object_id, ST_MakeLine(pt, next_pt) geom, tstzrange(ts, next_ts) timeslice FROM twopoints WHERE next_pt IS NOT NULL ORDER BY id, ts ASC;
Now we can create a pipeline and join above three queries into one using WITH statement and insert resulting data into tracks table:
WITH points AS ( SELECT id, ts, ST_SetSRID(ST_MakePoint(lon, lat), 4326) pt FROM gpx), twopoints AS ( SELECT id, ts, lead(ts) OVER (PARTITION BY id ORDER BY ts ASC) next_ts, pt, lead(pt) OVER (PARTITION BY id ORDER BY ts ASC) next_pt FROM points), segments_and_timeslices AS ( SELECT id AS object_id, ST_MakeLine(pt, next_pt) geom, tstzrange(ts, next_ts) timeslice FROM twopoints WHERE next_pt IS NOT NULL ORDER BY id, ts ASC) INSERT INTO tracks (object_id, timeslice, geom) SELECT object_id, timeslice, geom FROM segments_and_timeslices; INSERT 0 17543 Time: 413,548 ms
There. In table tracks, we now have all we need to get meaningful results to the questions above. To make it speedy, we need to add a few more indexes:
modb=# CREATE INDEX tracks_timeslice_idx ON tracks USING GIST (timeslice); CREATE INDEX Time: 250,054 ms modb=# CREATE INDEX tracks_geom_idx ON tracks USING GIST (geom); CREATE INDEX Time: 96,496 ms
Answering the questions
Now we can check where an object has been on 7th May this year between 13h and 14h. We do this by using && overlap operator with tstzrange:
SELECT A.id, A.object_id, A.geom FROM tracks A WHERE A.timeslice && '[2016-05-07 13:00:00+02,2016-05-07 14:00:00+02)'::tstzrange AND A.object_id = 0; id | object_id | geom ---------+-----------+-------------------------------------------------------------------------------------------- 117576 | 0 | 0102000020E610000002000000D0B4C4CA681C2D40B2D6506A2F044740AE4676A5651C2D40E84B6F7F2E044740 117577 | 0 | 0102000020E610000002000000AE4676A5651C2D40E84B6F7F2E0447408CD82780621C2D408F34B8AD2D044740 ...
Check if any two objects have crossed paths. In geometry, there's also && overlap operator, but it checks if the bounding boxes of compared geometries overlap, not the actual geometries. This is why we have to use another function here, ST_Intersects, which is exact:
SELECT A.object_id, B.object_id, A.id, B.id FROM tracks A, tracks B WHERE ST_Intersects(A.geom, B.geom) AND A.timeslice && B.timeslice AND A.id != B.id AND A.object_id < B.object_id; object_id | object_id | id | id ------------+-----------+----+---- (0 rows) Time: 1032,741 ms
Obviously, since all the tracks are from me, I can't cross paths with myself. But we can use postgres to time travel one of my runs forward in time exactly the right amount to make it cross another:
UPDATE tracks SET timeslice = tstzrange( lower(timeslice) + interval '302 days 20:33:37', upper(timeslice) + interval '302 days 20:33:37', '[)') WHERE object_id = 5; UPDATE 4632 Time: 164,788 ms
And now, if we repeat above query:
SELECT A.object_id, B.object_id, A.id, B.id FROM tracks A, tracks B WHERE ST_Intersects(A.geom, B.geom) AND A.timeslice && B.timeslice AND A.id != B.id AND A.object_id < B.object_id; object_id | object_id | id | id ------------+-----------+--------+-------- 4 | 5 | 129396 | 133243 (1 row) Time: 1099,215 ms
In this query it is enough to go through half of returned rows, since we are comparing all the tracks with all of the other and including only one permutation is enough. That's why we limit the query with A.object_id < B.object_id.
Checking if an object has passed a checkpoint then means specifying a line denoting checkpoint and looking up intersections with it in tracks table - an exercise left for the reader.
Modeling moving objects with ranges and lines is just one way. Your database representation might vary a lot depending on questions the users will be asking about the data.
Hopefully this tutorial has made a case how ranged data types in postgres can make otherwise very intensive task - finding intersections in both spatial and temporal dimensions - relatively simple.