20. Trip Patterns**

In a GTFS feed, a route typically has multiple trips that start and finish at the same stops. If you are looking to reduce the size of the data stored, then converting data from stop_times.txt into a series of reusable patterns is an excellent way to do so.

For two trips to share a common pattern, the following must hold true:

  • The stops visited and the order in which they are visited must be the same
  • The time differences between each stop must be the same.

The following table shows some fictional trips to demonstrate this.

StopTrip 1Trip 2Trip 3
S110:00:0010:10:0010:20:00
S210:02:0010:13:0010:22:00
S310:05:0010:15:0010:25:00
S410:06:0010:18:0010:26:00
S510:10:0010:21:0010:30:00

In a GTFS feed, this would correspond to 15 records in stop_times.txt. If you look more closely though, you can see the trips are very similar. The following table shows the differences between each stop time, instead of the actual time.

StopTrip 1Trip 2Trip 3
S100:00:0000:00:0000:00:00
S200:02:00 (+2m)00:03:00 (+3m)00:02:00 (+2m)
S300:05:00 (+5m)00:05:00 (+5m)00:05:00 (+5m)
S400:06:00 (+6m)00:08:00 (+8m)00:06:00 (+6m)
S500:10:00 (+10m)00:11:00 (+11m)00:10:00 (+10m)

You can see from this table that the first and third trip, although they start at different times, have the same offsets between stops (as well as stopping at identical stops).

Instead of using a table to store stop times, you can store patterns. By storing the ID of the pattern with each trip, you can reduce the list of stop times in this example from 15 to 10. As only time offsets are stored for each patterns, the trip starting time also needs to be saved with each trip.

You could use SQL such as the following to model this.

CREATE TABLE trips (
  trip_id TEXT,
  pattern_id INTEGER,
  start_time TEXT,
  start_time_secs INTEGER
);

CREATE TABLE patterns (
  pattern_id INTEGER,
  stop_id TEXT,
  time_offset INTEGER,
  stop_sequence INTEGER
);

The data you would store for trips in this example is shown in the following table.

trip_idpattern_idstart_timestart_time_secs
T1110:00:0036000
T2210:10:0036600
T3110:20:0037200

Note: The above table includes start_time_secs, which is an integer value representing the number of seconds since the day started. Using the hour, minutes and seconds in start_time, this value is H * 3600 + M * 60 + S.

In the patterns table, you would store data as in the following table.

pattern_idstop_idtime_offsetstop_sequence
1S101
1S21202
1S33003
1S43604
1S56005
2S101
2S21802
2S43003
2S54804
2S66605

As you can see, this represents an easy way to significantly reduce the amount of data stored. You could have tens or hundreds of trips each sharing the same pattern. When you scale this to the entire feed, this could reduce, say, 3 million records to about 200,000.

Note: This is a somewhat simplified example, as there is other data available in stop_times.txt (such as separate arrival/departure times, drop-off type and pick-up type). You should take all of this data into account when determining how to allocate patterns.

Updating Trip Searches

Changing your model to reuse patterns instead of storing every stop time means your data lookup routines must also be changed.

For example, to find all stop times for a given trip, you must now find the pattern using the following SQL query.

SELECT * FROM patterns
  WHERE pattern_id = (SELECT pattern_id FROM trips WHERE trip_id = 'YOUR_TRIP_ID')
  ORDER BY stop_sequence;

If you want to determine the arrival/departure time, you must add the offset stored for the pattern record to the starting time stored with the trip. This involves joining the tables and adding time_offset to start_time_secs, as shown in the following query.

SELECT t.start_time_secs + p.time_offset, p.stop_id
  FROM patterns p, trips t
  WHERE p.pattern_id = t.pattern_id
  AND t.trip_id = 'YOUR_TRIP_ID'
  ORDER BY p.stop_sequence;

Other Data Reduction Methods

There are other ways you can reduce the amount of data, such as only using patterns to store the stops (and not timing offsets), and then storing the timings with each trip record. A technique such as this further reduces the size of the database, but the trade-off is that querying the data becomes slightly more complex.

Hopefully you can see that by using the method described in this chapter there are a number of ways to be creative with GTFS data, and that you must make decisions when it comes to speed, size, and ease of querying data.