17. Searching for Trips**
This section shows you how to search for trips in a GTFS feed based on specified times and stops.
The three scenarios covered are:
- Finding all stops departing from a given stop after a certain time
- Finding all stops arriving at a given stop before a certain time
- Finding all trips between two stops after a given time
The first and second scenarios are the simplest, because they only rely on a single end of each trip. The third scenario is more complex because you have to ensure that each trip returned visits both the start and finish stops.
When searching for trips, the first thing you need to know is which services are running for the given search time.
Finding Service IDs
The first step in searching for trips is to determine which services are
running. To begin with, you need to find service IDs for a given date.
You then need to handle exceptions accordingly. That is, you need to add
service IDs and remove service IDs based on the rules in
calendar_dates.txt
.
Note: In Scheduling Past Midnight, you were shown how GTFS works with times past midnight. The key takeaway from this is that you have to search for trips for two sets of service IDs. This is covered as this chapter progresses.
In Australia, 27 January 2014 (Monday) was the holiday for Australia Day. This is used as an example to demonstrate how to retrieve service IDs.
Firstly, you need the main set of service IDs for the day. The following SQL query achieves this.
SELECT service_id FROM calendar
WHERE start_date <= '20140127' AND end_date >= '20140127'
AND monday = 1;
# Result: 1, 2, 6, 9, 18, 871, 7501
Next you need to find service IDs that are to be excluded, as achieved by the following SQL query.
SELECT service_id FROM calendar_dates
WHERE date = '20140127' AND exception_type = 2;
# Result: 1, 2, 6, 9, 18, 871, 7501
Finally, you need to find service IDs that are to be added. This query
is identical to the previous query, except for the different
exception_type
value.
SELECT service_id FROM calendar_dates
WHERE date = '20140127' AND exception_type = 1;
# Result: 12, 874, 4303, 7003
You can combine these three queries all into a single query in SQLite
using EXCEPT
and UNION
, as shown in the following SQL query.
SELECT service_id FROM calendar
WHERE start_date <= '20140127' AND end_date >= '20140127'
AND monday = 1
UNION
SELECT service_id FROM calendar_dates
WHERE date = '20140127' AND exception_type = 1
EXCEPT
SELECT service_id FROM calendar_dates
WHERE date = '20140127' AND exception_type = 2;
# Result: 12, 874, 4304, 7003
Now when you search for trips, only trips that have a matching
service_id
value are included.
Finding Trips Departing a Given Stop
In order to determine the list of services above, a base timestamp on
which to search is needed. For the purposes of this example, assume that
timestamp is 27 January 2014 at 1 PM (13:00:00
when using GTFS).
This example searches for all services departing from Adelaide Railway
Station, which has stop ID 6665
in the Adelaide Metro GTFS feed. To
find all matching stop times, the following query can be performed.
SELECT * FROM stop_times
WHERE stop_id = '6665'
AND departure_time >= '13:00:00'
AND pickup_type = 0
ORDER BY departure_time;
This returns a series of stop times that match the given criteria. The only problem is it does not yet take into account valid service IDs.
Note: This query may also return stop times that are the final stop
on a trip, which is not useful for somebody trying to find departures.
You may want to modify your database importer to override the final stop
time of each trip so its pickup_type
has a value of 1
(no pick-up) and
its first stop time so it has a drop_off_type
of 1
(no drop-off).
To make sure only the correct trips are returned, join stop_times
with trips
using trip_id
, and then include the list of service
IDs. For the purposes of this example the service IDs, stop ID and
departure time are being hard-coded. You can either embed a sub-query,
or include the service IDs via code.
SELECT t.*, st.* FROM stop_times st, trips t
WHERE st.stop_id = '6665'
AND st.trip_id = t.trip_id
AND t.service_id IN ('12', '874', '4304', '7003')
AND st.departure_time >= '13:00:00'
AND st.pickup_type = 0
ORDER BY st.departure_time;
This gives you the final list of stop times matching the desired
criteria. You can then decide specifically which data you need to
retrieve; you now have the trip_id
, meaning you can find all stop
times for a given trip if required.
If you need to restrict the results to only those that occur after the
starting stop, you can retrieve stop times with only a stop_sequence
larger than that of the stop time returned in the above query.
Finding Trips Arriving at a Given Stop
In order to find the trips arriving at a given stop before a specified
time, it is just a matter of making slight modifications to the above
query. Firstly, check the arrival_time
instead of
departure_time
. Also, check the drop_off_type
value instead of
pickup_type
.
SELECT t.*, st.* FROM stop_times st, trips t
WHERE st.stop_id = '6665'
AND st.trip_id = t.trip_id
AND t.service_id IN ('12', '874', '4304', '7003')
AND st.arrival_time <= '13:00:00'
AND st.drop_off_type = 0
ORDER BY st.arrival_time DESC;
For this particular data set, there are trips from four different routes
returned. If you want to restrict this to a particular route, you can
filter on the t.route_id
value.
SELECT t.*, st.* FROM stop_times st, trips t
WHERE st.stop_id = '6665'
AND st.trip_id = t.trip_id
AND t.service_id IN ('12', '874', '4304', '7003')
AND t.route_id = 'BEL'
AND st.arrival_time <= '13:00:00'
AND st.drop_off_type = 0
ORDER BY st.arrival_time DESC;
Performance of Searching Text-Based Times
These examples search using text-based arrival/departure times (such as
13:00:00
). This works because the GTFS specification mandates that
all times are HH:MM:SS
format (although H:MM:SS
is allowed for times
earlier than 10 AM).
Doing this kind of comparison (especially if you are scanning millions of rows) is quite slow and expensive. It is more efficient to convert all times stored in the database to integers that represent the number of seconds since midnight.
Note: The GTFS specification states that arrival and departure times are "noon minus 12 hours" in order to account for daylight savings time. This is effectively midnight, except for the days that daylight savings starts or finishes.
In order to achieve this, you can convert the text-based time to an
integer with H * 3600 + M * 60 + S
. For example, 13:35:21
can
be converted using the following steps.
(13 * 3600) + (35 * 60) + (21)
= 46800 + 2100 + 21
= 48921
You can then convert back to hours, minutes and seconds in order to generate timestamps in your application as shown in the following algorithm.
H = floor( 48921 / 3600 )
= floor( 13.59 )
= 13
M = floor( 48921 / 60 ) % 60
= floor( 815.35 ) % 60
= 815 % 60
= 35
S = 48921 % 60
= 21
Finding Trips Between Two Stops
Now that you know how to look up a trip from or to a given stop, the
previous query can be expanded so both the start and finish stop are
specified. The following example finds trips that depart after 1 PM. The
search only returns trips departing from Adelaide Railway Station
(stop ID 6665
) are shown. Additionally, only trips that then visit
Blackwood Railway Station (stop IDs 6670
and 101484
) are
included.
In order to achieve this, the following changes must be made to the previous examples.
- Join against stop_times twice. Once for the departure stop time and once for the arrival stop time.
- Allow for multiple stop IDs at one end. The destination in this example has two platforms, so you need to check both of them.
- **Ensure the departure time is earlier than the arrival time. **Otherwise trips heading in the opposite direction may also be returned.
The following query demonstrates how this is achieved.
SELECT t.*, st1.*, st2.*
FROM trips t, stop_times st1, stop_times st2
WHERE st1.trip_id = t.trip_id
AND st2.trip_id = t.trip_id
AND st1.stop_id = '6665'
AND st2.stop_id IN ('6670', '101484')
AND t.service_id IN ('12', '874', '4304', '7003')
AND st1.departure_time >= '13:00:00'
AND st1.pickup_type = 0
AND st2.drop_off_type = 0
AND st1.departure_time < st2.arrival_time
ORDER BY st1.departure_time;
In this example, the table alias st1
is used for the departure stop
time. Once again, the stop ID must match, as well as the departure time
and pick-up type.
For the arrival stop time the alias st2
is used. This table also
joins the trips
table using trip_id
. Since the destination has
multiple stop IDs, the SQL IN
construct is used. The arrival time is
not important in this example, so only the departure is checked.
The final thing to check is that the departure occurs before the arrival. If you do not perform this step, then trips traveling in the opposite direction may also be returned.
Note: Technically, there may be multiple results returned for the same trip. For some transit agencies, a single trip may visit the a stop more than once. If this is the case, you should also check the trip duration (arrival time minus departure time) and use the shortest trip when the same trip is returned multiple times.
Accounting for Midnight
As discussed previously, GTFS has the ability for the trips to depart or arrive after midnight for a given service day without having to specify it as part of the next service day. Consequently, while the queries above are correct, they do not necessarily paint the full picture.
In reality, when performing a trip search, you need to take into account
trips that have wrapped times (for instance, where 12:30 AM is specified
as 24:30:00
). If you want to find trips that depart after 12:30 AM
on a given day, you need to check for trips departing after
00:30:00
on that day, as well as for trips departing at
24:30:00
on the previous day.
This means that for each trip search you are left with two sets of trips, which you must then merge and present as appropriate.
Note: In reality, agencies generally do not have trips that overlap from multiple service days, so technically you often only need one query (for example, a train service might end on 12:30 AM then restart on the next service day at 4:30 AM). If your app / web site only uses a single feed where you can tune your queries manually based on how the agency operates, then you can get away with only querying a single service day. On the other hand, if you are building a scalable system that works with data from many agencies, then you need to check both days.
To demonstrate how this works in practice, the following example searches for all trips that depart after 12:30:00 AM on 14 March 2014. The examples earlier in this chapter showed how to find the service IDs for a given date. To account for midnight, service IDs for both March 14 (the "main" service date) and March 13 (the overlapping date) need to be determined.
Assume that March 13 has a service ID of C1
and March 14 has a
service ID of C2
. First you need to find the departures for March
14, as shown in the following query.
SELECT t.*, st.* FROM stop_times st, trips t
WHERE st.stop_id = 'S1'
AND st.trip_id = t.trip_id
AND t.service_id IN ('C1')
AND st.departure_time >= '00:30:00'
AND st.pickup_type = 0
ORDER BY st.departure_time;
The resultant list needs to be combined with the trips that depart after midnight from the March 13 service. To check this, it is just a matter of swapping in the right service IDs, then adding 24 hours to the search time.
SELECT t.*, st.* FROM stop_times st, trips t
WHERE st.stop_id = 'S1'
AND st.trip_id = t.trip_id
AND t.service_id IN ('C2')
AND st.departure_time >= '24:30:00'
AND st.pickup_type = 0
ORDER BY st.departure_time;
In order to get your final list of trips you must combine the results from both of these queries. If you are generating complete timestamps in order to present the options to your users, just remember to account for the results from the second query being 24 hours later.