aboutsummaryrefslogtreecommitdiff
path: root/.idea/gtfs-book/ch-17-searching-for-trips.md
diff options
context:
space:
mode:
Diffstat (limited to '.idea/gtfs-book/ch-17-searching-for-trips.md')
-rw-r--r--.idea/gtfs-book/ch-17-searching-for-trips.md346
1 files changed, 346 insertions, 0 deletions
diff --git a/.idea/gtfs-book/ch-17-searching-for-trips.md b/.idea/gtfs-book/ch-17-searching-for-trips.md
new file mode 100644
index 0000000..dd0a901
--- /dev/null
+++ b/.idea/gtfs-book/ch-17-searching-for-trips.md
@@ -0,0 +1,346 @@
+## 17. Searching for Trips**
+
+This section shows you how to search for trips in a GTFS feed based on
+specified times and stops.
+
+The three scenarios covered are:
+
+* Finding all stops departing from a given stop after a certain time
+* Finding all stops arriving at a given stop before a certain time
+* Finding all trips between two stops after a given time
+
+The first and second scenarios are the simplest, because they only rely
+on a single end of each trip. The third scenario is more complex because
+you have to ensure that each trip returned visits both the start and
+finish stops.
+
+When searching for trips, the first thing you need to know is which
+services are running for the given search time.
+
+### Finding Service IDs
+
+The first step in searching for trips is to determine which services are
+running. To begin with, you need to find service IDs for a given date.
+You then need to handle exceptions accordingly. That is, you need to add
+service IDs and remove service IDs based on the rules in
+`calendar_dates.txt`.
+
+***Note:** In *Scheduling Past Midnight*, you were shown how
+GTFS works with times past midnight. The key takeaway from this is that
+you have to search for trips for two sets of service IDs. This is
+covered as this chapter progresses.*
+
+In Australia, 27 January 2014 (Monday) was the holiday for Australia
+Day. This is used as an example to demonstrate how to retrieve service
+IDs.
+
+Firstly, you need the main set of service IDs for the day. The following
+SQL query achieves this.
+
+```sql
+SELECT service_id FROM calendar
+ WHERE start_date <= '20140127' AND end_date >= '20140127'
+ AND monday = 1;
+
+# Result: 1, 2, 6, 9, 18, 871, 7501
+```
+
+Next you need to find service IDs that are to be excluded, as achieved
+by the following SQL query.
+
+```sql
+SELECT service_id FROM calendar_dates
+ WHERE date = '20140127' AND exception_type = 2;
+
+# Result: 1, 2, 6, 9, 18, 871, 7501
+```
+
+Finally, you need to find service IDs that are to be added. This query
+is identical to the previous query, except for the different
+`exception_type` value.
+
+```sql
+SELECT service_id FROM calendar_dates
+ WHERE date = '20140127' AND exception_type = 1;
+
+# Result: 12, 874, 4303, 7003
+```
+
+You can combine these three queries all into a single query in SQLite
+using `EXCEPT` and `UNION`, as shown in the following SQL query.
+
+```sql
+SELECT service_id FROM calendar
+ WHERE start_date <= '20140127' AND end_date >= '20140127'
+ AND monday = 1
+
+UNION
+
+SELECT service_id FROM calendar_dates
+ WHERE date = '20140127' AND exception_type = 1
+
+EXCEPT
+
+SELECT service_id FROM calendar_dates
+ WHERE date = '20140127' AND exception_type = 2;
+
+# Result: 12, 874, 4304, 7003
+```
+
+Now when you search for trips, only trips that have a matching
+`service_id` value are included.
+
+### Finding Trips Departing a Given Stop
+
+In order to determine the list of services above, a base timestamp on
+which to search is needed. For the purposes of this example, assume that
+timestamp is 27 January 2014 at 1 PM (`13:00:00` when using GTFS).
+
+This example searches for all services departing from Adelaide Railway
+Station, which has stop ID `6665` in the Adelaide Metro GTFS feed. To
+find all matching stop times, the following query can be performed.
+
+```sql
+SELECT * FROM stop_times
+ WHERE stop_id = '6665'
+ AND departure_time >= '13:00:00'
+ AND pickup_type = 0
+ ORDER BY departure_time;
+```
+
+This returns a series of stop times that match the given criteria. The
+only problem is it does not yet take into account valid service IDs.
+
+***Note:** This query may also return stop times that are the final stop
+on a trip, which is not useful for somebody trying to find departures.
+You may want to modify your database importer to override the final stop
+time of each trip so its `pickup_type` has a value of `1` (no pick-up) and
+its first stop time so it has a `drop_off_type` of `1` (no drop-off).*
+
+To make sure only the correct trips are returned, join `stop_times`
+with `trips` using `trip_id`, and then include the list of service
+IDs. For the purposes of this example the service IDs, stop ID and
+departure time are being hard-coded. You can either embed a sub-query,
+or include the service IDs via code.
+
+```sql
+SELECT t.*, st.* FROM stop_times st, trips t
+ WHERE st.stop_id = '6665'
+ AND st.trip_id = t.trip_id
+ AND t.service_id IN ('12', '874', '4304', '7003')
+ AND st.departure_time >= '13:00:00'
+ AND st.pickup_type = 0
+ ORDER BY st.departure_time;
+```
+
+This gives you the final list of stop times matching the desired
+criteria. You can then decide specifically which data you need to
+retrieve; you now have the `trip_id`, meaning you can find all stop
+times for a given trip if required.
+
+If you need to restrict the results to only those that occur after the
+starting stop, you can retrieve stop times with only a `stop_sequence`
+larger than that of the stop time returned in the above query.
+
+### Finding Trips Arriving at a Given Stop
+
+In order to find the trips arriving at a given stop before a specified
+time, it is just a matter of making slight modifications to the above
+query. Firstly, check the `arrival_time` instead of
+`departure_time`. Also, check the `drop_off_type` value instead of
+`pickup_type`.
+
+```sql
+SELECT t.*, st.* FROM stop_times st, trips t
+ WHERE st.stop_id = '6665'
+ AND st.trip_id = t.trip_id
+ AND t.service_id IN ('12', '874', '4304', '7003')
+ AND st.arrival_time <= '13:00:00'
+ AND st.drop_off_type = 0
+ ORDER BY st.arrival_time DESC;
+```
+
+For this particular data set, there are trips from four different routes
+returned. If you want to restrict this to a particular route, you can
+filter on the `t.route_id` value.
+
+```sql
+SELECT t.*, st.* FROM stop_times st, trips t
+ WHERE st.stop_id = '6665'
+ AND st.trip_id = t.trip_id
+ AND t.service_id IN ('12', '874', '4304', '7003')
+ AND t.route_id = 'BEL'
+ AND st.arrival_time <= '13:00:00'
+ AND st.drop_off_type = 0
+ ORDER BY st.arrival_time DESC;
+```
+
+### Performance of Searching Text-Based Times
+
+These examples search using text-based arrival/departure times (such as
+`13:00:00`). This works because the GTFS specification mandates that
+all times are `HH:MM:SS` format (although `H:MM:SS` is allowed for times
+earlier than 10 AM).
+
+Doing this kind of comparison (especially if you are scanning millions
+of rows) is quite slow and expensive. It is more efficient to convert
+all times stored in the database to integers that represent the number
+of seconds since midnight.
+
+***Note:** The GTFS specification states that arrival and departure times
+are "noon minus 12 hours" in order to account for daylight savings time.
+This is effectively midnight, except for the days that daylight savings
+starts or finishes.*
+
+In order to achieve this, you can convert the text-based time to an
+integer with `H * 3600 + M * 60 + S`. For example, `13:35:21` can
+be converted using the following steps.
+
+```
+ (13 * 3600) + (35 * 60) + (21)
+= 46800 + 2100 + 21
+= 48921
+```
+
+You can then convert back to hours, minutes and seconds in order to
+generate timestamps in your application as shown in the following
+algorithm.
+
+```
+H = floor( 48921 / 3600 )
+ = floor( 13.59 )
+ = 13
+
+M = floor( 48921 / 60 ) % 60
+ = floor( 815.35 ) % 60
+ = 815 % 60
+ = 35
+
+S = 48921 % 60
+ = 21
+```
+
+### Finding Trips Between Two Stops
+
+Now that you know how to look up a trip from or to a given stop, the
+previous query can be expanded so both the start and finish stop are
+specified. The following example finds trips that depart after 1 PM. The
+search only returns trips departing from *Adelaide Railway Station*
+(stop ID `6665`) are shown. Additionally, only trips that then visit
+*Blackwood Railway Station* (stop IDs `6670` and `101484`) are
+included.
+
+In order to achieve this, the following changes must be made to the
+previous examples.
+
+* **Join against stop_times twice.** Once for the departure stop time
+ and once for the arrival stop time.
+* **Allow for multiple stop IDs at one end.** The destination in this
+ example has two platforms, so you need to check both of them.
+* **Ensure the departure time is earlier than the arrival time.
+ **Otherwise trips heading in the opposite direction may also be
+ returned.
+
+The following query demonstrates how this is achieved.
+
+```sql
+SELECT t.*, st1.*, st2.*
+ FROM trips t, stop_times st1, stop_times st2
+ WHERE st1.trip_id = t.trip_id
+ AND st2.trip_id = t.trip_id
+ AND st1.stop_id = '6665'
+ AND st2.stop_id IN ('6670', '101484')
+ AND t.service_id IN ('12', '874', '4304', '7003')
+ AND st1.departure_time >= '13:00:00'
+ AND st1.pickup_type = 0
+ AND st2.drop_off_type = 0
+ AND st1.departure_time < st2.arrival_time
+ ORDER BY st1.departure_time;
+```
+
+In this example, the table alias `st1` is used for the departure stop
+time. Once again, the stop ID must match, as well as the departure time
+and pick-up type.
+
+For the arrival stop time the alias `st2` is used. This table also
+joins the `trips` table using `trip_id`. Since the destination has
+multiple stop IDs, the SQL `IN` construct is used. The arrival time is
+not important in this example, so only the departure is checked.
+
+The final thing to check is that the departure occurs before the
+arrival. If you do not perform this step, then trips traveling in the
+opposite direction may also be returned.
+
+***Note:** Technically, there may be multiple results returned for the
+same trip. For some transit agencies, a single trip may visit the a stop
+more than once. If this is the case, you should also check the trip
+duration (arrival time minus departure time) and use the shortest trip
+when the same trip is returned multiple times.*
+
+### Accounting for Midnight
+
+As discussed previously, GTFS has the ability for the trips to depart or
+arrive after midnight for a given service day without having to specify
+it as part of the next service day. Consequently, while the queries
+above are correct, they do not necessarily paint the full picture.
+
+In reality, when performing a trip search, you need to take into account
+trips that have wrapped times (for instance, where 12:30 AM is specified
+as `24:30:00`). If you want to find trips that depart after 12:30 AM
+on a given day, you need to check for trips departing after
+`00:30:00` on that day, as well as for trips departing at
+`24:30:00` on the previous day.
+
+This means that for each trip search you are left with two sets of
+trips, which you must then merge and present as appropriate.
+
+***Note:** In reality, agencies generally do not have trips that overlap
+from multiple service days, so technically you often only need one query
+(for example, a train service might end on 12:30 AM then restart on the
+next service day at 4:30 AM). If your app / web site only uses a single
+feed where you can tune your queries manually based on how the agency
+operates, then you can get away with only querying a single service day.
+On the other hand, if you are building a scalable system that works with
+data from many agencies, then you need to check both days.*
+
+To demonstrate how this works in practice, the following example
+searches for all trips that depart after 12:30:00 AM on 14 March 2014.
+The examples earlier in this chapter showed how to find the service IDs
+for a given date. To account for midnight, service IDs for both March 14
+(the "main" service date) and March 13 (the overlapping date) need to be
+determined.
+
+Assume that March 13 has a service ID of `C1` and March 14 has a
+service ID of `C2`. First you need to find the departures for March
+14, as shown in the following query.
+
+```sql
+SELECT t.*, st.* FROM stop_times st, trips t
+ WHERE st.stop_id = 'S1'
+ AND st.trip_id = t.trip_id
+ AND t.service_id IN ('C1')
+ AND st.departure_time >= '00:30:00'
+ AND st.pickup_type = 0
+ ORDER BY st.departure_time;
+```
+
+The resultant list needs to be combined with the trips that depart after
+midnight from the March 13 service. To check this, it is just a matter
+of swapping in the right service IDs, then adding 24 hours to the search
+time.
+
+```sql
+SELECT t.*, st.* FROM stop_times st, trips t
+ WHERE st.stop_id = 'S1'
+ AND st.trip_id = t.trip_id
+ AND t.service_id IN ('C2')
+ AND st.departure_time >= '24:30:00'
+ AND st.pickup_type = 0
+ ORDER BY st.departure_time;
+```
+
+In order to get your final list of trips you must combine the results
+from both of these queries. If you are generating complete timestamps in
+order to present the options to your users, just remember to account for
+the results from the second query being 24 hours later.
+