diff options
author | isabelle-dr | 2022-03-22 19:33:40 -0400 |
---|---|---|
committer | isabelle-dr | 2022-03-22 19:33:40 -0400 |
commit | 24e89c97cbbdd89348b1f02497a129ac8ac0a14f (patch) | |
tree | ac8432b0731525e889b6dfefa1401ce277af6893 /.idea/gtfs-book/ch-17-searching-for-trips.md | |
parent | ab1b75da67be3e101e40e0ae3052d73c714b8ea3 (diff) |
re arrange repo
Diffstat (limited to '.idea/gtfs-book/ch-17-searching-for-trips.md')
-rw-r--r-- | .idea/gtfs-book/ch-17-searching-for-trips.md | 346 |
1 files changed, 0 insertions, 346 deletions
diff --git a/.idea/gtfs-book/ch-17-searching-for-trips.md b/.idea/gtfs-book/ch-17-searching-for-trips.md deleted file mode 100644 index dd0a901..0000000 --- a/.idea/gtfs-book/ch-17-searching-for-trips.md +++ /dev/null @@ -1,346 +0,0 @@ -## 17. Searching for Trips** - -This section shows you how to search for trips in a GTFS feed based on -specified times and stops. - -The three scenarios covered are: - -* Finding all stops departing from a given stop after a certain time -* Finding all stops arriving at a given stop before a certain time -* Finding all trips between two stops after a given time - -The first and second scenarios are the simplest, because they only rely -on a single end of each trip. The third scenario is more complex because -you have to ensure that each trip returned visits both the start and -finish stops. - -When searching for trips, the first thing you need to know is which -services are running for the given search time. - -### Finding Service IDs - -The first step in searching for trips is to determine which services are -running. To begin with, you need to find service IDs for a given date. -You then need to handle exceptions accordingly. That is, you need to add -service IDs and remove service IDs based on the rules in -`calendar_dates.txt`. - -***Note:** In *Scheduling Past Midnight*, you were shown how -GTFS works with times past midnight. The key takeaway from this is that -you have to search for trips for two sets of service IDs. This is -covered as this chapter progresses.* - -In Australia, 27 January 2014 (Monday) was the holiday for Australia -Day. This is used as an example to demonstrate how to retrieve service -IDs. - -Firstly, you need the main set of service IDs for the day. The following -SQL query achieves this. - -```sql -SELECT service_id FROM calendar - WHERE start_date <= '20140127' AND end_date >= '20140127' - AND monday = 1; - -# Result: 1, 2, 6, 9, 18, 871, 7501 -``` - -Next you need to find service IDs that are to be excluded, as achieved -by the following SQL query. - -```sql -SELECT service_id FROM calendar_dates - WHERE date = '20140127' AND exception_type = 2; - -# Result: 1, 2, 6, 9, 18, 871, 7501 -``` - -Finally, you need to find service IDs that are to be added. This query -is identical to the previous query, except for the different -`exception_type` value. - -```sql -SELECT service_id FROM calendar_dates - WHERE date = '20140127' AND exception_type = 1; - -# Result: 12, 874, 4303, 7003 -``` - -You can combine these three queries all into a single query in SQLite -using `EXCEPT` and `UNION`, as shown in the following SQL query. - -```sql -SELECT service_id FROM calendar - WHERE start_date <= '20140127' AND end_date >= '20140127' - AND monday = 1 - -UNION - -SELECT service_id FROM calendar_dates - WHERE date = '20140127' AND exception_type = 1 - -EXCEPT - -SELECT service_id FROM calendar_dates - WHERE date = '20140127' AND exception_type = 2; - -# Result: 12, 874, 4304, 7003 -``` - -Now when you search for trips, only trips that have a matching -`service_id` value are included. - -### Finding Trips Departing a Given Stop - -In order to determine the list of services above, a base timestamp on -which to search is needed. For the purposes of this example, assume that -timestamp is 27 January 2014 at 1 PM (`13:00:00` when using GTFS). - -This example searches for all services departing from Adelaide Railway -Station, which has stop ID `6665` in the Adelaide Metro GTFS feed. To -find all matching stop times, the following query can be performed. - -```sql -SELECT * FROM stop_times - WHERE stop_id = '6665' - AND departure_time >= '13:00:00' - AND pickup_type = 0 - ORDER BY departure_time; -``` - -This returns a series of stop times that match the given criteria. The -only problem is it does not yet take into account valid service IDs. - -***Note:** This query may also return stop times that are the final stop -on a trip, which is not useful for somebody trying to find departures. -You may want to modify your database importer to override the final stop -time of each trip so its `pickup_type` has a value of `1` (no pick-up) and -its first stop time so it has a `drop_off_type` of `1` (no drop-off).* - -To make sure only the correct trips are returned, join `stop_times` -with `trips` using `trip_id`, and then include the list of service -IDs. For the purposes of this example the service IDs, stop ID and -departure time are being hard-coded. You can either embed a sub-query, -or include the service IDs via code. - -```sql -SELECT t.*, st.* FROM stop_times st, trips t - WHERE st.stop_id = '6665' - AND st.trip_id = t.trip_id - AND t.service_id IN ('12', '874', '4304', '7003') - AND st.departure_time >= '13:00:00' - AND st.pickup_type = 0 - ORDER BY st.departure_time; -``` - -This gives you the final list of stop times matching the desired -criteria. You can then decide specifically which data you need to -retrieve; you now have the `trip_id`, meaning you can find all stop -times for a given trip if required. - -If you need to restrict the results to only those that occur after the -starting stop, you can retrieve stop times with only a `stop_sequence` -larger than that of the stop time returned in the above query. - -### Finding Trips Arriving at a Given Stop - -In order to find the trips arriving at a given stop before a specified -time, it is just a matter of making slight modifications to the above -query. Firstly, check the `arrival_time` instead of -`departure_time`. Also, check the `drop_off_type` value instead of -`pickup_type`. - -```sql -SELECT t.*, st.* FROM stop_times st, trips t - WHERE st.stop_id = '6665' - AND st.trip_id = t.trip_id - AND t.service_id IN ('12', '874', '4304', '7003') - AND st.arrival_time <= '13:00:00' - AND st.drop_off_type = 0 - ORDER BY st.arrival_time DESC; -``` - -For this particular data set, there are trips from four different routes -returned. If you want to restrict this to a particular route, you can -filter on the `t.route_id` value. - -```sql -SELECT t.*, st.* FROM stop_times st, trips t - WHERE st.stop_id = '6665' - AND st.trip_id = t.trip_id - AND t.service_id IN ('12', '874', '4304', '7003') - AND t.route_id = 'BEL' - AND st.arrival_time <= '13:00:00' - AND st.drop_off_type = 0 - ORDER BY st.arrival_time DESC; -``` - -### Performance of Searching Text-Based Times - -These examples search using text-based arrival/departure times (such as -`13:00:00`). This works because the GTFS specification mandates that -all times are `HH:MM:SS` format (although `H:MM:SS` is allowed for times -earlier than 10 AM). - -Doing this kind of comparison (especially if you are scanning millions -of rows) is quite slow and expensive. It is more efficient to convert -all times stored in the database to integers that represent the number -of seconds since midnight. - -***Note:** The GTFS specification states that arrival and departure times -are "noon minus 12 hours" in order to account for daylight savings time. -This is effectively midnight, except for the days that daylight savings -starts or finishes.* - -In order to achieve this, you can convert the text-based time to an -integer with `H * 3600 + M * 60 + S`. For example, `13:35:21` can -be converted using the following steps. - -``` - (13 * 3600) + (35 * 60) + (21) -= 46800 + 2100 + 21 -= 48921 -``` - -You can then convert back to hours, minutes and seconds in order to -generate timestamps in your application as shown in the following -algorithm. - -``` -H = floor( 48921 / 3600 ) - = floor( 13.59 ) - = 13 - -M = floor( 48921 / 60 ) % 60 - = floor( 815.35 ) % 60 - = 815 % 60 - = 35 - -S = 48921 % 60 - = 21 -``` - -### Finding Trips Between Two Stops - -Now that you know how to look up a trip from or to a given stop, the -previous query can be expanded so both the start and finish stop are -specified. The following example finds trips that depart after 1 PM. The -search only returns trips departing from *Adelaide Railway Station* -(stop ID `6665`) are shown. Additionally, only trips that then visit -*Blackwood Railway Station* (stop IDs `6670` and `101484`) are -included. - -In order to achieve this, the following changes must be made to the -previous examples. - -* **Join against stop_times twice.** Once for the departure stop time - and once for the arrival stop time. -* **Allow for multiple stop IDs at one end.** The destination in this - example has two platforms, so you need to check both of them. -* **Ensure the departure time is earlier than the arrival time. - **Otherwise trips heading in the opposite direction may also be - returned. - -The following query demonstrates how this is achieved. - -```sql -SELECT t.*, st1.*, st2.* - FROM trips t, stop_times st1, stop_times st2 - WHERE st1.trip_id = t.trip_id - AND st2.trip_id = t.trip_id - AND st1.stop_id = '6665' - AND st2.stop_id IN ('6670', '101484') - AND t.service_id IN ('12', '874', '4304', '7003') - AND st1.departure_time >= '13:00:00' - AND st1.pickup_type = 0 - AND st2.drop_off_type = 0 - AND st1.departure_time < st2.arrival_time - ORDER BY st1.departure_time; -``` - -In this example, the table alias `st1` is used for the departure stop -time. Once again, the stop ID must match, as well as the departure time -and pick-up type. - -For the arrival stop time the alias `st2` is used. This table also -joins the `trips` table using `trip_id`. Since the destination has -multiple stop IDs, the SQL `IN` construct is used. The arrival time is -not important in this example, so only the departure is checked. - -The final thing to check is that the departure occurs before the -arrival. If you do not perform this step, then trips traveling in the -opposite direction may also be returned. - -***Note:** Technically, there may be multiple results returned for the -same trip. For some transit agencies, a single trip may visit the a stop -more than once. If this is the case, you should also check the trip -duration (arrival time minus departure time) and use the shortest trip -when the same trip is returned multiple times.* - -### Accounting for Midnight - -As discussed previously, GTFS has the ability for the trips to depart or -arrive after midnight for a given service day without having to specify -it as part of the next service day. Consequently, while the queries -above are correct, they do not necessarily paint the full picture. - -In reality, when performing a trip search, you need to take into account -trips that have wrapped times (for instance, where 12:30 AM is specified -as `24:30:00`). If you want to find trips that depart after 12:30 AM -on a given day, you need to check for trips departing after -`00:30:00` on that day, as well as for trips departing at -`24:30:00` on the previous day. - -This means that for each trip search you are left with two sets of -trips, which you must then merge and present as appropriate. - -***Note:** In reality, agencies generally do not have trips that overlap -from multiple service days, so technically you often only need one query -(for example, a train service might end on 12:30 AM then restart on the -next service day at 4:30 AM). If your app / web site only uses a single -feed where you can tune your queries manually based on how the agency -operates, then you can get away with only querying a single service day. -On the other hand, if you are building a scalable system that works with -data from many agencies, then you need to check both days.* - -To demonstrate how this works in practice, the following example -searches for all trips that depart after 12:30:00 AM on 14 March 2014. -The examples earlier in this chapter showed how to find the service IDs -for a given date. To account for midnight, service IDs for both March 14 -(the "main" service date) and March 13 (the overlapping date) need to be -determined. - -Assume that March 13 has a service ID of `C1` and March 14 has a -service ID of `C2`. First you need to find the departures for March -14, as shown in the following query. - -```sql -SELECT t.*, st.* FROM stop_times st, trips t - WHERE st.stop_id = 'S1' - AND st.trip_id = t.trip_id - AND t.service_id IN ('C1') - AND st.departure_time >= '00:30:00' - AND st.pickup_type = 0 - ORDER BY st.departure_time; -``` - -The resultant list needs to be combined with the trips that depart after -midnight from the March 13 service. To check this, it is just a matter -of swapping in the right service IDs, then adding 24 hours to the search -time. - -```sql -SELECT t.*, st.* FROM stop_times st, trips t - WHERE st.stop_id = 'S1' - AND st.trip_id = t.trip_id - AND t.service_id IN ('C2') - AND st.departure_time >= '24:30:00' - AND st.pickup_type = 0 - ORDER BY st.departure_time; -``` - -In order to get your final list of trips you must combine the results -from both of these queries. If you are generating complete timestamps in -order to present the options to your users, just remember to account for -the results from the second query being 24 hours later. - |