aboutsummaryrefslogtreecommitdiff
path: root/.idea/gtfs-realtime-book/ch-09-database-storage.md
diff options
context:
space:
mode:
authorisabelle-dr2022-03-22 19:33:40 -0400
committerisabelle-dr2022-03-22 19:33:40 -0400
commit24e89c97cbbdd89348b1f02497a129ac8ac0a14f (patch)
treeac8432b0731525e889b6dfefa1401ce277af6893 /.idea/gtfs-realtime-book/ch-09-database-storage.md
parentab1b75da67be3e101e40e0ae3052d73c714b8ea3 (diff)
re arrange repo
Diffstat (limited to '.idea/gtfs-realtime-book/ch-09-database-storage.md')
-rw-r--r--.idea/gtfs-realtime-book/ch-09-database-storage.md492
1 files changed, 0 insertions, 492 deletions
diff --git a/.idea/gtfs-realtime-book/ch-09-database-storage.md b/.idea/gtfs-realtime-book/ch-09-database-storage.md
deleted file mode 100644
index e9b42f2..0000000
--- a/.idea/gtfs-realtime-book/ch-09-database-storage.md
+++ /dev/null
@@ -1,492 +0,0 @@
-## 9. Storing Feed Data in a Database
-
-This chapter will demonstrate how to save data from a GTFS-realtime feed
-into an SQLite database. In order to look up trips, stops, routes and
-other data from the GTFS-realtime feed, this SQLite database will also
-contain the data from the corresponding GTFS feed.
-
-To do this, the `GtfsToSql` and `GtfsRealTimeToSql` tools which have
-been written for the purpose of this book and the preceding book, *The
-Definitive Guide to GTFS* ([http://gtfsbook.com](http://gtfsbook.com/)),
-will be used.
-
-GTFS and GTFS-realtime feeds from the MBTA in Boston
-(<https://openmobilitydata.org/p/mbta>) will also be used.
-
-### Storing GTFS Data in an SQLite Database
-
-*The Definitive Guide to GTFS* demonstrated how to populate an SQLite
-database using `GtfsToSql`. Here is an abbreviated version of the
-steps required to do so.
-
-First, download `GtfsToSql` from
-<https://github.com/OpenMobilityData/GtfsToSql>. This is a Java command-line
-application to import a GTFS feed into an SQLite database.
-
-The pre-compiled `GtfsToSql` Java archive can be downloaded from its
-GitHub repository at
-<https://github.com/OpenMobilityData/GtfsToSql/tree/master/dist>.
-
-Next, download the MBTA GTFS feed, available from
-<http://www.mbta.com/uploadedfiles/MBTA_GTFS.zip>.
-
-```
-$ curl http://www.mbta.com/uploadedfiles/MBTA_GTFS.zip -o gtfs.zip
-
-$ unzip gtfs.zip -d mbta/
-```
-
-To create an SQLite database from this feed, the following command can
-be used:
-
-```
-$ java -jar GtfsToSql.jar -s jdbc:sqlite:./db.sqlite -g ./mbta -o
-```
-
-***Note:** The -o flag enables the recording of additional useful data in
-the database. For instance, each entry in the trips table will contain
-the departure and arrival time (which would otherwise only be available
-by looking up the `stop_times` table).*
-
-This may take a minute or two to complete (you will see progress as it
-imports the feed and then creates indexes), and at the end you will have
-a GTFS database in a file called `db.sqlite`. You can then query this
-database with the command-line `sqlite3` tool, as shown in the
-following example:
-
-```
-$ sqlite3 db.sqlite
-
-sqlite> SELECT agency_id, agency_name, agency_url, agency_lang FROM agency;
-
-2|Massport|http://www.massport.com|EN
-
-1|MBTA|[http://www.mbta.com](http://www.mbta.com/)|EN
-```
-
-***Note:** For more information about storing and querying GTFS data,
-please refer to *The Definitive Guide to GTFS*, available from
-[http://gtfsbook.com](http://gtfsbook.com/).*
-
-### Storing GTFS-realtime Data in an SQLite Database
-
-Once the GTFS data has been imported into the SQLite database, you can
-then start importing GTFS-realtime data. For this you can use the
-`GtfsRealTimeToSql` tool specifically written to import data into an
-SQLite database.
-
-***Note:** Technically you do not need the GTFS data present in the
-database as well, but having it makes it far simpler to resolve trip,
-route and stop data.*
-
-The GTFS data will change infrequently (perhaps every few weeks or
-months), while the realtime data can change several times per minute.
-`GtfsRealTimeToSql` will frequently update a feed, according to the
-refresh time specified. Each time it updates, the data saved on the
-previous iteration is deleted, as that data is no longer the most
-up-to-date data.
-
-To get started, download `GtfsRealTimeToSql` from its GitHub
-repository at <https://github.com/OpenMobilityData/GtfsRealTimeToSql>. Just
-like `GtfsToSql`, this is a Java command-line application. The
-pre-compiled `GtfsRealTimeToSql` Java archive can be downloaded from
-<https://github.com/OpenMobilityData/GtfsRealTimeToSql/tree/master/dist>.
-
-Since the `db.sqlite` database contains the MBTA GTFS feed, you can
-now run `GtfsRealTimeToSql` with one of MBTA's GTFS-realtime feeds.
-For instance, their vehicle positions feed is located at
-<http://developer.mbta.com/lib/gtrtfs/Vehicles.pb>.
-
-```
-java -jar GtfsRealTimeToSql.jar \
- -u "http://developer.mbta.com/lib/gtrtfs/Vehicles.pb" \
- -s jdbc:sqlite:./db.sqlite \
- -refresh 15
-```
-
-When you run this command, the vehicle positions feed will be retrieved
-every 15 seconds (specified by the `-refresh` parameter), and the data
-will be saved into the `db.sqlite` SQLite database.
-
-MBTA also have a trip updates feed and a service alerts feed. In order
-to load each of these feeds you will need to run `GtfsRealTimeToSql`
-three separate times. To allow it to run in the background, add the
-`-d` parameter (to make it run as a server daemon), and background it
-using **&**.
-
-To load the vehicle positions in the background, stop the previous
-command, then run the following command instead.
-
-```
-java -jar GtfsRealTimeToSql.jar \
- -u "http://developer.mbta.com/lib/gtrtfs/Vehicles.pb" \
- -s jdbc:sqlite:./db.sqlite \
- -refresh 15 \
- -d &
-```
-
-Now you can also load the trip updates into the same `db.sqlite` file.
-MBTA's trip updates feed is located at
-<http://developer.mbta.com/lib/gtrtfs/Passages.pb>. The following
-command reloads the trip updates every 30 seconds:
-
-```
-java -jar GtfsRealTimeToSql.jar \
- -u "http://developer.mbta.com/lib/gtrtfs/Passages.pb" \
- -s jdbc:sqlite:./db.sqlite \
- -refresh 30 \
- -d &
-```
-
-***Note:** You may prefer more frequent updates (such as 15 seconds), or
-even less frequent (such as 60 seconds). The more frequently you update,
-the greater your server utilization will be.*
-
-Finally, to load the service alerts feed, use the same command, but now
-with the feed located at
-<http://developer.mbta.com/lib/GTRTFS/Alerts/Alerts.pb>. Generally,
-service alerts are updated infrequently by providers, so you can use a
-refresh time such as five or ten minutes (300 or 600 seconds).
-
-```
-java -jar GtfsRealTimeToSql.jar \
- -u "http://developer.mbta.com/lib/gtrtfs/Alerts/Alerts.pb" \
- -s jdbc:sqlite:./db.sqlite \
- -refresh 300 \
- -d &
-```
-
-These three feeds will continue to be reloaded until you terminate their
-respective processes.
-
-### Querying Vehicle Positions
-
-When using `GtfsRealTimeToSql`, vehicle positions are stored in a
-table called `gtfs_rt_vehicles`. If you want to retrieve positions
-for, say, the route with an ID of 742, you can run the following query:
-
-```
-$ sqlite3 db.sqlite
-
-sqlite> SELECT route_id, trip_id, trip_date, trip_time, trip_sr, latitude, longitude
- FROM gtfs_rt_vehicles
- WHERE route_id = 742;
-```
-
-This query returns the trip descriptor and GPS coordinates for all
-vehicles on the given route. The following table shows sample results
-from this query.
-
-| `route_id` | `trip_id` | `trip_date` | `trip_time` | `trip_sr` | `latitude` | `longitude` |
-| :--------- | :--------- | :----------- | :---------- | :-------- | :--------- | :---------- |
-| 742 | 25900860 | 20150315 | | 0 | 42.347309 | -71.040359 |
-| 742 | | | | | 42.331505 | -71.065590 |
-
-***Note:** The trip_sr column corresponds to the trip's schedule
-relationship value.*
-
-This particular snapshot of vehicle position data shows two different
-trips for route 742, which corresponds to the SL2 Silver Line.
-
-```
-sqlite> SELECT route_short_name, route_long_name FROM routes WHERE
-route_id = 742;
-
-SL2|Silver Line SL2
-```
-
-The first five columns retrieved are the trip identifier fields, which
-are used to link a vehicle position with a trip from the GTFS feed. The
-first row is simple: the `trip_id` value can be used to look up the
-row from the `trips` table.
-
-```
-sqlite> SELECT service_id, trip_headsign, direction_id, block_id
-
-FROM trips
-
-WHERE trip_id = '25900860';
-```
-
-The results from this query are as follows.
-
-| `service_id` | `trip_headsign` | `direction_id` | `block_id` |
-| :--------------------------- | :-------------- | :------------- | :--------- |
-| BUSS12015-hbs15017-Sunday-02 | South Station | 1 | S742-61 |
-
-***Note:** To see all fields available in this or any other table
-(including `gtfs_rt_vehicles`), use the .schema command in SQLite. For
-instance, enter the command `.schema gtfs_rt_vehicles`.*
-
-One helpful thing MBTA does is to include the trip starting date when
-they include the trip ID. This helps to disambiguate trips that may
-start or finish around midnight or later.
-
-The second vehicle position is another matter. This record does not
-include any trip descriptor information other than the route ID. This
-means you cannot reliably link this vehicle position with a specific
-trip from the GTFS feed.
-
-However, knowing the route ID and the vehicle's coordinates may be
-enough to present useful information to the user: "A bus on the Silver
-Line SL2 route is at this location." You cannot show them if the vehicle
-is running late, early or on-time, but you can show the user where the
-vehicle is.
-
-### Querying Trip Updates
-
-The `GtfsRealTimeToSql` tool stores trip update data in two tables:
-one for the main trip update data (such as the trip descriptor), and
-another to store each individual stop time event that belongs within
-each update.
-
-For example, to retrieve all trip updates for the route with ID 742 once
-again, you could use the following query:
-
-```sql
-SELECT route_id, trip_id, trip_date, trip_time, trip_sr, vehicle_id, vehicle_label
- FROM gtfs_rt_trip_updates
- WHERE route_id = '742';
-```
-
-This query will return data similar to the following table:
-
-| `route_id` | `trip_id` | `trip_date` | `trip_time` | `trip_sr` | `vehicle_id` | `vehicle_label` |
-| :--------- | :-------- | :---------- | :---------- | :-------- | :----------- | :-------------- |
-| 742 | 25900860 | 20150315 | | 0 | y1106 | 1106 |
-| 742 | 25900856 | 20150315 | | 0 | | |
-| 742 | 25900858 | 20150315 | | 0 | | |
-
-Each of these returned records has corresponding records in the
-`gtfs_rt_trip_updates_stoptimes` table that includes the
-arrival/departure estimates for various stops on the trip.
-
-When using `GtfsRealTimeToSql`, an extra column called `update_id`
-is included on both tables so they can be linked together. For example,
-to retrieve all updates for the route ID 742, you can join the tables
-together as follows:
-
-```sql
-SELECT trip_id, arrival_time, arrival_delay, departure_time, departure_delay, stop_id, stop_sequence
- FROM gtfs_rt_trip_updates_stoptimes
- JOIN gtfs_rt_trip_updates USING (update_id)
- WHERE route_id = '742';
-```
-
-MBTA's trip updates feed only includes a stop time prediction for the
-next stop. This means that each trip in the results only has one
-corresponding record. You must then apply the delay to subsequent stop
-times for the given trip.
-
-| `trip_id` | `arrival_time` | `arrival_delay` | `departure_time` | `departure_delay` | `stop_id` | `stop_sequence` |
-| :-------- | :------------- | :-------------- | :--------------- | :---------------- | :-------- | :-------------- |
-| 25900860 | | 30 | | | 74617 | 10 |
-| 25900856 | | | | 0 | 74611 | 1 |
-| 25900858 | | | | 0 | 31255 | 1 |
-
-There are several things to note in these results. Firstly, MBTA do not
-provide a timestamp for `arrival_time` and `departure_time`; they
-only provide the delay offsets that can be compared to the scheduled
-time in the GTFS feed.
-
-Secondly, the second and third trips listed have not yet commenced. You
-can deduce this just by looking at this table, since the next stop has
-stop sequence of `1` (and therefore there are no stops before it).
-
-The first trip is delayed by 30 seconds. In other words, it will arrive
-30 seconds later than it was scheduled. The data received from the
-GTFS-realtime feed does not actually indicate the arrival timestamp, but
-you can look up the corresponding stop time from the GTFS feed to
-determine this.
-
-The following query demonstrates how to look up the arrival time:
-
-```sql
-SELECT s.stop_name, st.arrival_time
- FROM stop_times st, trips t, stops s
- WHERE st.trip_index = t.trip_index
- AND st.stop_index = s.stop_index
- AND s.stop_id = '74617'
- AND t.trip_id = '25900860'
- AND st.stop_sequence = 10;
-```
-
-***Note:** The `GtfsToSql` tool used to import the GTFS feed adds fields
-such as trip_index and stop_index in order to speed up data searching.
-There is more discussion on the rationale of this in *The Definitive
-Guide to GTFS*, available from
-[http://gtfsbook.com](http://gtfsbook.com/).*
-
-This query joins the `stop_times` table to both the `trips` and
-`stops` table in order to look up the corresponding arrival time. The
-final three rows in the query contain the values returned above (the
-`stop_id`, `trip_id` and the `stop_sequence`), to find the
-following record:
-
-```
-South Station Silver Line - Inbound|21:17:00
-```
-
-This means the scheduled arrival time for South Station Silver Line is
-9:17:00 PM. Since the estimate indicates a delay of 30 seconds, the new
-arrival time is 9:17:30 PM.
-
-***Note:** Conversely, if the delay was -30 instead of 30, the vehicle
-would be early and arrive at 9:16:30 PM.*
-
-One thing not touched upon in this example is the stop time's schedule
-relationship. The `gtfs_rt_trip_updates_stoptimes` also includes a
-column called `rship`, which is used to indicate the schedule
-relationship for the given stop. If this value was skipped (a value of
-`1`), then it means the vehicle will stop here.
-
-### Determining Predictions Using Blocks
-
-In GTFS, the `block_id` value in **trips.txt** is used to indicate a
-series of one or more trips undertaken by a single vehicle. In other
-words, once it gets to the end of one trip, it starts a new trip from
-that location. If a given trip is very short, a vehicle may perform up
-to fifty or one hundred trips in a single day.
-
-This can be useful for determining estimates for future trips that may
-not be included in the GTFS feed. For instance, if a trip is running 30
-minutes late, then it is highly likely that subsequent trips on that
-block will also be running late.
-
-Referring back to the trip data returned in the above example, the
-following data can be retrieved from the GTFS feed:
-
-```sql
-SELECT trip_id, block_id, service_id, departure_time, arrival_time
- FROM trips
- WHERE trip_id IN ('25900860', '25900856', '25900858')
- ORDER BY departure_time;
-```
-
-This returns the following data.
-
-| `trip_id` | `block_id` | `service_id` | `departure_time` | `arrival_time` |
-| :-------- | :--------- | :--------------------------- | :--------------- | :------------- |
-| 25900860 | S742-61 | BUSS12015-hbs15017-Sunday-02 | 21:04:00 | 21:17:00 |
-| 25900856 | S742-61 | BUSS12015-hbs15017-Sunday-02 | 21:18:00 | 21:28:00 |
-| 25900858 | S742-61 | BUSS12015-hbs15017-Sunday-02 | 21:35:00 | 21:48:00 |
-
-Each of these trips have the same values for `block_id` and
-`service_id`, meaning the same vehicle will complete all three trips.
-The data indicates that the first trip is running 30 seconds late, so
-its arrival time will be 21:17:30.
-
-Because the second trip is scheduled to depart at 21:18:00, there is a
-buffer time to catch up (in other words, the first trip is only 30
-seconds late, so hopefully it will not impact the second trip).
-
-If, for instance, the second trip ran 10 minutes late (and therefore
-arrived at 21:38:00 instead of 21:28:00), you could reasonably assume
-the third trip would begin at about 21:38:00 instead of 21:35:00 (about
-three minutes late).
-
-### Querying Service Alerts
-
-When using `GtfsRealTimeToSql` to store service alerts, data is stored
-in three tables. The main service alert information is stored in
-`gtfs_rt_alerts`. Since there can be any number of time ranges or
-affected entities for any given alert, there is a table to hold time
-ranges (`gtfs_rt_alerts_timeranges`) and one to hold affected entities
-(`gtfs_rt_alerts_entities`).
-
-In order to link this data together, each alert has an `alert_id`
-value (created by `GtfsRealTimeToSql`) which is also present for any
-corresponding time range and affected entities records.
-
-To retrieve service alerts from the database, you can use the following
-query:
-
-```sql
-SELECT alert_id, header, description, cause, effect FROM gtfs_rt_alerts;
-```
-
-At the time of writing, this yields the following results. The
-description has been shortened as they are quite long and descriptive in
-the original feed.
-
-| `alert_id` | `header` | `description` | `cause` | `effect` |
-| :--------- | :--------------------------- | :--------------------------------------------------------- | :------- | :------- |
-| 6 | Route 11 detour | Route 11 outbound detoured due to ... | 1 | 4 |
-| 11 | Ruggles elevator unavailable | ... Commuter Rail platform to the lobby is unavailable ... | 9 | 7 |
-| 33 | Extra Franklin Line service | | 1 | 5 |
-
-**Note: **A cause value of 1 corresponds to UNKNOWN_CAUSE, while 9
-corresponds to MAINTENANCE. An effect value of 4 corresponds to DETOUR,
-7 corresponds to OTHER_EFFECT, while 5 corresponds to
-ADDITIONAL_SERVICE.
-
-To determine the timing of these alerts, look up the
-`gtfs_rt_alerts_timeranges` for the given alerts.
-
-```
-$ export TZ=America/New_York
-
-$ sqlite3 db.sqlite
-
-sqlite> SELECT alert_id, datetime(start, 'unixepoch', 'localtime'), datetime(finish, 'unixepoch', 'localtime')
- FROM gtfs_rt_alerts_timeranges
- WHERE alert_id IN (6, 11, 33);
-```
-
-***Note:** In this example, the system timezone has been temporarily
-changed to America/New_York (the timezone specified in MBTA's
-agency.txt file) so the timestamps are formatted correctly. You may
-prefer instead to format the start and finish timestamps using your
-programming language of choice.*
-
-| `alert_id` | `start` (formatted) | `finish` (formatted) |
-| :--------- | :------------------ | :------------------- |
-| 6 | 2015-02-17 15:56:52 | |
-| 11 | 2015-03-18 06:00:00 | 2015-03-18 16:00:00 |
-| 11 | 2015-03-19 06:00:00 | 2015-03-19 16:00:00 |
-| 33 | 2015-03-16 10:58:38 | 2015-03-18 02:30:00 |
-
-These dates indicate the following:
-
-* The alert with an ID of 6 began on February 17 and has no specified end date.
-* The alert with an ID of 11 will occur over two days between 6 AM and 4 PM.
- *The alert with an ID of 33 will last for almost two days.
-
-Finally, to determine which entities (routes, trips, stops) are affected
-by these alerts, query the `gtfs_rt_alerts_entities` table.
-
-```sql
-SELECT alert_id, agency_id, route_id, route_type, stop_id, trip_id, trip_start_date, trip_start_time, trip_rship
- FROM gtfs_rt_alerts_entities WHERE alert_id IN (6, 11, 33);
-```
-
-This results in the following data, describing only an entity for the
-final service alert.
-
-| `alert_id` | `agency_id` | `route_id` | `route_type` | `stop_id` | `trip_id` | `trip_start_date` | `trip_start_time` | `trip_rship` |
-| :--------- | :---------- | :---------- | :----------- | :-------- | :-------- | :---------------- | :---------------- | :----------- |
-| 6 | 1 | CR-Franklin | 2 | | | | | |
-
-Using the `route_id` value, you can find the route from the GTFS feed:
-
-```sql
-SELECT agency_id, route_type, route_long_name, route_desc
- FROM routes WHERE route_id = 'CR-Franklin';
-```
-
-This query will result in the following data:
-
-| `agency_id` | `route_type` | `route_long_name` | `route_desc` |
-| :---------- | :----------- | :---------------- | :------------ |
-| 1 | 2 | Franklin Line | Commuter Rail |
-
-In effect, this means that if you have a web site or app displaying the
-schedule for the Franklin line, this alert should be displayed so the
-people who use the line are aware of the change.
-
-***Note:** Even though the `agency_id` and `route_type` values are included,
-you do not really need these since the `route_id` can only refer to one
-row in the GTFS feed. If instead you wanted to refer to ALL rail lines,
-the alert would have the `route_id` value blank but keep the `route_type`
-value.*
-