diff options
author | isabelle-dr | 2022-03-22 19:33:40 -0400 |
---|---|---|
committer | isabelle-dr | 2022-03-22 19:33:40 -0400 |
commit | 24e89c97cbbdd89348b1f02497a129ac8ac0a14f (patch) | |
tree | ac8432b0731525e889b6dfefa1401ce277af6893 /.idea/gtfs-book/ch-14-switching-to-integer-ids.md | |
parent | ab1b75da67be3e101e40e0ae3052d73c714b8ea3 (diff) |
re arrange repo
Diffstat (limited to '.idea/gtfs-book/ch-14-switching-to-integer-ids.md')
-rw-r--r-- | .idea/gtfs-book/ch-14-switching-to-integer-ids.md | 111 |
1 files changed, 0 insertions, 111 deletions
diff --git a/.idea/gtfs-book/ch-14-switching-to-integer-ids.md b/.idea/gtfs-book/ch-14-switching-to-integer-ids.md deleted file mode 100644 index 4d3914a..0000000 --- a/.idea/gtfs-book/ch-14-switching-to-integer-ids.md +++ /dev/null @@ -1,111 +0,0 @@ -## 14. Switching to Integer IDs - -There are a number of instances in a GTFS feed where IDs are used, such -as to identify routes, trips, stops and shapes. There are no specific -guidelines in GTFS as to the type of data or length an ID can be. As -such, IDs in some GTFS feeds maybe anywhere up to 30 or 40 characters -long. - -Using long strings as IDs is extremely inefficient as they make the size -of a database much larger than it needs to be, as well as making -querying the data much slower. - -To demonstrate, consider a GTFS feed where trip IDs are 30 characters -long. If there are 10,000 trips, each with an average of 30 stops in -`stop_times.txt`, then the IDs alone take up 9.3 MB of storage. -Realistically speaking, you need to index the `trip_id` field in order -to look up a trip's stop times quickly, which uses even more space. - -The following SQL statements show how you might represent GTFS without -optimizing the identifiers. For brevity, not all fields from the GTFS -feed are included here. - -```sql -CREATE TABLE trips ( - trip_id TEXT, - route_id TEXT, - service_id TEXT -); - -CREATE INDEX trips_trip_id ON trips (trip_id); - -CREATE INDEX trips_route_id ON trips (route_id); - -CREATE INDEX trips_service_id ON trips (service_id); - -CREATE TABLE stop_times ( - trip_id TEXT, - stop_id TEXT, - stop_sequence INTEGER -); - -CREATE INDEX stop_times_trip_id ON stop_times (trip_id); - -CREATE INDEX stop_times_stop_id on stop_times (stop_id); -``` - -If you were to add an integer column to `trips` called, say, -`trip_index`, then you can reference that value from `stop_times` -instead of `trip_id`. The following SQL statements show this. - -```sql -CREATE TABLE trips ( - trip_id TEXT, - trip_index INTEGER, - route_id TEXT, - service_id TEXT -); - -CREATE INDEX trips_trip_id ON trips (trip_id); -CREATE INDEX trips_trip_index ON trips (trip_index); -CREATE INDEX trips_route_id ON trips (route_id); -CREATE INDEX trips_service_id ON trips (service_id); - -CREATE TABLE stop_times ( - trip_index INTEGER, - stop_id TEXT, - stop_sequence INTEGER -); - -CREATE INDEX stop_times_trip_index ON stop_times (trip_index); - -CREATE INDEX stop_times_stop_id on stop_times (stop_id); -``` - -This results in a significant space saving (when you consider how large -`stop_times` can be), as well as being far quicker to look up stop -times based on a trip ID. Note that the original `trip_id` value is -retained so it can be referenced if required. - -Without adding `trip_index`, you would use the following query to find -stop times given a trip ID. - -```sql -SELECT * FROM stop_times - WHERE trip_id = 'SOME_LONG_TRIP_ID' - ORDER BY stop_sequence; -``` - -With the addition of `trip_index`, you need to first find the record -in `trips`. This can be achieved using the following query. This is a -small sacrifice compared to performing string comparison on all stop -times. - -```sql -SELECT * FROM stop_times - WHERE trip_index = ( - SELECT trip_index FROM trips WHERE trip_id = 'SOME_LONG_TRIP_ID' - ) - ORDER BY stop_sequence; -``` - -You can make the same change for the other IDs in the feed, such as -`route_id` and `stop_id`. For these columns you still keep (and -index) the original values in `routes` and `stops` respectively, -since you may still need to look up records based on these values. - -***Note:** Even though this book recommends optimizing feeds in this -manner, the remainder of examples in this book only use their original -IDs, in order to simplify the examples and to ensure compatibility with -the `GtfsToSql` tool introduced previously.* - |