aboutsummaryrefslogtreecommitdiff
path: root/.idea/gtfs-book/ch-14-switching-to-integer-ids.md
diff options
context:
space:
mode:
authorisabelle-dr2022-03-22 19:33:40 -0400
committerisabelle-dr2022-03-22 19:33:40 -0400
commit24e89c97cbbdd89348b1f02497a129ac8ac0a14f (patch)
treeac8432b0731525e889b6dfefa1401ce277af6893 /.idea/gtfs-book/ch-14-switching-to-integer-ids.md
parentab1b75da67be3e101e40e0ae3052d73c714b8ea3 (diff)
re arrange repo
Diffstat (limited to '.idea/gtfs-book/ch-14-switching-to-integer-ids.md')
-rw-r--r--.idea/gtfs-book/ch-14-switching-to-integer-ids.md111
1 files changed, 0 insertions, 111 deletions
diff --git a/.idea/gtfs-book/ch-14-switching-to-integer-ids.md b/.idea/gtfs-book/ch-14-switching-to-integer-ids.md
deleted file mode 100644
index 4d3914a..0000000
--- a/.idea/gtfs-book/ch-14-switching-to-integer-ids.md
+++ /dev/null
@@ -1,111 +0,0 @@
-## 14. Switching to Integer IDs
-
-There are a number of instances in a GTFS feed where IDs are used, such
-as to identify routes, trips, stops and shapes. There are no specific
-guidelines in GTFS as to the type of data or length an ID can be. As
-such, IDs in some GTFS feeds maybe anywhere up to 30 or 40 characters
-long.
-
-Using long strings as IDs is extremely inefficient as they make the size
-of a database much larger than it needs to be, as well as making
-querying the data much slower.
-
-To demonstrate, consider a GTFS feed where trip IDs are 30 characters
-long. If there are 10,000 trips, each with an average of 30 stops in
-`stop_times.txt`, then the IDs alone take up 9.3 MB of storage.
-Realistically speaking, you need to index the `trip_id` field in order
-to look up a trip's stop times quickly, which uses even more space.
-
-The following SQL statements show how you might represent GTFS without
-optimizing the identifiers. For brevity, not all fields from the GTFS
-feed are included here.
-
-```sql
-CREATE TABLE trips (
- trip_id TEXT,
- route_id TEXT,
- service_id TEXT
-);
-
-CREATE INDEX trips_trip_id ON trips (trip_id);
-
-CREATE INDEX trips_route_id ON trips (route_id);
-
-CREATE INDEX trips_service_id ON trips (service_id);
-
-CREATE TABLE stop_times (
- trip_id TEXT,
- stop_id TEXT,
- stop_sequence INTEGER
-);
-
-CREATE INDEX stop_times_trip_id ON stop_times (trip_id);
-
-CREATE INDEX stop_times_stop_id on stop_times (stop_id);
-```
-
-If you were to add an integer column to `trips` called, say,
-`trip_index`, then you can reference that value from `stop_times`
-instead of `trip_id`. The following SQL statements show this.
-
-```sql
-CREATE TABLE trips (
- trip_id TEXT,
- trip_index INTEGER,
- route_id TEXT,
- service_id TEXT
-);
-
-CREATE INDEX trips_trip_id ON trips (trip_id);
-CREATE INDEX trips_trip_index ON trips (trip_index);
-CREATE INDEX trips_route_id ON trips (route_id);
-CREATE INDEX trips_service_id ON trips (service_id);
-
-CREATE TABLE stop_times (
- trip_index INTEGER,
- stop_id TEXT,
- stop_sequence INTEGER
-);
-
-CREATE INDEX stop_times_trip_index ON stop_times (trip_index);
-
-CREATE INDEX stop_times_stop_id on stop_times (stop_id);
-```
-
-This results in a significant space saving (when you consider how large
-`stop_times` can be), as well as being far quicker to look up stop
-times based on a trip ID. Note that the original `trip_id` value is
-retained so it can be referenced if required.
-
-Without adding `trip_index`, you would use the following query to find
-stop times given a trip ID.
-
-```sql
-SELECT * FROM stop_times
- WHERE trip_id = 'SOME_LONG_TRIP_ID'
- ORDER BY stop_sequence;
-```
-
-With the addition of `trip_index`, you need to first find the record
-in `trips`. This can be achieved using the following query. This is a
-small sacrifice compared to performing string comparison on all stop
-times.
-
-```sql
-SELECT * FROM stop_times
- WHERE trip_index = (
- SELECT trip_index FROM trips WHERE trip_id = 'SOME_LONG_TRIP_ID'
- )
- ORDER BY stop_sequence;
-```
-
-You can make the same change for the other IDs in the feed, such as
-`route_id` and `stop_id`. For these columns you still keep (and
-index) the original values in `routes` and `stops` respectively,
-since you may still need to look up records based on these values.
-
-***Note:** Even though this book recommends optimizing feeds in this
-manner, the remainder of examples in this book only use their original
-IDs, in order to simplify the examples and to ensure compatibility with
-the `GtfsToSql` tool introduced previously.*
-