aboutsummaryrefslogtreecommitdiff
path: root/.idea/gtfs-book/ch-16-deleting-unused-data.md
diff options
context:
space:
mode:
authorisabelle-dr2022-03-22 19:33:40 -0400
committerisabelle-dr2022-03-22 19:33:40 -0400
commit24e89c97cbbdd89348b1f02497a129ac8ac0a14f (patch)
treeac8432b0731525e889b6dfefa1401ce277af6893 /.idea/gtfs-book/ch-16-deleting-unused-data.md
parentab1b75da67be3e101e40e0ae3052d73c714b8ea3 (diff)
re arrange repo
Diffstat (limited to '.idea/gtfs-book/ch-16-deleting-unused-data.md')
-rw-r--r--.idea/gtfs-book/ch-16-deleting-unused-data.md106
1 files changed, 0 insertions, 106 deletions
diff --git a/.idea/gtfs-book/ch-16-deleting-unused-data.md b/.idea/gtfs-book/ch-16-deleting-unused-data.md
deleted file mode 100644
index f2a2fc0..0000000
--- a/.idea/gtfs-book/ch-16-deleting-unused-data.md
+++ /dev/null
@@ -1,106 +0,0 @@
-## 16. Deleting Unused Data**
-
-Once you have imported a GTFS feed into a database, it is possible for
-there to be a lot of redundant data. This can ultimately slow down any
-querying of that data as well as bloating the size of the database. If
-you are making an app where the GTFS database is queried on the device
-then disk space and computational time are at a premium, so you must do
-what you can to reduce resource usage.
-
-The first thing to check for is expired services. You can do this by
-searching `calendar.txt` for entries that expire before today's date.
-Be aware though, you also need to ensure there are no
-`calendar_dates.txt` entries overriding these services (a service
-could have an `end_date` of, say, `20140110`, but also have a
-`calendar_dates.txt` entry for `20140131`).
-
-Firstly, find service IDs in `calendar_dates.txt` that are still
-active using the following query.
-
-```sql
-SELECT service_id FROM calendar_dates WHERE date >= '20140110';
-```
-
-***Note:** In order to improve performance of searching for dates, you
-should import the date field in `calendar_dates.txt` as an integer, as
-well as `start_date` and `end_date` in `calendar.txt`.*
-
-Any services matched in this query should not be removed. You can then
-find service IDs in `calendar.txt` with the following SQL query.
-
-```sql
-SELECT * FROM calendar WHERE end_date < '20140110'
- AND service_id NOT IN (
- SELECT service_id FROM calendar_dates WHERE date >= '20140110'
- );
-```
-
-Before deleting these services, corresponding trips and stop times must
-be removed since you need to know the service ID in order to delete a
-trip. Likewise, stop times must be deleted before trips since you need
-to know the trip IDs to be removed.
-
-```sql
-DELETE FROM stop_times WHERE trip_id IN (
- SELECT trip_id FROM trips WHERE service_id IN (
- SELECT service_id FROM calendar WHERE end_date < '20140110'
- AND service_id NOT IN (
- SELECT service_id FROM calendar_dates WHERE date >= '20140110'
- )
- )
- );
-```
-
-Now there may be a series of trips with no stop times. Rather than
-repeating the above sub-queries, a more thorough way of removing trips
-is to remove trips with no stop times.
-
-```sql
-DELETE FROM trips WHERE trip_id NOT IN (
- SELECT DISTINCT trip_id FROM stop_times
-);
-```
-
-With all `service_id` references removed, you can remove the expired
-rows from `calendar.txt` using the following SQL query.
-
-```sql
-DELETE FROM calendar WHERE end_date < '20140110'
- AND service_id NOT IN (
- SELECT DISTINCT service_id FROM calendar_dates WHERE date >= '20140110'
- );
-```
-
-The expired rows in `calendar_dates.txt` can also be removed, which
-can be achieved using the following query.
-
-```sql
-DELETE FROM calendar_dates WHERE date < '20140110';
-```
-
-There may now be some stops that are not used by any trips. These can be
-removed using the following query.
-
-```sql
-DELETE FROM stops WHERE stop_id NOT IN (
- SELECT DISTINCT stop_id FROM stop_times
-);
-```
-
-Additionally, you can remove unused shapes and routes using the
-following queries.
-
-```sql
-DELETE FROM shapes WHERE shape_id NOT IN (
- SELECT DISTINCT shape_id FROM trips
-);
-
-DELETE FROM routes WHERE route_id NOT IN (
- SELECT DISTINCT route_id FROM trips
-);
-```
-
-There are other potential rows that can be removed (such as records in
-`transfers.txt` that reference non-existent stops), but hopefully you
-get the idea from the previous queries.
-