aboutsummaryrefslogtreecommitdiff
path: root/.idea/gtfs-book/ch-13-importing-to-sql.md
diff options
context:
space:
mode:
authorisabelle-dr2022-03-22 19:33:40 -0400
committerisabelle-dr2022-03-22 19:33:40 -0400
commit24e89c97cbbdd89348b1f02497a129ac8ac0a14f (patch)
treeac8432b0731525e889b6dfefa1401ce277af6893 /.idea/gtfs-book/ch-13-importing-to-sql.md
parentab1b75da67be3e101e40e0ae3052d73c714b8ea3 (diff)
re arrange repo
Diffstat (limited to '.idea/gtfs-book/ch-13-importing-to-sql.md')
-rw-r--r--.idea/gtfs-book/ch-13-importing-to-sql.md125
1 files changed, 0 insertions, 125 deletions
diff --git a/.idea/gtfs-book/ch-13-importing-to-sql.md b/.idea/gtfs-book/ch-13-importing-to-sql.md
deleted file mode 100644
index e96517c..0000000
--- a/.idea/gtfs-book/ch-13-importing-to-sql.md
+++ /dev/null
@@ -1,125 +0,0 @@
-## 13. Importing a GTFS Feed to SQL
-
-One of the great things about GTFS is that it is already in a format
-conducive to being used in an SQL database. The presence of various IDs
-in each of the different files makes it easy to join the tables in order
-to extract the data you require.
-
-To try this yourself, download `GtfsToSql`
-(<https://github.com/OpenMobilityData/GtfsToSql>). This is a Java
-command-line application that imports a GTFS feed to an SQLite database.
-This application also supports PostgreSQL, but the examples used here
-are for SQLite.
-
-The pre-compiled `GtfsToSql` Java archive can be downloaded from its
-GitHub repository at
-<https://github.com/OpenMobilityData/GtfsToSql/tree/master/dist>.
-
-To use `GtfsToSql`, all you need is an extracted GTFS feed. The
-following instructions demonstrate how you to import the TriMet feed
-that has been referenced throughout this book.
-
-Firstly, download and extract the feed. The following commands use curl
-to download the file, then unzip to extract the file to a sub-directory
-called `trimet`.
-
-```
-$ curl http://developer.trimet.org/schedule/gtfs.zip > gtfs.zip
-
-$ unzip gtfs.zip -d trimet/
-```
-
-To create an SQLite database from this feed, the following command can
-be used.
-
-```
-$ java -jar GtfsToSql.jar -s jdbc:sqlite:./db.sqlite -g ./trimet
-```
-
-This may take a minute or two to complete (you will see progress as it
-imports the feed and then creates indexes), and at the end you will have
-a GTFS database in a file called `db.sqlite`. You can then query this
-database with the command-line `sqlite3` tool, as shown in the
-following example.
-
-```
-$ sqlite3 db.sqlite
-sqlite> SELECT * FROM agency;
-|TriMet|http://trimet.org|America/Los_Angeles|en|503-238-7433
-
-sqlite> SELECT * FROM routes WHERE route_type = 0;
-90|||MAX Red Line||0|http://trimet.org/schedules/r090.htm||
-100|||MAX Blue Line||0|http://trimet.org/schedules/r100.htm||
-190|||MAX Yellow Line||0|http://trimet.org/schedules/r190.htm||
-193||Portland Streetcar|NS Line||0|http://trimet.org/schedules/r193.htm||
-194||Portland Streetcar|CL Line||0|http://trimet.org/schedules/r194.htm||
-200|||MAX Green Line||0|http://trimet.org/schedules/r200.htm||
-250|||Vintage Trolley||0|http://trimet.org/schedules/r250.htm||
-```
-
-The first query above finds all agencies stored in the database, while
-the second finds all routes marked as Light Rail (`route_type` of
-`0`).
-
-**Note: In the following chapters there are more SQL examples. All of
-these examples are geared towards running on an SQLite database that has
-been created in this manner.**
-
-All tables in this database match up with the corresponding GTFS
-filename (so for `agency.txt`, the table name is `agency`, while for
-`stop_times.txt` the table name is `stop_times`). The columns in SQL
-have the same name as the value in the corresponding GTFS file.
-
-***Note:** All data imported using this tool is stored as text in the
-database. This means you may need to be careful when querying integer
-data. For example, ordering stop times by stop_sequence may not produce
-expected results (for instance, 29 as a string comes before 3). Although
-it is a performance hit, you can change this behavior by casting the
-value to integer, such as: ORDER BY stop_sequence + 0. The reason
-`GtfsToSql` works in this way is because it is intended as a lightweight
-tool to be able to quickly query GTFS data. I recommend rolling your own
-importer to treat data exactly as you need it, especially in conjunction
-with some of the optimization techniques recommended later in this book.*
-
-### File Encodings
-
-The GTFS specification does not indicate whether files should be encoded
-using UTF-8, ISO-8859-1 or otherwise. Since a GTFS feed is not
-necessarily in English, you must be willing to handle an extended
-character set.
-
-The GtfsToSql tool introduced above automatically detects the encoding
-of each file using the juniversalchardet Java library
-(<https://code.google.com/p/juniversalchardet/>).
-
-I recommend you take some time looking at the source code of GtfsToSql
-to further understand this so you are aware of handling encodings
-correctly if you write your own parser.
-
-### Optimizing GTFS Feeds
-
-If you are creating a database that is to be distributed onto a mobile
-device such as an iPhone or Android phone, then disk space and
-computational power is at a premium. Even if you are setting up a
-database to be queried on a server only, then making the database
-perform as quickly as possible is still important.
-
-In the following chapters are techniques for optimizing GTFS feeds.
-There are many techniques that can be applied to improve the performance
-of GTFS, such as:
-
-* Using integer identifiers rather than string identifiers (for route
- IDs, trip IDs, stop IDs, etc.) and creating appropriate indexes
-* Removing redundant shape points and encoding shapes
-* Deleting unused data
-* Reusing repeating trip patterns.
-
-Changing the data to use integer IDs makes the greatest improvement to
-performance, but the other techniques also help significantly.
-
-Depending on your needs, there are other optimizations that can be made
-to reduce file size and speed up querying of the data, but the ease of
-implementing them may depend on your database storage system and the
-programming language used to query the data. The above list is a good
-starting point.
-