aboutsummaryrefslogtreecommitdiff
path: root/gtfs-book/ch-15-optimizing-shapes.md
diff options
context:
space:
mode:
Diffstat (limited to 'gtfs-book/ch-15-optimizing-shapes.md')
-rw-r--r--gtfs-book/ch-15-optimizing-shapes.md158
1 files changed, 158 insertions, 0 deletions
diff --git a/gtfs-book/ch-15-optimizing-shapes.md b/gtfs-book/ch-15-optimizing-shapes.md
new file mode 100644
index 0000000..fbb1125
--- /dev/null
+++ b/gtfs-book/ch-15-optimizing-shapes.md
@@ -0,0 +1,158 @@
+## 15. Optimizing Shapes
+
+Shape data in a GTFS feed (that is, the records from `shapes.txt`)
+represents a large amount of data. There are a number of ways to reduce
+this data, which can help to:
+
+* Speed up data retrieval
+* Reduce the amount of data to transmit to app / web site users
+* Speed up rendering of the shape onto a map (such as a native mobile
+ map or a JavaScript map).
+
+Two ways to reduce shape data are as follows:
+
+* **Reducing the number of points in a shape.** The shapes included in
+ GTFS are often very precise and include a number of redundant
+ points. Many of these can be removed without a noticeable loss of
+ shape quality using the *Douglas-Peucker Algorithm*.
+* **Encoding all points in a shape into a single value.** The *Encoded
+ Polyline Algorithm* used in the Google Maps JavaScript API can also
+ be used with GTFS shapes. This reduces the amount of storage
+ required and also makes looking up all points in a shape far
+ quicker.
+
+### Reducing Points in a Shape
+
+Many of the shapes you find in GTFS feeds are extremely detailed. They
+often follow the exact curvature of the road and may consist of hundreds
+or thousands of points for a trip that might have only 30 or 40 stops.
+
+While this level of detail is useful, the sheer amount of data required
+to be rendered on a map can be a massive performance hit from the
+perspective of retrieving the data as well as rendering on a map.
+Realistically, shapes do not need this much detail in order to convey
+their message to your users.
+
+Consider the following shape from Portland that has been rendered using
+Google Maps. The total shape consists of 1913 points.
+
+![Original Shape](images/shape-original.jpg)
+
+Compare this now to the same shape that has had redundant points
+removed. The total number of points in this shape is 175, which
+represents about a 90% reduction.
+
+![Reduced Shape](images/shape-reduced.jpg)
+
+If you look closely, you can see some minor loss of detail, but for the
+most part, the shapes are almost identical.
+
+This reduction in points can be achieved using the Douglas-Peucker
+Algorithm. It does so by discarding points that do not deviate
+significantly between its surrounding points.
+
+The Douglas-Peucker Algorithm works as follows:
+
+* Begin with the first and last points in the path (A and B). These
+ are always kept.
+* Find the point between the first and last that is furthest away from
+ the line joining the first and last line (the orthogonal distance --
+ see the figure below).
+* If this point is greater than the allowed distance (the tolerance
+ level), the point is kept (call it X).
+* Repeat this algorithm twice: once using A as the first point and X
+ as the last point, then again using X as the first point and B as
+ the last point.
+
+This algorithm is recursive, and continues until all points have been
+checked.
+
+***Note:** The tolerance level determines how aggressively points are
+removed. A higher tolerance value is less aggressive and discards less
+data, while a lower tolerance discards more data.*
+
+The following diagram shows what orthogonal distance means.
+
+![Orthogonal Distance](images/orthogonal-distance.jpg)
+
+The following resources provide more information about the
+Douglas-Peucker Algorithm and describe how to implement it in your own
+systems:
+
+* <http://en.wikipedia.org/wiki/Ramer-Douglas-Peucker_algorithm>
+* <http://www.loughrigg.org/rdp/>
+* <http://stackoverflow.com/questions/2573997/reduce-number-of-points-in-line>.
+
+You can often discard about 80-90% of all shape data before seeing a
+significant loss of line detail.
+
+### Encoding Shape Points
+
+A single entry in `shapes.txt` corresponds to a single point in a
+single shape. Each entry includes a shape ID, a latitude and longitude.
+
+***Note:** The `shape_dist_traveled` field is also included, but you do not
+strictly need to use this field (nor the corresponding field in
+stop_times.txt). The technique described in this section will not work
+if you intend to use `shape_dist_traveled`.*
+
+This means if you want to look up a shape by its ID, you may need to
+retrieve several hundreds of rows from a database. Using the Encoded
+Polyline Algorithm you can change your GTFS database so each shape is
+represented by a single row in a database. This means the shape can be
+found much more quickly and much less data needs to be processed to
+determine the shape.
+
+Consider the following data, taken from TriMet's `shapes.txt` file.
+This data represents the first five points of a shape.
+
+| `shape_id` | `shape_pt_lat` | `shape_pt_lon` | `shape_pt_sequence` | `shape_dist_traveled` |
+| :--------- | :------------- | :------------- | :------------------- | :--------------------- |
+| `185328` | `45.52291 ` | `-122.677372` | `1` | `0.0` |
+| `185328` | `45.522921` | `-122.67737` | `2` | `3.7` |
+| `185328` | `45.522991` | `-122.677432` | `3` | `34.0` |
+| `185328` | `45.522992` | `-122.677246` | `4` | `81.5` |
+| `185328` | `45.523002` | `-122.676567` | `5` | `255.7` |
+
+If you apply the Encoded Polyline Algorithm to this data, the
+coordinates can be represented using the following string.
+
+```
+eeztGrlwkVAAML?e@AgC
+```
+
+To learn how to arrive at this value, you can read up on the Encoded
+Polyline Algorithm at
+<https://developers.google.com/maps/documentation/utilities/polylinealgorithm>.
+
+Instead of having every single shape point in a single table, you can
+create a table that has one record per shape. The following SQL
+statement is a way you could achieve this.
+
+```sql
+CREATE TABLE shapes (
+ shape_id TEXT,
+ encoded_shape TEXT
+);
+```
+
+The following table shows how this data could be represented in a
+database.
+
+| `shape_id` | `encoded_shape` |
+| :--------- | :--------------------- |
+| `185328` | `eeztGrlwkVAAML?e@AgC` |
+
+Storing the shape in this manner means you can retrieve an entire shape
+by looking up only one database row and running it through your decoder.
+
+To further demonstrate how both the encoding and decoding works, try out
+the polyline utility at
+<https://developers.google.com/maps/documentation/utilities/polylineutility>.
+
+You can find implementations for encoding and decoding points for
+various languages at the following locations:
+
+* <http://facstaff.unca.edu/mcmcclur/GoogleMaps/EncodePolyline/>
+* <https://github.com/emcconville/google-map-polyline-encoding-tool>
+