## 5. Protocol Buffers

The previous chapters have included extracts from GTFS-realtime feeds in
a human-readable format. This data is actually represented using a data
format called *Protocol Buffers*.

Developed by Google and initially released in 2008, Protocol Buffers are
a way of serializing structured data into a format which is intended to
be smaller and faster than XML.

Note: Remember, if you are writing a transit-related mobile app,
GTFS-realtime feeds are not intended to be consumed directly by mobile
devices due to the large amount of data transferred. Rather, you will
need an intermediate server to read the feed from the provider then
serve only relevant data to the mobile devices running your app.

Even though it looks similar to JSON data, the human-readable version of
a protocol buffer is not intended to be manually parsed. Instead, data
is extracted from a protocol buffer using code written in a programming
language such as Java, C++ or Python.

Note: Although the Protocol Buffers application can generate code in
Java, C++ or Python, all code examples in this book will be in Java.

For example, assume you have written a Java program that reads and
parses a GTFS-realtime service alerts feed (shown later in this chapter,
and in the next chapter).

In order to consume a GTFS-realtime feed provided by a transit agency
such as TriMet or MBTA, your workflow would look similar to the
following diagram:

![GTFS-realtime Consumption](images/GTFS-realtime-consumption.png)

When a transit agency or data provider wants to publish a GTFS-realtime
feed, their process would be similar, except instead of reading the feed
every 15 seconds, they would write a new protocol buffer data file every
15 seconds using data received from their vehicles.
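
The 15-second cycle described above can be sketched with a
`ScheduledExecutorService` from the standard Java library. This is only
a minimal sketch under stated assumptions: the `FeedPoller` class is
hypothetical, and the `Runnable` passed to it stands in for whatever
code reads (or writes) the feed.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: runs a feed-processing task at a fixed interval.
final class FeedPoller {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    // Runs the task immediately, then again every intervalMillis milliseconds.
    void start(Runnable task, long intervalMillis) {
        Runnable safeTask = () -> {
            try {
                task.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs,
                // so report the failure and wait for the next cycle.
                System.err.println("Feed update failed: " + e.getMessage());
            }
        };

        scheduler.scheduleAtFixedRate(safeTask, 0, intervalMillis, TimeUnit.MILLISECONDS);
    }

    // Stops the background thread so the program can exit.
    void stop() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        FeedPoller poller = new FeedPoller();

        // 15 seconds matches the interval used in this chapter.
        poller.start(() -> System.out.println("Reading feed..."), 15_000);

        Thread.sleep(1_000); // Let the first run happen, then end the demo.
        poller.stop();
    }
}
```

Catching exceptions inside the task is a deliberate choice here: one
failed download should not prevent the next scheduled attempt.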

***Note:** *Chapter 11. Publishing GTFS-realtime Feeds* will
show you how to create a GTFS-realtime feed using Protocol Buffers. In
order to do so, you will need to install Protocol Buffers as
demonstrated in this chapter.*

### Installing Protocol Buffers

In order to generate code to read or write GTFS-realtime feeds in your
chosen programming language, you first need to install Protocol Buffers.
Once installed, it is capable of generating code for Java, C++ or Python.

This section shows you how to download and build the `protoc`
command-line tool on a UNIX or Linux-based system. These instructions
were derived from installing Protocol Buffers on Mac OS X 10.10.

First, download and extract the Protocol Buffers source code. At the
time of writing, the current version is 2.6.1.

```
$ curl -L \
    https://github.com/google/protobuf/releases/download/v2.6.1/protobuf-2.6.1.tar.gz \
    -o protobuf-2.6.1.tar.gz

$ tar -zxf protobuf-2.6.1.tar.gz

$ cd protobuf-2.6.1
```

***Note:** Visit <https://developers.google.com/protocol-buffers/> and
click the "Download" link to find the latest version. The following
instructions should still work for subsequent versions.*

Next, compile the source files using `make`. First run the
`configure` script to build the `Makefile`, then run `make`.

```
$ ./configure && make
```

***Note:** Separating the commands by `&&` means that `make` will only
run if `./configure` exits successfully.*

Once compilation is complete, you can verify the build by running **make
check**. You can then install it globally on your system using **make
install**. If you do not want to install it globally, you can run
`protoc` directly from the **./src** directory instead.

```
$ make check

$ make install
```

Next, verify that it has been successfully built and installed by running
the `protoc` command. The output should be "Missing input file."

```
$ protoc

Missing input file.
```

The next section will show you how to generate Java files using
`protoc` and the `gtfs-realtime.proto` file.

### Introduction to gtfs-realtime.proto

In order to generate source code files that can read a protocol buffer,
you need a `.proto` input file. Although you typically won't need to
create or modify `.proto` files yourself, it is useful to have a basic
understanding of how they work.

A `.proto` file contains a series of instructions that defines the
structure of the data. In the case of GTFS-realtime, there is a file
called `gtfs-realtime.proto` which contains the structure for each of
the messages available (service alerts, vehicle positions and trip
updates).

The following is an extract from `gtfs-realtime.proto` for the
`VehiclePosition` message.

**Note:** The `TripDescriptor`, `Position` and `VehicleDescriptor` types
referenced in this extract also have declarations, which are not
included here.

```
message VehiclePosition {
    optional TripDescriptor trip = 1;
    optional Position position = 2;
    optional uint32 current_stop_sequence = 3;
    
    enum VehicleStopStatus {
        INCOMING_AT = 0;
        STOPPED_AT = 1;
        IN_TRANSIT_TO = 2;
    }
    
    optional VehicleStopStatus current_status = 4 [default = IN_TRANSIT_TO];
    optional uint64 timestamp = 5;
    
    enum CongestionLevel {
        UNKNOWN_CONGESTION_LEVEL = 0;
        RUNNING_SMOOTHLY = 1;
        STOP_AND_GO = 2;
        CONGESTION = 3;
        SEVERE_CONGESTION = 4;
    }
    
    optional CongestionLevel congestion_level = 6;
    optional string stop_id = 7;
    optional VehicleDescriptor vehicle = 8;
    extensions 1000 to 1999;
}
```

Ignoring the numerical values assigned to each field (they aren't
likely to be relevant because you never directly refer to them), you can
see how the structure is the same as the specification covered earlier
in this book for vehicle positions.

Each field in a protocol buffer has a unique value assigned to it. This
value is used internally when encoding or decoding each field in a
GTFS-realtime feed. If there is additional data to be represented in a
feed, the values between 1000 and 1999 are reserved for extensions.
*Chapter 10. GTFS-realtime Extensions* shows how extensions in
GTFS-realtime work.
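
To illustrate how these values are used internally, the following sketch
hand-encodes a single numeric field in the way the Protocol Buffers wire
format does: a key combining the field's value with a wire type,
followed by the data as a base-128 "varint". The class and method names
are hypothetical and exist only for illustration. In practice, the
generated code handles this for you.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration of the wire format; not part of any generated code.
final class WireFormatSketch {
    // Wire type 0 (varint) is used for types such as uint32, uint64 and enum.
    static final int WIRE_TYPE_VARINT = 0;

    // Writes a base-128 varint: 7 bits per byte, least significant group
    // first, with the high bit set on every byte except the last.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Encodes one varint field: key = (fieldNumber << 3) | wireType, then the value.
    static byte[] encodeVarintField(int fieldNumber, long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, ((long) fieldNumber << 3) | WIRE_TYPE_VARINT);
        writeVarint(out, value);
        return out.toByteArray();
    }

    // Formats bytes as space-separated lowercase hex for display.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // current_stop_sequence is assigned 3 in VehiclePosition.
        System.out.println(toHex(encodeVarintField(3, 5)));   // prints "18 05"

        // timestamp is assigned 5; the value 300 needs two varint bytes.
        System.out.println(toHex(encodeVarintField(5, 300))); // prints "28 ac 02"
    }
}
```

Note that the field name never appears in the encoded bytes, only its
assigned value. This is one reason the binary format is compact.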

### Compiling gtfs-realtime.proto

The next step towards consuming a GTFS-realtime feed is to compile the
`gtfs-realtime.proto` file into Java code using `protoc`. In order
to do this, you must have already created a Java project.
`protoc` will generate the files and incorporate them directly into
your project's source tree.

These instructions assume you have created your Java project in
**/path/to/gtfsrt** and that you will download the
`gtfs-realtime.proto` file to **/path/to/protobuf**.

First, download the `gtfs-realtime.proto` file.

```
$ cd /path/to/protobuf

$ curl \
    https://developers.google.com/transit/gtfs-realtime/gtfs-realtime.proto \
    -o gtfs-realtime.proto
```

***Note:** If this URL is no longer current when you read this, you can
find the updated location at
<https://developers.google.com/transit/gtfs-realtime/>.*

In order to use `protoc`, you must specify the **--proto_path**
argument as the directory in which `gtfs-realtime.proto` resides.
Additionally, you must specify the full path to the
`gtfs-realtime.proto` file.

Typically, in a Java project your source files will reside in a
directory called `src` within the project directory. This directory
must be specified in the **--java_out** argument.

The full command to run is as follows:

```
$ protoc \
    --proto_path=/path/to/protobuf \
    --java_out=/path/to/gtfsrt/src \
    /path/to/protobuf/gtfs-realtime.proto
```

If this command runs successfully there will be no output to the screen,
but there will be newly created files in your source tree. There should
now be a **./com/google** directory in your source tree, and a package
called `com.google.transit.realtime`.

### Adding the Protocol Buffers Library

Before you can use this new package, you must add the Protocol Buffers
library to your Java project. You can either compile these files into a
Java archive (using the instructions in
**./protobuf-2.6.1/java/README.txt**), or you can add the Java
source files directly to your project as follows:

```
$ cd protobuf-2.6.1

$ cp -R ./java/src/main/java/com/google/protobuf \
    /path/to/gtfsrt/src/com/google/
```

If you now try to build your project, an error will occur due to a
missing class called `DescriptorProtos`. You can add this to your
project using the following command:

```
$ cd protobuf-2.6.1

$ protoc --java_out=/path/to/gtfsrt/src \
    --proto_path=./src \
    ./src/google/protobuf/descriptor.proto
```

Your project should now build successfully, meaning you can use the
`com.google.transit.realtime` package to read data from a
GTFS-realtime feed.

### Reading Data From a GTFS-realtime Feed

To read the data from a GTFS-realtime feed, you need to build a
`FeedMessage` object. The simplest way to do this is by opening an
`InputStream` for the URL of the GTFS-realtime feed.

The following code builds a `FeedMessage` for the vehicle positions
feed of the MBTA in Boston.

***Note:** To simplify the code listings in this book, package imports
are not included. All classes used are either standard Java classes, or
classes generated by Protocol Buffers.*

```java
public class YourClass {
    public void loadFeed() throws IOException {

        URL url = new URL("http://developer.mbta.com/lib/gtrtfs/Vehicles.pb");

        FeedMessage fm;

        // try-with-resources closes the stream even if parsing fails
        try (InputStream is = url.openStream()) {
            fm = FeedMessage.parseFrom(is);
        }

        // ...
    }
}
```
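
A request for a feed can stall or fail, so it can help to set timeouts
and read the whole response before parsing it. The following is a sketch
under assumptions rather than a required approach: the `FeedDownloader`
class and its timeout values are hypothetical, and it uses only standard
Java classes. The resulting byte array can then be passed to
`FeedMessage.parseFrom()`, which also accepts a `byte[]`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper that downloads the raw bytes of a feed.
final class FeedDownloader {

    // Reads the stream until it ends and returns everything that was read.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;

        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }

        return out.toByteArray();
    }

    // Downloads the feed, giving up if the server is too slow to respond.
    static byte[] download(String feedUrl) throws IOException {
        URLConnection conn = new URL(feedUrl).openConnection();
        conn.setConnectTimeout(5_000);  // assumed value: 5 seconds to connect
        conn.setReadTimeout(10_000);    // assumed value: 10 seconds between reads

        try (InputStream in = conn.getInputStream()) {
            return readAll(in);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("Usage: java FeedDownloader <feed-url>");
            return;
        }

        byte[] data = download(args[0]);
        System.out.println("Downloaded " + data.length + " bytes");

        // With the generated classes on the classpath, you could then call:
        // FeedMessage fm = FeedMessage.parseFrom(data);
    }
}
```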

A `FeedMessage` object contains zero or more entities, each of which
is either a service alert, a vehicle position, or a trip update. You can
retrieve a list of entities using **getEntityList()**, then loop over
them as follows:

```java
public class YourClass {
    public void loadFeed() throws IOException {

        // ...

        for (FeedEntity entity : fm.getEntityList()) {
            // Process the entity here
        }

        // ...

    }
}
```

Since many of the fields in GTFS-realtime are optional, you need to
check for the presence of the field you want to use before trying to
retrieve it. This is achieved using `hasFieldName()`. You can
then retrieve it using `getFieldName()`.

In the case of `FeedEntity`, you need to check which of the
GTFS-realtime messages are available. For instance, to check if the
entity contains a service alert, you would call `entity.hasAlert()`.
If the call to `hasAlert()` returns `true`, you can retrieve it
using `entity.getAlert()`.

The following code shows how to access the various entities.

```java
public class YourClass {
    public void loadFeed() throws IOException {
        // ...

        for (FeedEntity entity : fm.getEntityList()) {

            if (entity.hasAlert()) {
                Alert alert = entity.getAlert();

                // Process the alert here
            }

            if (entity.hasVehicle()) {
                VehiclePosition vp = entity.getVehicle();

                // Process the vehicle position here
            }

            if (entity.hasTripUpdate()) {
                TripUpdate tu = entity.getTripUpdate();

                // Process the trip update here
            }
        }

        // ...

    }
}
```
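
One reason the presence check matters is that generated getters do not
return `null` when an optional field is missing. Instead they return the
field's default value, such as an empty string for `stop_id`. The
following self-contained sketch mimics that behaviour with a
hand-written stand-in class. `StopInfo` is hypothetical and is not part
of the generated `com.google.transit.realtime` package.

```java
// Hypothetical stand-in that mimics the has/get behaviour of a generated message.
final class StopInfo {
    private final String stopId; // null means the field was not set

    StopInfo(String stopId) {
        this.stopId = stopId;
    }

    // Reports whether the field is present.
    boolean hasStopId() {
        return stopId != null;
    }

    // Like a generated getter, returns a default ("") rather than null when unset.
    String getStopId() {
        return stopId == null ? "" : stopId;
    }

    // Checks for presence first, so a missing value is not mistaken for real data.
    static String describe(StopInfo info) {
        if (info.hasStopId()) {
            return "Stop: " + info.getStopId();
        }

        return "No stop reported";
    }

    public static void main(String[] args) {
        System.out.println(describe(new StopInfo("70061"))); // prints "Stop: 70061"
        System.out.println(describe(new StopInfo(null)));    // prints "No stop reported"
    }
}
```

Without the call to `hasStopId()`, the second case would produce a
blank stop ID instead of being recognized as missing.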

The next three chapters will show you how to process the `Alert`, `VehiclePosition` and `TripUpdate` objects.

### Outputting Human-Readable GTFS-realtime Feeds

Earlier chapters included human-readable extracts from GTFS-realtime
feeds. Although plain-text GTFS-realtime feeds are not designed to be
parsed directly, they can be useful for quickly determining the kinds of
data available within a feed.

All objects in a GTFS-realtime feed can be output using the
`TextFormat` class in the `protobuf` package. Passing the object to
**printToString()** will generate a human-readable version of the
GTFS-realtime element.

For instance, you can output an entire feed as follows:

```java
FeedMessage fm = ...;

String output = TextFormat.printToString(fm);

System.out.println(output);
```

Or you can output individual entities:

```java
for (FeedEntity entity : fm.getEntityList()) {
    String output = TextFormat.printToString(entity);
    
    System.out.println(output);
}
```

Alternatively, you can call the `toString()` method on these objects
to generate the same output.