Can you think of a better way to measure service reliability than the ones your transit agencies use? Can you develop ways to analyze the system's performance that will reveal more precisely where and why things go wrong? Now, any transit geek with a head for statistics can try out these ideas, and share what they discover, for any transit agency that publishes a real-time information feed.
San Francisco's Muni was one of the first transit agencies to make all of its real-time information public. NextBus uses this data to drive its real-time information displays in shelters, and on its website. But these systems also have a terrific by-product: they generate a historical record of where every bus and train was, at every minute, on every day. The sheer mass of information makes it possible to aggregate many days' data so that the noise of random disruptions falls away and you can see the pervasive, routine ways in which the system may be failing (or succeeding!).
Prompted by an earlier post of mine, reader Tung Way Yip took a shot (temporary link). Here, he assesses a basic metric that he calls "average arrival." As I understand this, he's assessing the likelihood that you will actually wait a given number of minutes. The Y-axis is the percentage of cases in which each x-value was observed.
For a high-frequency route, this analysis is far, far more relevant to customer experience than the on-time performance metrics that most transit agencies use. On-time performance -- the difference between a trip's scheduled arrival time and its actual time -- is easy to measure. But once a line is running better than every 10 minutes, no customer is waiting specifically for the 5:32 as opposed to the 5:35 trip, so the on-time performance of that trip doesn't matter. What matters to the customer is the actual waiting time. That's what Mr. Tung is measuring here, and it's an analysis that others could take a lot further.
For a simple model of why this matters, imagine that you have a bus line running every 10 minutes, and every single bus is exactly 10 minutes late. From the standpoint of a classic on-time performance measure (which typically counts the percentage of trips that are more than five minutes late) this situation would be described as 100% failure, because 100% of all trips are late. From the customer's standpoint, on the other hand, this would be perfection: buses are coming every 10 minutes, exactly as promised. Much more about this conundrum, and the choices it requires us to confront, here.
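The thought experiment above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: times are minutes after the start of the hour, the five-minute lateness threshold is the typical one mentioned above, and the function names `on_time_share` and `headways` are made up for this example rather than taken from any agency's tools.

```python
def on_time_share(scheduled, actual, late_threshold=5):
    """Share of trips that arrive no more than late_threshold minutes late."""
    on_time = [a - s <= late_threshold for s, a in zip(scheduled, actual)]
    return sum(on_time) / len(on_time)


def headways(arrivals):
    """Gaps, in minutes, between consecutive arrivals at one stop."""
    ordered = sorted(arrivals)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


# Hypothetical data: a bus scheduled every 10 minutes, each one exactly 10 minutes late.
scheduled = list(range(0, 60, 10))
actual = [t + 10 for t in scheduled]

print(on_time_share(scheduled, actual))  # 0.0: every trip counts as late
print(headways(actual))                  # [10, 10, 10, 10, 10]: perfectly even service
```

The two numbers tell opposite stories about the same hour of service, which is the whole point: the first is what the agency reports, the second is what the customer experiences.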
Meanwhile, I encourage other transit geeks to dig into real-time performance data, and especially to aggregate the data for many non-holiday weekdays to smooth out the random accidents and see the more pervasive patterns. Forget whether some lines are more on time than others. For frequent service, the interesting question is how long the actual gaps between consecutive buses are. Perhaps agencies (or failing that, local transit blogs) should publish line-by-line data about the reliability of waiting times. A statistician would describe the result as the standard deviation of headway, but you could put it in more user-friendly terms: For a particular line, at a particular stop, you could present the probability that you will wait more than, say, 10% longer than the ideal average wait. (The universe of data points would be made up of all the minutes of a day, or other analysis period, and the observed actual waiting time for someone who arrived at the stop at that minute.)
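For readers who want the statistician's version, here is the standard relationship between headway variability and average wait. It assumes riders show up at random, without consulting a schedule, and board the first bus that comes; the symbols $W$ (wait), $H$ (headway), $\bar{H}$ (mean headway) and $\sigma_H$ (standard deviation of headway) are my own notation.

```latex
\[
  \mathbb{E}[W] \;=\; \frac{\mathbb{E}[H^2]}{2\,\mathbb{E}[H]}
  \;=\; \frac{\bar{H}}{2}\left(1 + \frac{\sigma_H^2}{\bar{H}^2}\right)
\]
```

With perfectly even 10-minute headways, $\sigma_H = 0$ and the average wait is 5 minutes. Any irregularity raises the average wait even if the number of buses per hour stays the same, because more riders arrive during the long gaps than during the short ones.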
For example, if a line runs every 10 minutes, that means your wait should be an average of 5 minutes and a maximum of 10. So what percentage of the time will you actually wait more than 10? What percentage of the time will you actually wait more than 15? 20? These simple curves could help people know which transit lines, or parts of lines, they can really count on. This could be really useful information if you want to make a decision about whether to live on this line, or whether to buy a vehicle if you do.
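Here is a rough Python sketch of how those percentages might be computed from observed arrival times at one stop. It assumes you have already extracted the arrivals, in minutes, from a real-time feed's historical record (that step is not shown), and it imagines one rider turning up at each whole minute of the analysis period. The function names and the sample data are hypothetical.

```python
from bisect import bisect_left


def waits_by_minute(arrivals, start, end):
    """Wait, in minutes, for a rider who shows up at each whole minute in [start, end)."""
    ordered = sorted(arrivals)
    waits = []
    for minute in range(start, end):
        i = bisect_left(ordered, minute)  # index of the first bus at or after this minute
        if i == len(ordered):
            break  # no later bus was observed, so the wait is unknown
        waits.append(ordered[i] - minute)
    return waits


def share_waiting_more_than(waits, threshold):
    """Fraction of riders whose wait exceeded threshold minutes."""
    return sum(w > threshold for w in waits) / len(waits)


# Hypothetical hour on a 10-minute line where the bus due at minute 30 never showed up.
arrivals = [0, 10, 20, 40, 50, 60]
waits = waits_by_minute(arrivals, 0, 60)

for threshold in (10, 15, 20):
    print(threshold, share_waiting_more_than(waits, threshold))
print("average wait:", sum(waits) / len(waits))
```

A single missing bus pushes the average wait for the hour above the ideal 5 minutes and leaves a noticeable share of riders waiting longer than the advertised 10. To get the pervasive patterns rather than one bad hour, you would pool the waits from many non-holiday weekdays before computing the shares.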
For wider appeal, you could also make a map of the data, perhaps in the Eric Fischer style. Before long, you'd know as much about your transit system's operations as its operators do, and be able to make really focused, quantitative comments based on incontrovertible data. Your transit agency may initially react defensively if you do this, but in the end, you'll be doing them a favor.