Transit ridership data

My local Chapel Hill Transit system has started running daytime routes in my neighborhood again. I haven’t seen an official announcement, so I had to verify it myself with a midday bus trip downtown for a strawberry lassi at Vimala’s Curryblossom Café. While looking for an announcement, I saw this ridership chart on the system’s Wikipedia page.

Line chart of bus ridership over time.

Checking out the data source, I found a national database of monthly ridership data for many US transit systems at data.transportation.gov.

When I first glanced at the above chart, I thought the up-and-down pattern was weekdays vs. weekends, but then I realized the chart is showing monthly data over 15 years or so and the pattern must be from the university influence. The low months are around the summer and winter breaks.

Given that the existing chart ends at 2016, I thought an update might be in order. I’m also a little suspicious of that trend line. It looks like a quadratic fit, which surely doesn’t reflect the long-term nature of the trend. Here is my updated version, which I’ve uploaded to the Wikipedia page.

Line chart of bus ridership over time with a sharp drop during covid and slow recovery.

I kept the same basic structure: 15 years of monthly data with a connected line and a superimposed smooth trend curve. However, my trend curve is a spline fit and I added a manual break for the Covid drop in 2020. I kept the Y axis as daily averages instead of the monthly figures in the source data. The idea with daily averages is to account for varying month lengths.
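As a minimal sketch of the daily-average conversion, assuming the source provides one total per system per calendar month: divide each monthly total by the number of days in that month. The function name and the sample totals are made up for illustration, and this is not necessarily how the chart itself was computed.

```python
import calendar


def daily_average(monthly_total, year, month):
    """Convert a monthly ridership total into average rides per day."""
    # monthrange returns (weekday of the 1st, number of days in the month)
    days_in_month = calendar.monthrange(year, month)[1]
    return monthly_total / days_in_month


# Hypothetical totals: the same monthly total means busier days in a shorter month.
feb = daily_average(280_000, 2023, 2)  # February 2023 has 28 days
mar = daily_average(280_000, 2023, 3)  # March 2023 has 31 days
print(round(feb), round(mar))
```

Without this step, February would look like a dip in every year simply because it has fewer days.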

I could have shown more historical data, but I didn’t want to make the up-and-down pattern illegible. Ridership still hasn’t returned to pre-pandemic levels, presumably because of more remote workers and students.

College towns

Seeing the university-driven ridership pattern made me wonder if transit data could be used to identify college towns in the US. Not that they’re hard to identify, but it’s a fun exercise. I computed the ratio of school month ridership to summer month ridership as the metric. After some experimentation to balance the schedules of semester calendars and quarter calendars, I used February, April, and October as school months and June, July, and August as summer months.
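The metric can be sketched roughly as follows. This assumes each system has already been reduced to one typical ridership figure per calendar month; how the years are aggregated is my assumption, and the function name and sample numbers are hypothetical. Because both groups have three months, a ratio of means is the same as a ratio of sums.

```python
from statistics import mean

SCHOOL_MONTHS = (2, 4, 10)  # February, April, October
SUMMER_MONTHS = (6, 7, 8)   # June, July, August


def school_to_summer_ratio(rides_by_month):
    """rides_by_month maps month number (1-12) to a typical ridership figure."""
    school = mean(rides_by_month[m] for m in SCHOOL_MONTHS)
    summer = mean(rides_by_month[m] for m in SUMMER_MONTHS)
    return school / summer


# Hypothetical system: ridership halves during the three summer months.
example = {month: 200_000 for month in range(1, 13)}
example.update({6: 100_000, 7: 100_000, 8: 100_000})
print(school_to_summer_ratio(example))
```

A system with no seasonal swing scores about 1, and the more ridership depends on the school calendar, the higher the ratio.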

Here’s a ranking of the municipal transit systems with the highest ratios.

Bar chart of bus ridership ratios for 26 "college towns" with the highest school-to-summer ratios (greater than 1.5). The towns/schools in descending ratio order are:
Harrisonburg, VA (James Madison)
Blacksburg, VA (Virginia Tech)
State College, PA (Penn State)
Ames, IA (Iowa State)
Lubbock, TX (Texas Tech)
Denton, TX (North Texas)
Champaign-Urbana, IL (Illinois U-C)
Lynchburg, VA (Liberty)
West Lafayette, IN (Purdue)
Bloomington, IN (Indiana)
Kenosha, WI (K-12)
East Lansing, MI (Michigan State)
Gainesville, FL (Florida)
Ithaca, NY (Cornell)
Athens, GA (Georgia)
Flagstaff, AZ (Northern Arizona)
Fort Collins, CO (Colorado State)
Santa Cruz, CA (UC Santa Cruz)
Bellingham, WA (Western Washington)
Lawrence, KS (Kansas)
Madison, WI (Wisconsin–Madison)
Tallahassee, FL (Florida State)
Northampton, MA (UMass & others)
Iowa City, IA (Iowa)
St. Cloud, MN (St. Cloud State)
Chapel Hill, NC (UNC Chapel Hill)

I included only systems that average at least 100,000 monthly rides and that don’t have “university” in the name of the transit system. I realize that last filter will remove some actual college towns. The chart has 26 systems so that I could have Chapel Hill included.
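Those two filters and the ranking might look something like this sketch. The record layout (keys `name`, `avg_monthly_rides`, `ratio`) and the sample systems are invented for illustration and do not come from the source data.

```python
def rank_college_town_candidates(systems, min_avg_rides=100_000, top_n=26):
    """Filter transit systems and rank them by school-to-summer ratio.

    systems is a list of dicts with hypothetical keys
    'name', 'avg_monthly_rides', and 'ratio'.
    """
    kept = [
        s for s in systems
        if s["avg_monthly_rides"] >= min_avg_rides
        and "university" not in s["name"].lower()
    ]
    # Highest ratio first, then keep only the top_n systems
    return sorted(kept, key=lambda s: s["ratio"], reverse=True)[:top_n]


# Made-up records, only to exercise the filters.
sample = [
    {"name": "Town A Transit", "avg_monthly_rides": 250_000, "ratio": 1.8},
    {"name": "Town B Transit", "avg_monthly_rides": 40_000, "ratio": 3.0},
    {"name": "Some University Shuttle", "avg_monthly_rides": 300_000, "ratio": 4.0},
    {"name": "Town C Transit", "avg_monthly_rides": 500_000, "ratio": 2.1},
]
for system in rank_college_town_candidates(sample):
    print(system["name"], system["ratio"])
```

In this sample, the small system and the one with "University" in its name drop out even though they have the highest ratios, which is the trade-off mentioned above.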

Oddly, the two top ratios by far are from Virginia towns, which might indicate how Virginia helps fund small transit programs. Not all of the locations in the list are familiar to me, and one of them doesn’t have a major university: Kenosha, Wisconsin. Apparently, their system has a large K-12 ridership. And the Northampton system is shared by several western Massachusetts schools, such as Amherst.

Here’s the annual smoothed pattern for the larger of the above systems.

Line chart of ridership throughout a year with one curve per transit system. There's a big dip in the summer and a small one spanning the year ends.

This chart let me exercise a feature of JMP’s p-spline fitter by giving it a cycle constraint. With that constraint, the starts and ends of the curves influence each other and would join up if the December-to-January span were included.
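To illustrate what a cycle constraint means, here is a much simpler stand-in: a circular moving average over twelve monthly values. This is not a p-spline and not what JMP does. It only shows the wrap-around idea, where December and January count as neighbors, so each end of the curve influences the other. The function name and the numbers are made up.

```python
def circular_moving_average(values, half_window=1):
    """Smooth a cyclic sequence, letting the window wrap around the ends."""
    n = len(values)
    width = 2 * half_window + 1
    smoothed = []
    for i in range(n):
        # Modulo indexing makes the last value a neighbor of the first
        window = [values[(i + k) % n] for k in range(-half_window, half_window + 1)]
        smoothed.append(sum(window) / width)
    return smoothed


# Hypothetical year with a spike in December only (index 11).
monthly = [0] * 11 + [12]
smoothed = circular_moving_average(monthly)
print(smoothed[0], smoothed[11])  # January picks up part of the December spike
```

Without the wrap-around, January's smoothed value would ignore December entirely, and the two ends of the curve would not line up.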

Some of the curve variation is due to semester vs. quarter calendar systems. For instance, I checked that Western Washington and UC Santa Cruz are on the quarter system, and their curves are a bit shifted from the others.

