In the midst of running my statistical analyses for last week’s post on why fireworks are almost as trash as I am, I caught the statistical analysis bug and wanted to keep running regressions.
Naturally, I decided to look at a dataset that I use frequently for my job and which I had recently been parsing through yet again – the American Community Survey’s means of transportation to work data.
There is a litany of studies and reports that examine the relationships between commute mode and any number of factors, from gender to race to income to infrastructure. But I wanted to dig specifically into the data from Northeast Ohio, given that’s where I am, in order to suss out how various socioeconomic/demographic factors play into people’s decisions on how to get from home to work and vice versa.
Accordingly, I pulled the 2011-2015 ACS 5-year data for means of transportation to work for the 152 municipalities, villages, townships, etc. in Cuyahoga, Geauga, Lake, Lorain, and Medina Counties (i.e. the NOACA region).
I also grabbed data for various socioeconomic/demographic variables, including population, sex, race, limited English proficiency (LEP) households, share of households in poverty, share of non-native residents, mean income, share of zero vehicle households, share of homeowners/renters, educational attainment (high school degree, bachelor’s, and graduate/professional degree), and measures of equity/inequality (Gini coefficient, share of income in top quintile, share of income in top 5%). The data are available here.
Several of these variables were skewed (i.e. the data were not normally distributed), so I took their natural logs to reduce the skew. Taking the logs of both the dependent (commute mode) and independent variables also makes the results easier to interpret – each coefficient is an elasticity, so a 1% change in the independent variable is associated with an X% change in the dependent variable.
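To make the log-log reading concrete, here is a minimal sketch using only Python's standard library. The shares and the -0.5 elasticity are invented for illustration, and the helper names (log_transform, pct_change_in_y) are hypothetical rather than anything from my actual workflow. One caveat it surfaces: the log of zero is undefined, so any municipality with a 0% share needs separate handling, which this sketch simply rejects.

```python
import math

def log_transform(values):
    """Return the natural log of each value; every value must be positive."""
    if any(v <= 0 for v in values):
        raise ValueError("log is undefined for zero or negative shares")
    return [math.log(v) for v in values]

def pct_change_in_y(elasticity, pct_change_in_x):
    """Exact percent change in y implied by a log-log coefficient."""
    return ((1 + pct_change_in_x / 100) ** elasticity - 1) * 100

# Hypothetical zero-vehicle household shares (%), not the ACS figures.
zero_vehicle_share = [2.0, 4.0, 8.0, 16.0]
logged = log_transform(zero_vehicle_share)  # doubling now adds a constant step

# With an illustrative elasticity of -0.5, a 1% rise in x is roughly a 0.5% drop in y.
small = pct_change_in_y(-0.5, 1.0)
# The "coefficient times percent" shortcut drifts for big changes:
# a 50% rise in x gives about -18.4%, not -25%.
large = pct_change_in_y(-0.5, 50.0)
```

The shortcut of reading a coefficient directly as "percent change per 1%" is an approximation that holds well for small changes and less well for large ones.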
First, I ran a series of bivariate regressions to examine the relationships between each independent variable and each dependent variable. The results are displayed below. Those coefficients in bold are significant at the 5% level (i.e. p < 0.05), while those that are bold and italicized are significant at the 1% level (p < 0.01).
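For anyone who wants to see the mechanics, a bivariate log-log regression can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the code behind the table: the five data points are made up (constructed so that the SOV rate equals 100 times the zero-vehicle share raised to -0.5), and bivariate_ols is a hypothetical helper. It returns only the intercept, slope, and R²; the significance levels would need a proper statistics package.

```python
import math

def bivariate_ols(x, y):
    """Return (intercept, slope, r_squared) for y = a + b*x by least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, 1 - ss_res / ss_tot

# Made-up shares (%), built so that sov = 100 * zero_vehicle ** -0.5.
zero_vehicle = [1.0, 4.0, 9.0, 16.0, 25.0]
sov = [100.0, 50.0, 100.0 / 3.0, 25.0, 20.0]

# Log both sides, then regress; the slope is the elasticity.
log_x = [math.log(v) for v in zero_vehicle]
log_y = [math.log(v) for v in sov]
intercept, slope, r_squared = bivariate_ols(log_x, log_y)
```

Because the toy data follow an exact power law, the fitted slope recovers the -0.5 elasticity and the fit is perfect; real municipal data will of course be noisier.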
As you can see, there are several statistically significant relationships. For instance, a 1% increase in the share of zero-vehicle households is associated with a 0.63% lower single-occupant vehicle (SOV) rate, while a 1% increase in the share of people with at least a high school degree is associated with a 0.79% higher SOV rate. Race, sex, and income seem to affect transit ridership, while educational attainment and nationality seem to affect the odds that someone will carpool. Wealthier, better educated homeowners, in turn, seem far more likely to work from home, which makes intuitive sense.
But bivariate regressions can only tell us so much, as they do not control for other variables. Accordingly, I ran a series of multivariate regressions, experimenting with different combinations of variables to see which generated the highest R² values – that is, which set of variables explains the largest amount of variation in commute mode choice.
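The "try different combinations and compare R²" step can be sketched as follows. Everything here is an assumption for illustration: the columns a, b, and c are invented, already-logged values, the outcome is built as 1 + 2a - b so that c is pure noise, and the solve and fit_r2 helpers are hypothetical stand-ins for what a statistics package does far more robustly.

```python
from itertools import combinations

def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_r2(columns, y):
    """Fit y on an intercept plus the given columns; return (coefs, R^2)."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coefs = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coefs, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return coefs, 1 - ss_res / ss_tot

# Invented predictors; y depends on a and b only.
data = {
    "a": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [1.0, 0.0, 2.0, 1.0, 3.0, 0.0],
    "c": [2.0, 2.0, 1.0, 3.0, 0.0, 1.0],
}
y = [1 + 2 * a - b for a, b in zip(data["a"], data["b"])]

# Score every one- and two-variable model and keep the highest R^2.
results = {}
for size in (1, 2):
    for names in combinations(data, size):
        _, r2 = fit_r2([data[name] for name in names], y)
        results[names] = r2
best = max(results, key=results.get)
```

One thing to keep in mind with this kind of search: R² never goes down when a variable is added, so comparing raw R² across models of different sizes tends to favor the bigger model. Adjusted R² is a common way to penalize that.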
Ultimately, I ended up with the following set of independent variables: total population, percent female, percent white, percent black, median age, percent LEP, percent below poverty level, mean income, Gini coefficient, percent zero-vehicle, percent high school diploma, and percent graduate/professional degree. (Note: as the bivariate regression results illustrate, for the dichotomous variables – those for which there are only 2 choices, like male/female or percent LEP/percent non-LEP – the results for the option that I did not include are the same, just with the opposite sign.)
The results for these model runs are shown below. Like the bivariate results, the coefficient values show the percent change in the commute mode share for each 1% change in the independent variable. However, these results also take into account the other variables in the model, so they show the change in commute mode, controlling for all other factors considered.
While the R² values show that these variables do not explain all of the variation in commute mode, particularly for walking, they do explain a substantial share of it.
Let’s take each commute mode in turn. First, it appears that older people with at least a high school diploma are more likely to drive alone to work. Interestingly, both white and Black people are similarly inclined to drive alone, with a 1% increase in the relative share of each racial group associated with a 0.2% and 0.24% increase in SOV rate, respectively. Strangely, people with limited English skills are more likely to drive alone, with a 1% increase in LEP population associated with a 0.12% increase in driving alone.
Unsurprisingly, people with no cars available and lower incomes are significantly less likely to drive alone. A 1% increase in each of these variables reduces the SOV rate by 0.32% (zero vehicles) and 0.19% (poverty level). People with advanced degrees and higher incomes are also far less likely to drive to work by themselves, which is perhaps unexpected.
For carpooling, the four significant variables are population size, LEP share, mean income, and zero-vehicle household share. That a 1% increase in population would increase carpooling by 0.19% makes sense, as carpooling depends on network externalities – the more potential matches, the better. The effects of both LEP share and carless households also make sense, but the effect of mean income, which increases carpooling, is a bit odd.
Transit and walking each have only two significant correlates: percent white and mean income for the former, and zero-vehicle households and graduate degree for the latter.
Wealthier white people are considerably less likely to take transit in Northeast Ohio (shocker), while highly educated, car-free individuals are more likely to walk to work.
Biking displays some of the strongest links to the independent variables, though they are not necessarily intuitive.
Each 1% increase in population reduces the bike mode share by 0.58%, perhaps suggesting that biking to work is more widespread in suburban areas. But, on the other hand, bike commuting is clearly not the domain of the wealthy, as a 1% increase in a municipality’s mean income drops its bike share by a whopping 1.25%.
Perhaps supporting the notion that lower income individuals are more inclined to bike, each 1% increase in the Gini coefficient (higher values indicate higher levels of economic inequality, and vice versa) increases the bike share by 0.67%.
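For a feel of the scale the Gini coefficient runs on, here is a small sketch that computes it from a list of incomes using the standard rank-weighted formula. The ACS publishes the index directly, so this is not how the values in my dataset were produced; the gini function and the two toy income lists are purely illustrative.

```python
def gini(incomes):
    """Gini coefficient of non-negative incomes (0 means perfect equality)."""
    values = sorted(incomes)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive income")
    # Rank-weighted form: G = 2 * sum(rank_i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

# Four households with identical incomes: no inequality at all.
equal = gini([50, 50, 50, 50])
# One household holds everything: the maximum for n = 4, which is (n - 1) / n.
unequal = gini([0, 0, 0, 200])
```

With only four households the ceiling is 0.75 rather than 1, because the finite-sample maximum is (n - 1) / n; for a whole municipality that gap is negligible.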
Lastly, several variables significantly influence the rate of telecommuting. Wealthier, better educated, and carless households are more likely to work at home, which duh. For whatever reason, more equitable towns also have higher telecommute shares, which I’m just going to file under ¯\_(ツ)_/¯.
These results largely correspond with the broader literature, but it’s always worth bringing the analysis down to the local level.
Perhaps getting a stronger handle on the trends locally can better inform our policy and outreach interventions to reduce our region’s inflated SOV rate.