What is this blog about?

We are destroying the planet at an alarming rate. It's happening due to the ignorance of the world we live in, and in our age of online data access and sharing there is really no excuse for that any more.

This blog investigates novel ways of looking at large datasets. The kind everyone should care about.

This is my personal blog. The views expressed on these pages are mine alone and not those of my employer.

Saturday, November 22, 2008

Fatal US car collisions, 2005-2007

Regionation is a powerful tool for browsing large datasets. I had known for a while that the US has approximately 40,000 traffic-related deaths a year, but had no means of looking at individual cases. I wanted to understand better what I can do to minimize the chances of getting into an accident, and I found FARS (Fatality Analysis Reporting System).

The US DOT has been recording individual accident data in FARS since 1975, and you can query the data or download full datasets. Since 2005, most of the entries contain coordinates, and there is already a website called SafeRoadMaps that provides another query UI and map integration.

If you just want to see the Google Earth map, scroll all the way down.

Overall observations.

Number of accidents per state is roughly proportional to the population, but here are the per capita numbers:

It's clear that the Pacific Coast and the Northeast are much safer overall, but this can probably be explained by their higher share of urban population - people who live in cities drive less.

Indeed, the map above looks somewhat similar to the map of the percentage of rural population:

It makes sense to compare one map against the other more precisely to see how closely the two sets of numbers correspond. In the chart below, the bigger and redder circles represent states with more absolute fatalities. The horizontal axis shows fatalities per capita in each state, and the vertical axis shows the percent of rural population (as of 2000). Pass the mouse over each circle to see which state it represents.

Indeed, there is a dependency - note how the circles line up more or less on a straight line going from the lower left to the upper right corner. We can conclude with some certainty that the more rural a state, the more traffic fatalities per capita it has.

Some states do not fit onto the straight line - those that are furthest away from it are the most unusual, as they buck the overall trend. In the upper left corner, we see New Hampshire, Vermont and Maine, which are very rural but have lower per capita traffic fatalities than other similar states. On the right-hand side, we see Wyoming, which is more dangerous than we would expect. Florida and Arizona do not have very high fatality rates, but they are very non-rural, so on balance they should still be considered more dangerous than average. Montana and Mississippi have the highest fatality rates, but they lie roughly on the straight line, so their sad numbers are not surprising. TODO: calculate regression parameters.
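For the regression parameters, an ordinary least-squares fit would be one straightforward option. Here is a minimal Python sketch of how it could be done. I am assuming that rural percentage is the explanatory variable and fatalities per 100,000 people is the outcome. The labels and numbers in the demo are made-up placeholders, not real state data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(xs, ys, slope, intercept):
    """Observed minus predicted value for every point."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


if __name__ == "__main__":
    # Placeholder values only: (label, percent rural, fatalities per 100,000).
    sample = [("A", 10.0, 9.0), ("B", 30.0, 15.0), ("C", 50.0, 14.0), ("D", 60.0, 27.0)]
    rural = [row[1] for row in sample]
    rate = [row[2] for row in sample]
    slope, intercept = fit_line(rural, rate)
    res = residuals(rural, rate, slope, intercept)
    # States far above the line are more dangerous than their ruralness predicts.
    for (label, _, _), r in sorted(zip(sample, res), key=lambda pair: pair[1]):
        print(label, round(r, 2))
```

Sorting by residual gives a ranking of which states are better or worse than the trend predicts, which is what the chart only lets you judge by eye.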

Looking at fatalities per VMT (vehicle miles traveled) is a more standard way of comparing accident data from different states. I tried using 2006 data (2007 is not available yet), but note that some of the road types there have no information at all. Indeed, comparing similar data for 2002 with the much more detailed VMT data from NCD (National County Database), part of the National Mobile Inventory Model, shows that the summary table linked above does omit some of the data.
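For reference, the rate itself is simple arithmetic. Fatality rates are commonly quoted per 100 million vehicle miles traveled, and that is the scaling I assume in this small Python sketch. The function name and the numbers in the usage line are placeholders of mine, not figures from FARS or NCD.

```python
def fatalities_per_100m_vmt(fatalities, vmt_miles):
    """Fatalities per 100 million vehicle miles traveled."""
    if vmt_miles <= 0:
        raise ValueError("VMT must be positive")
    return fatalities * 100_000_000 / vmt_miles


if __name__ == "__main__":
    # Placeholder inputs: 150 deaths over 10 billion vehicle miles.
    print(fatalities_per_100m_vmt(150, 10_000_000_000))
```

If a state's VMT total is missing some road types, the denominator shrinks and the rate looks worse than it really is, which is why incomplete VMT tables matter so much here.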

Using the NCD data, I got the following map for 2002 which should give a more fair ranking of states:

It's very much biased toward ranking inland states worse, so either I'm still missing some data problems, or rural states indeed produce more fatalities per VMT. Here's a diagram similar to the one above. You still can see a straight-line dependency:

This time it's Nevada and Arizona that appear to be worse than you would expect, while Maine, New Hampshire and Vermont are still doing much better. The overall picture is about the same, though. Note that you can't substitute VMT for ruralness (you can click on the vertical axis in the chart above and set it to show VMT instead): it's not that in the inland states you have to drive further; your chances of getting into a crash grow faster than the trip length. It's driving in the rural states by itself that's more dangerous. Perhaps this means that more rural driving is on uncongested highways, so the average speed is higher.

Of course, this has been studied before. See this paper by Littman and Fitzroy, as well as this paper by Ewing et al that clearly link higher death rates to more urban sprawl.

Plotting pedestrian fatalities shows a different picture that does not correspond to the urban/rural divide.

Florida, New Mexico, Louisiana, Arizona and South Carolina are disproportionately more dangerous for pedestrians (as well as District of Columbia that does not show on the map). (See this report from NHTSA for the full discussion of pedestrian accident trends.)

Looking at drunk driving incidents (those where the highest alcohol concentration was over 0.08) relative to the total number of accidents in each state shows yet another picture. Rates are roughly the same across the country: 25-35%, with the notable exceptions of Utah (17%) and, for some reason, North Dakota (48%). Delaware, South Carolina, Wisconsin, Montana, Texas and Louisiana also stand out with rates of 37-43%.
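The per-state percentages above boil down to a grouped count. Here is a rough Python sketch of that calculation, assuming each accident has already been reduced to a (state, highest alcohol concentration) pair. That record layout is my simplification for illustration, not the actual FARS file format.

```python
from collections import defaultdict


def drunk_driving_share(records, threshold=0.08):
    """Percent of accidents per state where the highest BAC exceeds threshold.

    records: iterable of (state, highest_bac) pairs; highest_bac may be None
    when no test result was reported (counted as not over the limit here).
    """
    totals = defaultdict(int)
    drunk = defaultdict(int)
    for state, highest_bac in records:
        totals[state] += 1
        if highest_bac is not None and highest_bac > threshold:
            drunk[state] += 1
    return {state: 100.0 * drunk[state] / totals[state] for state in totals}


if __name__ == "__main__":
    # Placeholder records, not real data.
    demo = [("X", 0.0), ("X", 0.12), ("X", None), ("X", 0.09), ("Y", 0.08)]
    print(drunk_driving_share(demo))
```

Treating a missing test result as "not drunk" is a choice that can bias the rate downward in states that test less often, so it is worth keeping in mind when comparing states.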

The most dangerous times to drive are Friday and Saturday nights, from 6 pm through 3 am the next morning (especially the three hours between midnight and 3 am), and on weekdays 3 pm to 9 pm is worse than earlier hours. Rain or snow may feel like dangerous conditions, but almost 90% of accidents occurred in normal weather. 50% of accidents occurred in full daylight, and another 30% in darkness - so at night try to drive on well-lit streets as much as possible.

Detailed data

To plot all accidents on the same map, I have run the regionator script with the 2005-2007 FARS data. If you have the Google Earth browser plugin installed, scroll down to see the map. You can also open the KML files in Google Earth or in a separate window - if you do this, it will be easier for you to follow along.
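To give an idea of what regionation does, here is a minimal Python sketch of the general technique: keep only a handful of the most important points in each region, and push the rest down into four child quadrants that are shown only when you zoom in. This is my own illustration of the idea, not the code of the regionator script I used. The `Region` class, the `(lat, lon, weight)` tuple layout and the use of a weight (for example, the number of deaths) are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    north: float
    south: float
    east: float
    west: float
    points: list = field(default_factory=list)
    children: list = field(default_factory=list)


def regionate(points, north, south, east, west, max_per_region=4, max_depth=16):
    """Build a quadtree of regions from (lat, lon, weight) tuples."""
    ranked = sorted(points, key=lambda p: p[2], reverse=True)
    region = Region(north, south, east, west)
    if len(ranked) <= max_per_region or max_depth == 0:
        region.points = ranked  # small enough (or too deep): keep everything here
        return region
    region.points = ranked[:max_per_region]  # heaviest points stay at this level
    mid_lat = (north + south) / 2
    mid_lon = (east + west) / 2
    quadrants = {}
    for p in ranked[max_per_region:]:
        key = (p[0] >= mid_lat, p[1] >= mid_lon)  # (northern half?, eastern half?)
        quadrants.setdefault(key, []).append(p)
    for (is_north, is_east), members in quadrants.items():
        region.children.append(regionate(
            members,
            north if is_north else mid_lat,
            mid_lat if is_north else south,
            east if is_east else mid_lon,
            mid_lon if is_east else west,
            max_per_region,
            max_depth - 1,
        ))
    return region


def count_points(region):
    """Total number of points stored in a region and all of its descendants."""
    return len(region.points) + sum(count_points(c) for c in region.children)
```

In a KML output, each such region would typically become its own file with a bounding box and a level-of-detail setting, so Google Earth loads the deeper levels only when the viewer zooms in far enough.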

The numbers of fatalities were: 43,510 in 2005, 42,708 in 2006 and 41,059 in 2007. The decrease can probably be attributed to higher gas prices, as people drive less - about 1,600 fewer deaths occurred in 2007 than in 2006. Remember this the next time you complain about high gas prices! The breakdown by state shows that the 4% overall drop is not spread uniformly at all - the District of Columbia, for example, had a 19% increase in fatalities in 2007 compared with 2006, while South Dakota had a 24% drop.
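The year-over-year arithmetic is easy to check with a few lines of Python. The totals are the three numbers quoted above; the function name is just something I made up for this sketch.

```python
fatalities = {2005: 43_510, 2006: 42_708, 2007: 41_059}


def yearly_change(counts):
    """Map each year to (absolute change, percent change) versus the previous year."""
    years = sorted(counts)
    changes = {}
    for prev, cur in zip(years, years[1:]):
        diff = counts[cur] - counts[prev]
        changes[cur] = (diff, 100.0 * diff / counts[prev])
    return changes


if __name__ == "__main__":
    for year, (diff, pct) in yearly_change(fatalities).items():
        print(year, diff, round(pct, 1))
```

The same function can be applied to per-state totals to find the states that moved against the national trend.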

I have plotted 107,037 accidents on the map, or 93% of the total for the three years - the other 7% did not have coordinates listed. Most accident records mention a "first harmful event", which can be approximately considered the primary cause of the crash. Here are the largest categories (see FARS user guide for the full list of categories):

First harmful event                             Accidents   Share
Collision with a car on the same roadway           42,989     40%
Pedestrians involved (not necessarily killed)      13,281     12%
Overturn or rollover                               12,821     12%
Collision with a tree                               9,299    8.6%

The next biggest problems, which caused 1,000-3,000 accidents each, were: collisions with bicycles, with cars on other roadways (which includes crossing the median, but not collisions at intersections), with poles, guardrails, embankments, signposts, fences, traffic barriers and parked cars, as well as driving into ditches and culverts.

When looking at the map, the majority of accidents seem to be located on highways, and indeed a FARS query for 2007 shows that 70% occurred at speeds 45 mph or above.

There do not seem to be particular geographic areas where fatal crashes are much more likely to occur, or that have too many drunk drivers. However, the Southeast (Carolinas, Georgia and Florida) has more accidents where unlicensed drivers were present.

Pedestrian accidents are mostly found, of course, in urbanized areas. The overall state numbers told us that Florida is very dangerous for pedestrians, and you can immediately see it on the map - there are many pedestrian accidents with more than one death. Florida has about as many people as New York, yet in 2007 it had almost twice as many pedestrian accidents. Georgia and North Carolina, two other states that are bad for pedestrians, reported about 160 accidents each in 2007 - the same as Pennsylvania that's 50% more populous. From the zoomed-out view, Georgia and North Carolina do not seem very different from other states, though. But when you zoom in, you'll notice that the area around Atlanta is full of dots. (I could not see any such dangerous locations in North Carolina.)

Note that San Francisco, Portland and Seattle don't even have any accidents with more than one pedestrian involved, and neither do North Dakota, Nebraska or Kansas! The city of New York has just a single such accident, and even that one happened on Van Wyck Expressway, which makes me feel safer about walking around in big cities.

Looking at only those accidents where vehicles with hazardous cargo were involved, it's clear that they don't cause many deaths, and the number of single-vehicle accidents is small, which suggests that, overall, drivers carrying hazardous cargo are more careful. Note that there is a string of such accidents with uninsured drivers across the Midwest for some reason.

One of the layers shows just the accidents with "special use" vehicles. This mostly means police cars across the West and Midwest (police cars in North Carolina, Georgia and Florida, again, look more accident-prone than in other states), and in Philadelphia and New York taxi cabs show up a lot. Bus accidents, fortunately, are very few, but Dallas has the sad record of having three. The Northwest - Oregon and Washington - is very trouble-free. South Carolina had five school bus-related fatal accidents over the three years - it seems that no other state had them so close together. Los Angeles has a lot of accidents in every category, so I usually don't even talk about it, but note that there are two accidents with military vehicles there.

Time-to-arrival layers indicate the relative effectiveness of EMS services - these statistics do not depend on the population numbers, though probably in rural areas it would take longer for ambulances to arrive. Some states do not report this.

Looking at the distribution of the times of EMS arrival on the scene, there are no obvious problems anywhere - in every state there are a number of cases when it takes two or three hours, but mostly it's under 20 or 30 minutes. But if you turn on the layer of the time to arrival to hospital, the state of New York looks really bad compared to others. This could be a data anomaly, though - it's really strange that sometimes it takes hours to get to a New York hospital, even after accidents that occurred in cities. I would not draw any hasty conclusions from the time-to-arrival data - if some states do not report the times as often as others, the comparisons would not be fair.

Here is the full accident map. Please let me know if you have any comments or questions.