What is this blog about?

We are destroying the planet at an alarming rate. It's happening due to the ignorance of the world we live in, and in our age of online data access and sharing there is really no excuse for that any more.

This blog investigates novel ways of looking at large datasets. The kind everyone should care about.


This is my personal blog. The views expressed on these pages are mine alone and not those of my employer.

Wednesday, December 17, 2008

Worldwide power plant carbon emissions


I've been working with carma.org on putting their information about power plants in Google Earth, and they just posted the KML and wrote a blog post about it.

Click here to open the Google Earth layer directly.

I'm presenting this tomorrow at a poster session at the American Geophysical Union fall meeting in San Francisco.

Saturday, November 22, 2008

Fatal US car collisions, 2005-2007

Regionation is a powerful tool for browsing large datasets. I have known for a while that the US has approximately 40,000 traffic-related deaths a year, but I had no means of looking at individual cases. I wanted to understand better what I can do to minimize the chances of getting into an accident, and I found FARS (Fatality Analysis Reporting System).

US DOT has been recording individual accident data in FARS since 1975, and you can query the data or download full datasets. Since 2005, most of the entries contain coordinates, and there is already a website called SafeRoadMaps that provides another query UI and map integration.

If you just want to see the Google Earth map, scroll all the way down.

Overall observations.

Number of accidents per state is roughly proportional to the population, but here are the per capita numbers:

It's clear that the Pacific Coast and the Northeast are much safer overall, but this can probably be explained by their higher share of urban population - people who live in cities drive less.

Indeed, the map above looks somewhat similar to the map of the percentage of rural population:


It makes sense to compare one map against the other more precisely to see how closely the two sets of numbers correspond. In the chart below, the bigger and redder circles represent states with more absolute fatalities. The horizontal axis shows fatalities per capita in each state, and the vertical axis shows the percent of rural population (as of 2000). Pass the mouse over each circle to see which state it represents.



Indeed, there is a dependency - note how the circles line up more or less on a straight line going from the lower left to the upper right corner. We can conclude with some certainty that the more rural a state, the more traffic fatalities per capita it has.

Some states do not fit onto the straight line - those that are furthest away from it are the most unusual, as they buck the overall trend. In the upper left corner, we see New Hampshire, Vermont and Maine, which are very rural but have lower per capita traffic fatalities than other similar states. On the right-hand side, we see Wyoming, which is more dangerous than we would expect. Florida and Arizona do not have very high fatality rates, but they are very non-rural, so on balance they should still be considered more dangerous than average. Montana and Mississippi have the highest fatality rates, but they lie roughly on the straight line, so their sad numbers are not surprising. TODO: calculate regression parameters.
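
The regression parameters could be computed with an ordinary least-squares fit. The sketch below is only an illustration of that idea: the four "states" and their values are invented placeholders, not FARS or census figures, and the function name fit_line is hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Placeholder data, invented for illustration:
# state -> (fatalities per 100,000 people, percent rural population)
states = {
    "A": (10.0, 20.0),
    "B": (15.0, 30.0),
    "C": (20.0, 40.0),
    "D": (25.0, 62.0),
}

xs = [x for x, _ in states.values()]
ys = [y for _, y in states.values()]
slope, intercept = fit_line(xs, ys)

# Residual: vertical distance of each state from the fitted line.
# A positive residual means the state is more rural than its fatality
# rate would suggest, i.e. safer than the trend predicts.
residuals = {
    name: y - (slope * x + intercept) for name, (x, y) in states.items()
}
most_unusual = max(residuals, key=lambda name: abs(residuals[name]))
print(f"slope={slope:.2f} intercept={intercept:.2f} outlier={most_unusual}")
```

States with the largest absolute residuals are the ones that "buck the trend" in the chart.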

Looking at fatalities per VMT (vehicle miles traveled) is a more standard way of comparing accident data from different states. I tried using 2006 data (2007 is not available yet), but note that some of the road types there have no information at all. Indeed, comparing similar data for 2002 with the much more detailed VMT data from NCD (National County Database), part of the National Mobile Inventory Model, shows that the summary table linked above does omit some of the data.
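
For reference, a per-VMT rate is conventionally quoted per 100 million vehicle miles. Here is a minimal sketch of that normalization; the function name and the example numbers are assumptions made up for illustration, not values from FARS or NCD.

```python
def fatalities_per_100m_vmt(fatalities, vmt_miles):
    """Fatality rate per 100 million vehicle miles traveled."""
    if vmt_miles <= 0:
        raise ValueError("VMT must be positive")
    return fatalities / vmt_miles * 100_000_000


# Hypothetical state: 1,200 deaths over 60 billion vehicle miles in a year.
rate = fatalities_per_100m_vmt(1_200, 60_000_000_000)
print(round(rate, 2))
```

If VMT is missing for some road types, the denominator is too small and the rate is overstated, which is why incomplete VMT tables matter here.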

Using the NCD data, I got the following map for 2002 which should give a more fair ranking of states:



It's very much biased toward ranking inland states worse, so either I'm still missing some data problems, or rural states indeed produce more fatalities per VMT. Here's a diagram similar to the one above. You still can see a straight-line dependency:



This time it's Nevada and Arizona that appear to be worse than you would expect, while Maine, New Hampshire and Vermont are still doing much better. The overall picture is about the same, though. Note that you can't substitute VMT for ruralness (you can click on the vertical axis in the chart above and set it to show VMT instead): it's not that in the inland states you have to drive further; your chances of getting into a crash grow faster than the trip length. It's driving in the rural states by itself that's more dangerous. Perhaps this means that more of rural driving is on uncongested highways, so the average speed is higher.

Of course, this has been studied before. See this paper by Littman and Fitzroy, as well as this paper by Ewing et al that clearly link higher death rates to more urban sprawl.

Plotting pedestrian fatalities shows a different picture that does not correspond to the urban/rural divide.

Florida, New Mexico, Louisiana, Arizona and South Carolina are disproportionately more dangerous for pedestrians (as well as District of Columbia that does not show on the map). (See this report from NHTSA for the full discussion of pedestrian accident trends.)

Looking at drunk driving incidents (those where the highest alcohol concentration was over 0.08) relative to the total number of accidents in each state shows yet another picture. Rates are roughly the same across the country: 25-35%, with the notable exceptions of Utah (17%) and, for some reason, North Dakota (48%). Delaware, South Carolina, Wisconsin, Montana, Texas and Louisiana also stand out with rates of 37-43%.



The most dangerous times of day to drive are Friday and Saturday nights from 6 pm to 3 am Sunday (especially the three hours between midnight and 3 am), and on weekdays 3 pm to 9 pm are worse than earlier hours. Rain or snow may feel like dangerous conditions, but almost 90% of accidents occurred in normal weather. 50% of accidents occurred in full daylight, and 30% more in darkness - so at night try to drive on well-lit streets as much as possible.


Detailed data

To plot all accidents on the same map, I have run the regionator script with the 2005-2007 FARS data. If you have Google Earth browser plugin installed, scroll down to see the map. You can also open the KML files in Google Earth or in a separate window - if you do this, it would be easier for you to follow along.

The numbers of fatalities were: 43,510 in 2005, 42,708 in 2006 and 41,059 in 2007. The decrease can probably be attributed to higher gas prices, as people drive less - about 1,600 fewer deaths occurred in 2007 than in 2006. Remember this the next time you complain about high gas prices! A breakdown by state shows that the 4% overall drop is not spread uniformly at all - the District of Columbia, for example, had a 19% increase in fatalities in 2007 compared with 2006, while South Dakota had a 24% drop.
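
The year-over-year arithmetic for the national totals quoted above can be checked with a few lines of Python. The totals are the ones given in the text; the helper name percent_change is just an illustrative choice.

```python
# National fatality totals as quoted in the text.
fatalities = {2005: 43_510, 2006: 42_708, 2007: 41_059}


def percent_change(old, new):
    """Relative change from old to new, in percent (negative = drop)."""
    return (new - old) / old * 100


drop_2007 = fatalities[2006] - fatalities[2007]
change_2007 = percent_change(fatalities[2006], fatalities[2007])
print(drop_2007, round(change_2007, 1))  # prints: 1649 -3.9
```

The same function applied to per-state totals would reproduce the state-by-state comparison.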

I have plotted on the map 107,037 accidents, or 93% of the total for the three years - the other 7% did not have coordinates listed. Most accident records mention a "first harmful event", which can be approximately considered the primary cause of the crash. Here are the largest categories (see FARS user guide for the full list of categories):

First harmful event                            Accidents  Share
Collision with a car on the same roadway       42,989     40%
Pedestrians involved (not necessarily killed)  13,281     12%
Overturn or rollover                           12,821     12%
Collision with a tree                          9,299      8.6%


The next biggest problems, which caused 1000-3000 accidents each, were: collisions with bicycles, with cars on other roadways (which includes crossing the median, but not collisions at intersections), with poles, guardrails, embankments, signposts, fences, traffic barriers and parked cars, as well as driving into ditches and culverts.

When looking at the map, the majority of accidents seem to be located on highways, and indeed a FARS query for 2007 shows that 70% occurred at speeds 45 mph or above.

There do not seem to be particular geographic areas where fatal crashes are much more likely to occur, or that have too many drunk drivers. However, Southeast (Carolinas, Georgia and Florida) has more accidents where unlicensed drivers were present.

Pedestrian accidents are mostly found, of course, in urbanized areas. The overall state numbers told us that Florida is very dangerous for pedestrians, and you can immediately see it on the map - there are many pedestrian accidents with more than one death. Florida has about as many people as New York, yet in 2007 it had almost twice as many pedestrian accidents. Georgia and North Carolina, two other states that are bad for pedestrians, reported about 160 accidents each in 2007 - the same as Pennsylvania that's 50% more populous. From the zoomed-out view, Georgia and North Carolina do not seem very different from other states, though. But when you zoom in, you'll notice that the area around Atlanta is full of dots. (I could not see any such dangerous locations in North Carolina.)

Note that San Francisco, Portland and Seattle don't even have any accidents with more than one pedestrian involved, and neither do North Dakota, Nebraska or Kansas! The city of New York has just a single such accident, and even that one happened on Van Wyck Expressway, which makes me feel safer about walking around in big cities.

Looking at only those accidents where vehicles with hazardous cargo were involved, it's clear that they don't cause many deaths, and the number of single-vehicle accidents is small, which suggests that, overall, drivers carrying hazardous cargo are more careful. Note that there is a string of such accidents with uninsured drivers across Midwest for some reason.

One of the layers shows just the accidents with "special use" vehicles. This mostly means police cars across West and Midwest (police cars in North Carolina, Georgia and Florida, again, look more accident-prone than in other states), and in Philadelphia and New York taxi cabs show up a lot. Bus accidents, fortunately, are very few, but Dallas has the sad record of having three. Northwest - Oregon and Washington - is very trouble-free. South Carolina had five school bus-related fatal accidents over the three years - it seems that no other state had them so close together. Los Angeles has a lot of accidents in every category, so I usually don't even talk about it, but note that there are two accidents with military vehicles there.

Time-to-arrival layers indicate the relative effectiveness of EMS services - these statistics do not depend on the population numbers, though probably in rural areas it would take longer for ambulances to arrive. Some states do not report this.

Looking at the distribution of the times of EMS arrival on the scene, there are no obvious problems anywhere - in every state there are a number of cases when it takes two or three hours, but mostly it's under 20 or 30 minutes. But if you turn on the layer of the time to arrival to hospital, the state of New York looks really bad compared to others. This could be a data anomaly, though - it's really strange that sometimes it takes hours to get to a New York hospital, even after accidents that occurred in cities. I would not draw any hasty conclusions from the time-to-arrival data - if some states do not report the times as often as others, the comparisons would not be fair.

Here is the full accident map. Please let me know if you have any comments or questions.

Saturday, October 4, 2008

State percentages of deficient/obsolete bridges

To follow up on the previous post, I wanted to plot some aggregate data on bridge conditions. US DOT provides per-state statistics for the total number of bridges, as well as the percentage of structurally deficient and functionally obsolete bridges.

According to the official explanation, "structurally deficient" in most cases means that the deck, superstructure or substructure has a rating of 4 or below. "Functionally obsolete" means that the bridge does not pass the current standards for road width or roadway alignment. An appraisal rating of 3 or below puts a bridge in this category.
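
As a rough illustration, the two definitions can be written as simple threshold checks. This is a simplified sketch of the rules as summarized above, not the official FHWA procedure (which has additional conditions, hence "in most cases"); the function and argument names are my own.

```python
def is_structurally_deficient(deck, superstructure, substructure):
    """True if any of the three main components is rated 4 or below."""
    return min(deck, superstructure, substructure) <= 4


def is_functionally_obsolete(appraisal_ratings):
    """True if any appraisal rating (e.g. road width, alignment) is 3 or below."""
    return any(rating <= 3 for rating in appraisal_ratings)


# Hypothetical bridge: sound structure, but one poor appraisal rating.
print(is_structurally_deficient(deck=6, superstructure=7, substructure=5))
print(is_functionally_obsolete([5, 3, 6]))
```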

Here is the map (click here to access it directly)



The default view shows only the structurally deficient bridges. Click on the eye icons in the upper right corner to turn layers on and off.

The Northeast is not doing so well on obsolete bridges - DC and Massachusetts are the leaders with 52% and 40%. This could be explained by the sheer number of old bridges, I suppose.

For structurally deficient bridges, Pennsylvania and Oklahoma lead the way with 26 and 25%. Note that to highlight differences, I chose slightly different algorithms for calculating icon sizes on the two layers.

Tuesday, September 23, 2008

US 2007 bridge condition data

I have taken the 2007 bridge inspection data from the Federal Highway Administration, US Department of Transportation, and converted it into KML. Since that's a lot of information, it's not shown all at the same time. Rather, I use a KML feature called "regions".

When the map view is zoomed very far out so that you see whole countries and continents, only a small subset of regionated data will be visible, to avoid clutter. In this case, bridge rating provides a scoring metric, so only the most critical bridges are shown at first. As the view zooms in, more and more bridges will appear until you see all available data for the small piece of the map you are looking at.
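
To make the idea concrete, here is a small sketch of score-based regionation on a quadtree: each region keeps only its few most critical points, and the rest are pushed down into four sub-regions that would become visible as the view zooms in. This is my own simplified illustration, not the actual regionator code; the function names, the per_region limit and the sample points are all assumptions.

```python
def regionate(points, bbox, per_region=2, depth=0, max_depth=8):
    """Build a quadtree of regions from (lon, lat, score) points.

    Lower score = more critical, so those points are shown first.
    bbox is (west, south, east, north).
    """
    ranked = sorted(points, key=lambda p: p[2])
    if depth >= max_depth:
        shown, rest = ranked, []  # deepest level: show everything left
    else:
        shown, rest = ranked[:per_region], ranked[per_region:]
    node = {"bbox": bbox, "depth": depth, "shown": shown, "children": []}
    if not rest:
        return node

    west, south, east, north = bbox
    mid_lon, mid_lat = (west + east) / 2, (south + north) / 2
    quadrants = {
        (False, False): (west, south, mid_lon, mid_lat),
        (True, False): (mid_lon, south, east, mid_lat),
        (False, True): (west, mid_lat, mid_lon, north),
        (True, True): (mid_lon, mid_lat, east, north),
    }
    # Assign every remaining point to exactly one quadrant.
    buckets = {key: [] for key in quadrants}
    for p in rest:
        buckets[(p[0] >= mid_lon, p[1] >= mid_lat)].append(p)
    for key, members in buckets.items():
        if members:
            node["children"].append(
                regionate(members, quadrants[key], per_region,
                          depth + 1, max_depth)
            )
    return node


def count_points(node):
    """Total number of points stored in a region and all its sub-regions."""
    return len(node["shown"]) + sum(count_points(c) for c in node["children"])


# Made-up sample: (longitude, latitude, worst condition rating).
bridges = [(-100, 40, 2), (-90, 35, 5), (-120, 45, 3),
           (-80, 30, 1), (-75, 41, 4)]
tree = regionate(bridges, bbox=(-125, 24, -66, 50))
print([p[2] for p in tree["shown"]])  # ratings visible when zoomed far out
```

In real KML, each node would be written as a Region element with a bounding box and a level-of-detail threshold, so that Google Earth loads the children only when the region is large enough on screen.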

The map is only showing the bridges with the lowest condition ratings. Out of a total of 716,000 bridges, 189,000, or 26%, are shown. A bridge is shown if its worst condition rating is 5 or less. The ratings go from 0 to 9 for each of several bridge components that are rated independently of each other (see a separate post for details).
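
The selection rule can be sketched as follows. The component names follow the rating tables in the separate post; the dictionary layout, the use of None for a component that was not rated, and the function names are assumptions for illustration.

```python
def worst_rating(ratings):
    """Lowest rating among the components that were actually rated."""
    rated = [r for r in ratings.values() if r is not None]
    return min(rated) if rated else None


def is_shown(ratings, threshold=5):
    """A bridge appears on the map if its worst rating is at or below threshold."""
    worst = worst_rating(ratings)
    return worst is not None and worst <= threshold


# Hypothetical bridge: None marks a component that does not apply.
example = {"deck": 7, "superstructure": 5, "substructure": 6,
           "channel": 8, "culvert": None}
print(worst_rating(example), is_shown(example))
```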

Disclaimer: Federal Highway Administration (FHWA) does not approve, endorse, or recommend this project.

Click here to open the visualization in Google Earth. If you don't have Google Earth, download it. If you have Google Earth plugin installed, scroll down to see the embedded view. Otherwise, here's a sample picture.

This was implemented using regionator - a free, open-source program that converts spreadsheets in CSV format into KML files. Feel free to contact me if you would like to know details or learn to do this yourself.
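
For readers who just want to see what a CSV-to-KML conversion looks like in principle, here is a bare-bones sketch using only the Python standard library. It is not how regionator itself works (it produces no regions at all), and the column names name, lat and lon are assumptions about the input file.

```python
import csv
import io
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def tag(name):
    """Qualify an element name with the KML namespace."""
    return f"{{{KML_NS}}}{name}"


def csv_to_kml(csv_text):
    """Convert CSV rows with name, lat, lon columns into KML placemarks."""
    ET.register_namespace("", KML_NS)
    kml = ET.Element(tag("kml"))
    document = ET.SubElement(kml, tag("Document"))
    for row in csv.DictReader(io.StringIO(csv_text)):
        placemark = ET.SubElement(document, tag("Placemark"))
        ET.SubElement(placemark, tag("name")).text = row["name"]
        point = ET.SubElement(placemark, tag("Point"))
        # KML coordinates are written as longitude,latitude.
        ET.SubElement(point, tag("coordinates")).text = (
            f"{row['lon']},{row['lat']}"
        )
    return ET.tostring(kml, encoding="unicode")


sample = "name,lat,lon\nExample bridge,40.0,-75.0\n"
kml_text = csv_to_kml(sample)
print(kml_text)
```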

Each bridge is represented by a circle. Click on any of them to open a description balloon. It will show the road names, daily traffic count, bar charts with specific ratings, the year the bridge was built and the organization responsible for maintenance. All of these were taken from the original data (and dozens of other fields are available). Unfortunately, bridges are sometimes not located precisely where they should be - they could be several hundred feet or more off (some are in Iceland!).

Circle size shows the amount of daily traffic, and circle color shows what the condition of a bridge is:
Rating  0 (broken)  1 (closed)  2 (critical)  3 (serious)  4 (poor)  5 (fair)
Color   Black       Black       Red           Pink         Orange    Yellow
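
The styling rule can be sketched like this. The color table is the one shown above; the size formula is purely an assumption (a logarithmic scale capped at a hypothetical maximum traffic count), since the post does not say how icon sizes were actually computed.

```python
import math

# Worst condition rating -> circle color, as in the table above.
RATING_COLORS = {0: "black", 1: "black", 2: "red",
                 3: "pink", 4: "orange", 5: "yellow"}


def circle_color(worst_rating):
    """Color for a bridge, given its worst condition rating (0-5)."""
    return RATING_COLORS[worst_rating]


def circle_scale(daily_traffic, min_scale=0.5, max_scale=3.0,
                 max_traffic=300_000):
    """Hypothetical icon scale: grows with the logarithm of daily traffic."""
    traffic = min(max(daily_traffic, 1), max_traffic)
    fraction = math.log10(traffic) / math.log10(max_traffic)
    return min_scale + fraction * (max_scale - min_scale)


print(circle_color(2), round(circle_scale(25_000), 2))
```

A logarithmic scale is one reasonable choice here because daily traffic counts span several orders of magnitude, and a linear scale would make most circles nearly invisible.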








Saturday, September 13, 2008

Bridge rating explanations

This is the explanation of the 2007 US DOT bridge safety data (taken from their document). See main post for a general introduction.


Page 38: ratings for deck, superstructure and substructure (items 58, 59, 60).

Code  Description
9     EXCELLENT CONDITION
8     VERY GOOD CONDITION - no problems noted.
7     GOOD CONDITION - some minor problems.
6     SATISFACTORY CONDITION - structural elements show some minor deterioration.
5     FAIR CONDITION - all primary structural elements are sound but may have minor section loss, cracking, spalling or scour.
4     POOR CONDITION - advanced section loss, deterioration, spalling or scour.
3     SERIOUS CONDITION - loss of section, deterioration, spalling or scour have seriously affected primary structural components. Local failures are possible. Fatigue cracks in steel or shear cracks in concrete may be present.
2     CRITICAL CONDITION - advanced deterioration of primary structural elements. Fatigue cracks in steel or shear cracks in concrete may be present or scour may have removed substructure support. Unless closely monitored it may be necessary to close the bridge until corrective action is taken.
1     "IMMINENT" FAILURE CONDITION - major deterioration or section loss present in critical structural components or obvious vertical or horizontal movement affecting structure stability. Bridge is closed to traffic but corrective action may put back in light service.
0     FAILED CONDITION - out of service - beyond corrective action.


Page 40: ratings for channel and channel protection (item 61).

Code  Description
9     There are no noticeable or noteworthy deficiencies which affect the condition of the channel.
8     Banks are protected or well vegetated. River control devices such as spur dikes and embankment protection are not required or are in a stable condition.
7     Bank protection is in need of minor repairs. River control devices and embankment protection have a little minor damage. Banks and/or channel have minor amounts of drift.
6     Bank is beginning to slump. River control devices and embankment protection have widespread minor damage. There is minor stream bed movement evident. Debris is restricting the channel slightly.
5     Bank protection is being eroded. River control devices and/or embankment have major damage. Trees and brush restrict the channel.
4     Bank and embankment protection is severely undermined. River control devices have severe damage. Large deposits of debris are in the channel.
3     Bank protection has failed. River control devices have been destroyed. Stream bed aggradation, degradation or lateral movement has changed the channel to now threaten the bridge and/or approach roadway.
2     The channel has changed to the extent the bridge is near a state of collapse.
1     Bridge closed because of channel failure. Corrective action may put back in light service.
0     Bridge closed because of channel failure. Replacement necessary.



Page 41: ratings for culverts (item 62)


Code  Description
9     No deficiencies.
8     No noticeable or noteworthy deficiencies which affect the condition of the culvert. Insignificant scrape marks caused by drift.
7     Shrinkage cracks, light scaling, and insignificant spalling which does not expose reinforcing steel. Insignificant damage caused by drift with no misalignment and not requiring corrective action. Some minor scouring has occurred near curtain walls, wingwalls, or pipes. Metal culverts have a smooth symmetrical curvature with superficial corrosion and no pitting.
6     Deterioration or initial disintegration, minor chloride contamination, cracking with some leaching, or spalls on concrete or masonry walls and slabs. Local minor scouring at curtain walls, wingwalls, or pipes. Metal culverts have a smooth curvature, non-symmetrical shape, significant corrosion or moderate pitting.
5     Moderate to major deterioration or disintegration, extensive cracking and leaching, or spalls on concrete or masonry walls and slabs. Minor settlement or misalignment. Noticeable scouring or erosion at curtain walls, wingwalls, or pipes. Metal culverts have significant distortion and deflection in one section, significant corrosion or deep pitting.
4     Large spalls, heavy scaling, wide cracks, considerable efflorescence, or opened construction joint permitting loss of backfill. Considerable settlement or misalignment. Considerable scouring or erosion at curtain walls, wingwalls or pipes. Metal culverts have significant distortion and deflection throughout, extensive corrosion or deep pitting.
3     Any condition described in Code 4 but which is excessive in scope. Severe movement or differential settlement of the segments, or loss of fill. Holes may exist in walls or slabs. Integral wingwalls nearly severed from culvert. Severe scour or erosion at curtain walls, wingwalls or pipes. Metal culverts have extreme distortion and deflection in one section, extensive corrosion, or deep pitting with scattered perforations.
2     Integral wingwalls collapsed, severe settlement of roadway due to loss of fill. Section of culvert may have failed and can no longer support embankment. Complete undermining at curtain walls and pipes. Corrective action required to maintain traffic. Metal culverts have extreme distortion and deflection throughout with extensive perforations due to corrosion.
1     Bridge closed. Corrective action may put back in light service.
0     Bridge closed. Replacement necessary.



Page 45: comparison with modern criteria (items 67-72)

Code  Description
9     Superior to present desirable criteria
8     Equal to present desirable criteria
7     Better than present minimum criteria
6     Equal to present minimum criteria
5     Somewhat better than minimum adequacy to tolerate being left in place as is
4     Meets minimum tolerable limits to be left in place as is
3     Basically intolerable requiring high priority of corrective action
2     Basically intolerable requiring high priority of replacement
1     This value of rating code not used
0     Bridge closed