What is this blog about?

We are destroying the planet at an alarming rate. It's happening due to the ignorance of the world we live in, and in our age of online data access and sharing there is really no excuse for that any more.

This blog investigates novel ways of looking at large datasets. The kind everyone should care about.


This is my personal blog. The views expressed on these pages are mine alone and not those of my employer.

Monday, July 9, 2012

Map of continental US climate changes, 1895-2012


Here is a map that shows changes in US climate since 1895 and since 1970.

When people talk about climate change in the US, they tend to concentrate either on what is happening with the whole country or large regions, or on very local changes that have a lot of variability. For a more detailed understanding of what exactly has been going on over the last century, we can look at areas called climate divisions.

Q&A

What does the map show?

It displays overall changes in temperature and precipitation (rainfall plus snowfall) for each month since 1895. You can also see yearly and seasonal changes.

What do the colors mean?

Shades of red/purple show an increase in temperature or precipitation. Shades of green indicate a decrease. The exact ranges for each color are displayed in the legend next to the map.

Why are some areas of the map blank?

An area is shown only if there is a strong trend - in other words, only if there is enough confidence that, despite year-to-year variability, the values are climbing or falling. Areas without such a trend are left blank.

What are some examples of large changes?

February temperature since 1895: the north and northeast of the country have warmed by over 3F.


November precipitation since 1895: the east and southeast had an increase of over 60%.


Yearly temperature since 1970: some parts of Nevada have warmed up by 4-5F


Yearly precipitation since 1970: several areas in the Dakotas had a 30-40% increase.


How can I see the trends for myself?

Click anywhere on the map. All the data points will be shown in a chart under the map. The trend line will be overlaid on top of the data. Even if an area is blank, you can still click on it and see the chart, but there will be no trend line.

Where did the data come from?

From NOAA (National Oceanic and Atmospheric Administration). 

Where can I learn more about US weather changes?

In June 2012, Climate Central published a report about state-by-state temperature changes. The most comprehensive source is the national climate change section of the 2009 report on climate change impacts produced by the US Global Change Research Program.

How did you compute the trend? Did you use linear regression?

Almost. Linear regression would produce approximately the same results, but it assumes certain things about the data that may not always be true. Instead, we used what statisticians call "nonparametric estimation" - in other words, tried to determine if there is a linear trend without assuming anything about the nature of the data. Specifically, we can apply the Mann-Kendall test to see if the data have any trend at all or just change randomly without moving a lot in any direction. If the test says that there is a trend (with p-value <0.05), we can use a method similar to linear regression, the Theil-Sen estimator, to find the most likely straight line that approximates the trend. 
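For readers who want to try this on their own numbers, here is a minimal sketch of the two steps in plain Python. It is not the code behind the map: the function names are made up for illustration, the Mann-Kendall test uses the usual normal approximation without a correction for tied values, and the intercept (median of y minus slope times x) is just one common choice. The sample series at the end is invented.

```python
import math
from itertools import combinations
from statistics import median


def mann_kendall(values):
    """Return (S, z, p) for the Mann-Kendall trend test (no tie correction)."""
    n = len(values)
    # S = number of (earlier, later) pairs that went up minus those that went down.
    s = sum((b > a) - (b < a) for a, b in combinations(values, 2))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value from the normal approximation.
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p


def theil_sen(xs, ys):
    """Return (slope, intercept) using the median of all pairwise slopes."""
    slopes = [(ys[j] - ys[i]) / (xs[j] - xs[i])
              for i, j in combinations(range(len(xs)), 2)
              if xs[j] != xs[i]]
    slope = median(slopes)
    # One common choice of intercept: the median of y - slope * x.
    intercept = median(y - slope * x for x, y in zip(xs, ys))
    return slope, intercept


# Invented example: a slow warming with some year-to-year wobble.
years = list(range(1970, 1990))
temps = [50.0 + 0.1 * i + (0.3 if i % 2 else -0.3) for i in range(len(years))]

s, z, p = mann_kendall(temps)
if p < 0.05:
    slope, intercept = theil_sen(years, temps)
    print("trend of", slope, "degrees per year, p-value", p)
else:
    print("no significant trend, p-value", p)
```

The pairwise approach is quadratic in the number of data points, which is fine for a century or so of yearly or monthly values per climate division.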

What is p-value?

It's a statistical indicator of how likely it is that an apparent trend is just a product of chance. Trend estimation formulas always produce a linear trend (a straight line), but if the data are not really changing more or less together with this straight line, it makes no sense to talk about a trend. This is why the trend test also produces another number called the p-value: roughly, the probability of seeing a trend at least this strong in data that have no real trend at all. The p-value is always between 0 and 1, and only very small p-values point to a real trend. A cutoff value of 0.05 is often used - so if the test produces a p-value of 0.05 or below, the map shows a trend, otherwise it does not. See the Wikipedia article for more information.

I'm grateful to several people who provided help and comments on this map: Chris, Boris, Nadav, Eli, Eric, Tyler.

Saturday, December 18, 2010

Earth Engine

For the last year I've been working on the Google Earth Engine project. See this post for the Earth Engine launch announcement.

Tuesday, July 7, 2009

Greenhouse gas map for Kyoto parties

The Secretariat of the United Nations Framework Convention on Climate Change (UNFCCC) collects national data on greenhouse gas emissions for the parties to the Kyoto Protocol.

Here's the map of the UNFCCC inventory data showing emissions for 40 Kyoto countries plus the EU. The data contain absolute emissions for 1990 to 2006, as well as the deltas. The map shows the totals plus a breakdown by individual sectors like energy, industry, agriculture and others.

Terms that could be unfamiliar:

LULUCF means "land use, land-use change and forestry".

"Base year" is 1990 for most, but not all countries.

"International bunkers" are emissions from ships and aviation. They are not included in country totals.

Friday, February 20, 2009

Here's my post on the Google LatLon blog about the map I've created for Vulcan, a project computing US CO2 emissions run by Dr Kevin Gurney at Purdue University.

Vulcan released the first round of data in April 2008, and yesterday more details and an interactive map showing state and county absolute and per capita emissions were added. Individual power plants and airports show up too. The page uses the Google Earth browser plugin, but if you are running Linux or you just want to look at the raw KML layers, you can load the top-level page in Google Earth. It's using network links, so you won't wind up pulling everything in at once.

Here's a KML flythrough tour and the video produced from it:

Wednesday, December 17, 2008

Worldwide power plant carbon emissions


I've been working with carma.org on putting their information about power plants in Google Earth, and they just posted the KML and wrote a blog post about it.

Click here to open the Google Earth layer directly.

I'm presenting this tomorrow at a poster session at the American Geophysical Union fall meeting in San Francisco.

Saturday, November 22, 2008

Fatal US car collisions, 2005-2007

Regionation is a powerful tool for browsing large datasets. I knew for a while that the US has approximately 40,000 traffic-related deaths a year, but had no means of looking at individual cases. I wanted to understand better what I can do to minimize the chances of getting into an accident, and I found FARS (Fatality Analysis Reporting System).

The US DOT has been recording individual accident data in FARS since 1975, and you can query the data or download full datasets. Since 2005, most of the entries contain coordinates, and there is already a website called SafeRoadMaps that provides another query UI and map integration.

If you just want to see the Google Earth map, scroll all the way down.

Overall observations.

The number of accidents per state is roughly proportional to the population, but here are the per capita numbers:

It's clear that the Pacific Coast and the Northeast are much safer overall, but this can probably be explained by their higher share of urban population - people who live in cities drive less.

Indeed, the map above looks somewhat similar to the map of the percentage of rural population:


It makes sense to compare one map against the other more precisely to see how closely the two sets of numbers correspond. In the chart below, the bigger and redder circles represent states with more absolute fatalities. The horizontal axis shows fatalities per capita in each state, and the vertical axis shows the percent of rural population (as of 2000). Pass the mouse over each circle to see which state it represents.



Indeed, there is a dependency - note how the circles line up more or less on a straight line going from the lower left to the upper right corner. We can conclude with some certainty that the more rural a state, the more traffic fatalities per capita it has.

Some states do not fit onto the straight line - those that are furthest away from it are the most unusual, as they buck the overall trend. In the upper left corner, we see New Hampshire, Vermont and Maine, which are very rural but have lower per capita traffic fatalities than other similar states. On the right-hand side, we see Wyoming, which is more dangerous than we would expect. Florida and Arizona do not have very high fatality rates, but they are very non-rural, so on balance they should still be considered more dangerous than average. Montana and Mississippi have the highest fatality rates, but they lie roughly on the straight line, so their sad numbers are not surprising. TODO: calculate regression parameters.
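The regression parameters in the TODO could be computed along these lines. This is only a sketch using ordinary least squares, with the axes as in the chart (fatalities per capita on the horizontal axis, percent rural on the vertical one). The state labels and all the numbers below are made up for illustration; they are not the FARS or census figures.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept, plus Pearson's r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical states: (fatalities per 100,000 people, percent rural).
states = {
    "State A": (8.0, 10.0),
    "State B": (12.0, 25.0),
    "State C": (16.0, 35.0),
    "State D": (11.0, 55.0),   # very rural, but relatively safe
    "State E": (24.0, 45.0),
}

xs = [v[0] for v in states.values()]
ys = [v[1] for v in states.values()]
slope, intercept, r = linear_fit(xs, ys)

# Vertical distance from the line: large values mark the states that buck the trend.
residuals = {name: y - (slope * x + intercept) for name, (x, y) in states.items()}
for name, res in sorted(residuals.items(), key=lambda kv: -abs(kv[1])):
    print(name, round(res, 1))
print("slope", slope, "intercept", intercept, "r", r)
```

A state with a large positive residual is more rural than its fatality rate would suggest (safer than expected), and a large negative residual means the opposite.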

Looking at fatalities per VMT (vehicle miles traveled) is a more standard way of comparing accident data from different states. I tried using 2006 data (2007 is not available yet), but note that some of the road types there have no information at all. Indeed, comparing similar data for 2002 with the much more detailed VMT data from NCD (National County Database), part of the National Mobile Inventory Model, shows that the summary table linked above does omit some of the data.
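As a reminder of what the per-VMT figure is, here is a tiny sketch. The rate is usually quoted per 100 million vehicle miles; the function name and the sample numbers are invented for illustration and are not taken from the FARS or NCD tables.

```python
def fatalities_per_100m_vmt(fatalities, vehicle_miles):
    """Deaths per 100 million vehicle miles traveled."""
    return fatalities * 100_000_000 / vehicle_miles


# Made-up state: 150 deaths over 10 billion vehicle miles in a year.
print(fatalities_per_100m_vmt(150, 10_000_000_000))
```

If some road types are missing from the VMT total, the denominator shrinks and the rate looks worse than it really is, which is why incomplete VMT tables matter here.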

Using the NCD data, I got the following map for 2002 which should give a more fair ranking of states:



It's very much biased toward ranking inland states worse, so either I'm still missing some data problems, or rural states indeed produce more fatalities per VMT. Here's a diagram similar to the one above. You can still see a straight-line dependency:



This time it's Nevada and Arizona that appear to be worse than you would expect, while Maine, New Hampshire and Vermont are still doing much better. The overall picture is about the same, though. Note that you can't substitute VMT for ruralness (you can click on the vertical axis in the chart above and set it to show VMT instead): it's not just that in the inland states you have to drive further; your chances of getting into a crash grow faster than the trip length. It's driving in the rural states by itself that's more dangerous. Perhaps this means that more of rural driving is on uncongested highways, so the average speed is higher.

Of course, this has been studied before. See this paper by Litman and Fitzroy, as well as this paper by Ewing et al. that clearly link higher death rates to more urban sprawl.

Plotting pedestrian fatalities shows a different picture that does not correspond to the urban/rural divide.

Florida, New Mexico, Louisiana, Arizona and South Carolina are disproportionately more dangerous for pedestrians (as is the District of Columbia, which does not show on the map). (See this report from NHTSA for the full discussion of pedestrian accident trends.)

Looking at drunk driving incidents (those where the highest alcohol concentration was over 0.08) relative to the total number of accidents in each state shows yet another picture. Rates are roughly the same across the country: 25-35%, with the notable exceptions of Utah (17%) and, for some reason, North Dakota (48%). Delaware, South Carolina, Wisconsin, Montana, Texas and Louisiana also stand out with rates of 37-43%.



The most dangerous times to drive are Friday and Saturday nights from 6 pm to 3 am the next morning (especially the three hours between midnight and 3 am), and on weekdays 3 pm to 9 pm is worse than earlier hours. Rain or snow may feel like dangerous conditions, but almost 90% of accidents occurred in normal weather. 50% of accidents occurred in full daylight, and another 30% in darkness - so at night try to drive on well-lit streets as much as possible.


Detailed data

To plot all accidents on the same map, I have run the regionator script with the 2005-2007 FARS data. If you have the Google Earth browser plugin installed, scroll down to see the map. You can also open the KML files in Google Earth or in a separate window - if you do this, it will be easier for you to follow along.

The numbers of fatalities were: 43,510 in 2005, 42,708 in 2006 and 41,059 in 2007. The decrease can probably be attributed to higher gas prices, as people drive less - about 1,600 fewer deaths occurred in 2007 than in 2006. Remember this the next time you complain about high gas prices! A breakdown by state shows that the 4% overall drop is not spread uniformly at all - the District of Columbia, for example, had a 19% increase in fatalities in 2007 compared with 2006, while South Dakota had a 24% drop.
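The year-over-year figures are simple arithmetic on the three national totals quoted above; here is a small sketch for anyone who wants to repeat it for individual states (the helper name is made up, and per-state counts are not included here).

```python
# National fatality totals quoted in the post.
fatalities = {2005: 43510, 2006: 42708, 2007: 41059}


def pct_change(old, new):
    """Percentage change from old to new (negative means a drop)."""
    return (new - old) / old * 100.0


for year in (2006, 2007):
    prev, cur = fatalities[year - 1], fatalities[year]
    print(year, "change:", cur - prev, "deaths,", round(pct_change(prev, cur), 1), "%")
```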

I have plotted 107,037 accidents on the map, or 93% of the total for the three years - the other 7% did not have coordinates listed. Most of the accident records mention a "first harmful event", which can be approximately considered the primary cause of the crash. Here are the largest categories (see the FARS user guide for the full list of categories):
Collision with a car on the same roadway: 42,989 (40%)
Pedestrians involved (not necessarily killed): 13,281 (12%)
Overturn or rollover: 12,821 (12%)
Collision with a tree: 9,299 (8.6%)


The next biggest problems, which caused 1,000-3,000 accidents each, were: collisions with bicycles, with cars on other roadways (which includes crossing the median, but not collisions at intersections), with poles, guardrails, embankments, signposts, fences, traffic barriers and parked cars, as well as driving into ditches and culverts.

When looking at the map, the majority of accidents seem to be located on highways, and indeed a FARS query for 2007 shows that 70% occurred at speeds of 45 mph or above.

There do not seem to be particular geographic areas where fatal crashes are much more likely to occur, or that have too many drunk drivers. However, the Southeast (the Carolinas, Georgia and Florida) has more accidents where unlicensed drivers were present.

Pedestrian accidents are mostly found, of course, in urbanized areas. The overall state numbers told us that Florida is very dangerous for pedestrians, and you can immediately see it on the map - there are many pedestrian accidents with more than one death. Florida has about as many people as New York, yet in 2007 it had almost twice as many pedestrian accidents. Georgia and North Carolina, two other states that are bad for pedestrians, reported about 160 accidents each in 2007 - the same as Pennsylvania, which is 50% more populous. From the zoomed-out view, Georgia and North Carolina do not seem very different from other states, though. But when you zoom in, you'll notice that the area around Atlanta is full of dots. (I could not see any such dangerous locations in North Carolina.)

Note that San Francisco, Portland and Seattle don't even have any accidents with more than one pedestrian involved, and neither do North Dakota, Nebraska or Kansas! The city of New York has just a single such accident, and even that one happened on Van Wyck Expressway, which makes me feel safer about walking around in big cities.

Looking at only those accidents where vehicles with hazardous cargo were involved, it's clear that they don't cause many deaths, and the number of single-vehicle accidents is small, which suggests that, overall, drivers carrying hazardous cargo are more careful. Note that there is a string of such accidents with uninsured drivers across the Midwest for some reason.

One of the layers shows just the accidents with "special use" vehicles. This mostly means police cars across the West and Midwest (police cars in North Carolina, Georgia and Florida, again, look more accident-prone than in other states), and in Philadelphia and New York taxi cabs show up a lot. Bus accidents, fortunately, are very few, but Dallas has the sad record of having three. The Northwest - Oregon and Washington - is very trouble-free. South Carolina had five school bus-related fatal accidents over the three years - it seems that no other state had them so close together. Los Angeles has a lot of accidents in every category, so I usually don't even talk about it, but note that there are two accidents with military vehicles there.

Time-to-arrival layers indicate the relative effectiveness of EMS services - these statistics do not depend on the population numbers, though probably in rural areas it would take longer for ambulances to arrive. Some states do not report this.

Looking at the distribution of the times of EMS arrival on the scene, there are no obvious problems anywhere - in every state there are a number of cases when it takes two or three hours, but mostly it's under 20 or 30 minutes. But if you turn on the layer of the time to arrival to hospital, the state of New York looks really bad compared to others. This could be a data anomaly, though - it's really strange that sometimes it takes hours to get to a New York hospital, even after accidents that occurred in cities. I would not draw any hasty conclusions from the time-to-arrival data - if some states do not report the times as often as others, the comparisons would not be fair.

Here is the full accident map. Please let me know if you have any comments or questions.

Saturday, October 4, 2008

State percentages of deficient/obsolete bridges

To follow up on the previous post, I wanted to plot some aggregate data on bridge conditions. The US DOT provides per-state statistics for the total number of bridges, as well as the percentage of structurally deficient and functionally obsolete bridges.

According to the official explanation, "structurally deficient" in most cases means that the deck, superstructure or substructure has a rating of 4 or below. "Functionally obsolete" means that the bridge does not meet the current standards for road width or roadway alignment. An appraisal rating of 3 or below puts a bridge in this category.
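The two definitions can be written down as a simple rule. This is only a sketch of the wording above, not the official DOT procedure: the function and parameter names are made up, the "in most cases" caveat is ignored, and the two flags are treated as independent of each other.

```python
def classify_bridge(deck, superstructure, substructure, appraisal):
    """Return (structurally_deficient, functionally_obsolete) for one bridge.

    deck, superstructure, substructure: condition ratings.
    appraisal: the lowest appraisal rating for road width or roadway alignment.
    """
    structurally_deficient = min(deck, superstructure, substructure) <= 4
    functionally_obsolete = appraisal <= 3
    return structurally_deficient, functionally_obsolete


# Hypothetical bridge: a worn deck, but geometry that still meets the standards.
print(classify_bridge(deck=4, superstructure=6, substructure=7, appraisal=5))
```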

Here is the map (click here to access it directly)



The default view shows only the structurally deficient bridges. Click on the eye icons in the upper right corner to turn layers on and off.

The Northeast is not doing so well on obsolete bridges - DC and Massachusetts are the leaders with 52% and 40%. This could be explained by the sheer number of old bridges, I suppose.

For structurally deficient bridges, Pennsylvania and Oklahoma lead the way with 26% and 25%. Note that to highlight differences, I chose slightly different algorithms for calculating icon sizes on the two layers.