Flu trends comparisons

October 31, 2009 / Taking a look at seasonal flu data on Google Flu Trends and wondering whether it is reliable.

[two series line graph]
U.S. flu prevalence 2009–2010, as at October 30, vs. 2003–2004 (Image: Google.)

According to data provided on Google Flu Trends comparing this year to 2003–2004 (a particularly severe year), the rate of estimated flu cases has climbed high and early. As for severity by state, this is what the map looked like in November last year:

[all states are shades of blue]
Estimated USA flu severity map as at November 11, 2008 (Image: Google.)

And here’s what it looks like this year as of the end of October:

[all states are shades of yellow, orange and red]
Estimated USA flu severity map as at October 30, 2009 (Image: Google.)

Finally, here’s a chart showing a comparison of Google estimated prevalence data and CDC prevalence data since the 2003–2004 season:

[two series line graph, 2003–2009]
Yellow: CDC data. Blue: Google data. (Image: Google.)

How valid is this information (i.e., how closely does it reflect actual rates of influenza infection)? It appears that Google’s flu prevalence estimates, which are based on search counts (a form of “syndromic surveillance”), are accurate predictors of reported rates of influenza in the United States. The Flu Trends About page (linked above) mentions the Nature study that Google undertook with a CDC researcher, and links to a free, Google-hosted PDF of the paper if you’re interested. This is further corroborated by the modest paper “More Diseases Tracked by Using Google Trends” that was published in the CDC web journal Emerging Infectious Diseases (Volume 15, Number 8, August 2009), though that one is more of a research note than a detailed study.

I did find one interesting statement on this by the CDC, in answer to a direct question from a reporter. At a press briefing on May 5, Alice Park from Time posed the following to the CDC’s Richard Besser:

I just wanted to ask you about your opinion on some of the new flu tracking, surveillance service out there. I know that the CDC has worked with Google flu trends, for example, can you talk a little bit about how helpful that type of information is particularly now to get a better sense of the dynamics of the outbreak, you know, where it might be increasing or what the ebb and flow of it and is it getting worse, is it tapering off, can you comment a little bit about how useful those kind of methods are.

Besser replied:

In terms of our ability to detect emerging infectious disease, new infectious diseases, we’re constantly looking for what we call situational awareness. I mean what’s going on out in the communities. And we’re looking at you know, many, many different sources of information. The Google flu tracking information, there was a study done with Google in conjunction with CDC to look at can you use that information, can you use people going on the web to find information about flu as an indication of where flu is taking place? And the first year looking at that in terms of looking back, it was very helpful. The question is looking forward can you see that? As of two weeks ago, Google hits on flu on H1N1 are just off the charts. And so our website gets 8 million hits a day. So looking for a signal of increased activity on the web in a particular place isn’t very useful. But we’re open and are continually looking at various approaches to early detection because the sooner you can detect a problem, the sooner you can understand it and implement appropriate control measures.

Translation: I’m comfortable saying, with the benefit of hindsight, that Google Flu Trends in its first year of operation has done a pretty good job, but you are not going to hear me endorse it as a predictive tool for tracking H1N1. Or: it’s been shown to be valid, but we don’t yet know how reliable it is from year to year (or, for that matter, how much it might be influenced by special kinds of flu or fears about flu).
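For anyone who wants to poke at the valid-versus-reliable distinction themselves, here is a rough sketch in Python. It is not the method used in the Nature study; it simply computes a Pearson correlation between a search-based series and a reported series, one season at a time, so you can see whether the agreement holds up from year to year. The function names (pearson, correlation_by_season) and the season labels are my own inventions, and the numbers are made up purely for illustration; they are not Google or CDC figures.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def correlation_by_season(seasons):
    """Map each season name to the correlation of its (search, reported) pair."""
    return {
        name: pearson(search, reported)
        for name, (search, reported) in seasons.items()
    }


# Hypothetical, made-up weekly values for illustration only.
# "season A": search interest rises and falls with reported illness.
# "season B": search interest spikes early (say, during a scare)
#             before reported illness actually peaks.
example = {
    "season A": ([1.0, 2.0, 4.0, 3.0, 1.5], [1.1, 2.2, 3.9, 2.8, 1.4]),
    "season B": ([1.0, 6.0, 5.0, 2.0, 1.5], [1.0, 1.2, 1.5, 2.5, 1.3]),
}

for name, r in correlation_by_season(example).items():
    print(f"{name}: r = {r:.2f}")
```

One caveat: correlation only measures whether the two lines move together, not whether they sit at the same level. If a scare inflated searches by a constant factor all season long, this check would not notice.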

So, for what it’s worth, I’m going to go ahead and take it all with a grain of salt for now. I’ll say one thing though: it makes for some interesting graphics, and you have to admit that we’ve had a lot of graphs going down these past 18 months…

Related

The follow-up article for Time by Alice Park, “Is Google Any Help in Tracking an Epidemic?”, published on May 6, 2009.

† Be warned: the CDC web site is even slower than your Toyota Yaris.

2 responses

  1. JH Morehouse

    I think the amount of hysteria and hype surrounding H1N1 could significantly skew the results. Maybe I’m getting obtuse here, but it seems foolish at best, and dangerous at worst to count every hit on Influenza, Flu, H1N1, etc. as a case of the flu. It would be like counting everyone who searches “terrorism” as a terrorist.

    October 31st, 2009 at 5:09 pm

  2. Adrian Cooke

    JH, I don’t think you’re being obtuse, just cautious. The interesting thing is that Google’s graph for the past few years of data seems to overlap quite precisely with the CDC’s. It would seem that regardless of whether people who type “flu” are sick themselves, the one correlates with the other. Whether that will continue to hold in the future, now that the specific scare of H1N1 is upon us, remains to be seen.

    November 3rd, 2009 at 12:03 am


Zero to One-Eighty contains writing on design, opinion, stories and technology.