Wednesday, June 30, 2010

Detecting Bad Data

Political numbers geeks learned yesterday that Research 2000, one of the most prolific national political pollsters in recent years, may have been manipulating or even fabricating much of its data. This news comes less than a year after another national pollster, Strategic Vision, was exposed for probable fraud.

The evidence against these pollsters has come mainly from statistical scrutiny of their published results, performed by heroes like Nate Silver and Michael Weissman. But in most cases, you don’t have to be an accomplished sports statistician or a PhD physicist to detect bad data. You just have to care about numbers, spend some time with them, and use a lot of common sense.
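To give a flavor of what "spending time with the numbers" can look like, here is a minimal sketch of one common-sense check: counting how often each last digit appears in a list of reported figures. This is not a reconstruction of the analyses mentioned above. The function name and the sample numbers are hypothetical, and the assumption that last digits should be roughly evenly spread does not hold for every kind of data.

```python
from collections import Counter

def last_digit_chi_square(values):
    """Chi-square statistic comparing last-digit counts to an even spread."""
    digits = [abs(int(v)) % 10 for v in values]
    counts = Counter(digits)
    expected = len(digits) / 10  # assumption: each digit equally likely
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))

# Hypothetical figures, made up purely for illustration.
evenly_spread = list(range(100))  # every last digit appears 10 times
suspicious = [47, 57, 37, 48, 58, 47, 57, 38, 48, 47] * 10  # only 7s and 8s

print(last_digit_chi_square(evenly_spread))
print(last_digit_chi_square(suspicious))
```

With nine degrees of freedom, a statistic above roughly 16.9 would be surprising at the 5% level if the digits really were evenly spread. A large value is a reason to look harder, not proof of fraud.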

The sad thing is that in America today, hardly anybody cares about numbers except professional scientists and sports enthusiasts. Journalists, in particular, seem to think that their only job is to report both sides of the story, as if there’s no such thing as a fact. The exception, of course, is sports reporters, who have to be extremely careful with facts and figures.

The good news, at the national level, is that the traditional media usually pick up the fraud stories after the bloggers do the actual work. The New York Times wasted no time reporting the Research 2000 accusations on its Caucus blog. If the accusations hold up, we’ll undoubtedly hear more. (Nate Silver will soon be assimilated into the New York Times. Let’s hope these kinds of stories don’t get suppressed in the process.)

Also, at the national level, there’s often enough honest fact-gathering that the frauds don’t make much difference. No single pollster had much impact on Silver’s bottom-line prediction of the outcome of the 2008 presidential election. The danger arises when everyone is relying on a single primary source, like the military or the White House.
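The arithmetic behind that point is easy to sketch. The following is a rough illustration, not Silver’s actual model: it takes a plain average of several pollsters’ margins and shows how far the average moves when any one of them is dropped. The pollster names and margins are hypothetical.

```python
from statistics import mean

def leave_one_out_shifts(polls):
    """For each source, how much the simple average changes if it is dropped."""
    overall = mean(polls.values())
    return {
        name: overall - mean(v for other, v in polls.items() if other != name)
        for name in polls
    }

# Hypothetical margins in percentage points, made up for illustration.
polls = {
    "Pollster A": 6.0,
    "Pollster B": 7.0,
    "Pollster C": 5.0,
    "Pollster D": 8.0,
    "Pollster E": 14.0,  # the outlier
}

shifts = leave_one_out_shifts(polls)
for name, shift in shifts.items():
    print(f"{name}: {shift:+.2f}")
```

Even with only five sources, an outlier six points above everyone else moves the average by just a point and a half, and the effect shrinks as more sources are added. With a single source there is nothing left to average against, which is the dangerous case.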

At the local level, relying on a single authority is the rule rather than the exception. The Ogden Standard-Examiner almost always prints the word of local government officials as if it were fact, with no questions asked. Despite the detailed exposés on Weber County Forum, the Standard-Examiner has yet to report that the Ogden government manipulated its crime statistics, or that the government’s revenue projections for the Junction development were fraudulently overblown.

In science, fabricating data is the most serious of all crimes. I’ve given failing grades to astronomy students for fabricating their observations (which is usually easy to detect). There are continual allegations of fraud in medical research, where the financial stakes are incredibly high. Fortunately, the list of significant and documented cases of fraud in the physical sciences is extremely short. Although we physical scientists are just as human as everyone else, we know that our peers will tear our work apart if it doesn’t hold up to scrutiny.
