Wednesday, August 15, 2007

I'm In Ur Wikipedia, Tracking Their Edits

One of Wikipedia's greatest strengths--the open-editing format that (nearly always) permits ordinary citizens to add, subtract, or alter content--is also one of its largest liabilities, since persons with less-than-honorable intentions can manipulate data for any number of nefarious reasons: discrediting a competing company, say, or spreading disinformation. At least, they can and will until someone else notices and re-edits an entry. And while Wiki is undeniably useful for writers and researchers of every stripe, the very fact that its insta-data is so easily manipulable by anyone and everyone should serve as a whisper in the ear--if not a huge and wildly undulating red flag--that it might be a good idea to double-check the content against another source, that it would be prudent to ask oneself, before quoting a Wiki entry at length, I wonder whose fingerprints are on this stuff, anyway?

Caltech graduate student Virgil Griffith had that very thought. And the computation and neural-systems academic decided that not only was it time to figure out who was behind all the edits, but that it would also be a boon to the free and open marketplace of ideas to offer the general public a way to know, too.

And thus Wikipedia Scanner was born. From Wired Magazine:

On November 17th, 2005, an anonymous Wikipedia user deleted 15 paragraphs from an article on e-voting machine-vendor Diebold, excising an entire section critical of the company's machines. While anonymous, such changes typically leave behind digital fingerprints offering hints about the contributor, such as the location of the computer used to make the edits.

In this case, the changes came from an IP address reserved for the corporate offices of Diebold itself. And it is far from an isolated case. A new data-mining service launched Monday traces millions of Wikipedia entries to their corporate sources, and for the first time puts comprehensive data behind longstanding suspicions of manipulation, which until now have surfaced only piecemeal in investigations of specific allegations.

Wikipedia Scanner -- the brainchild of Cal Tech computation and neural-systems graduate student Virgil Griffith -- offers users a searchable database that ties millions of anonymous Wikipedia edits to organizations where those edits apparently originated, by cross-referencing the edits with data on who owns the associated block of internet IP addresses.

Inspired by news last year that Congress members' offices had been editing their own entries, Griffith says he got curious, and wanted to know whether big companies and other organizations were doing things in a similarly self-interested vein.

"Everything's better if you do it on a huge scale, and automate it," he says with a grin.

This database is possible thanks to a combination of Wikipedia policies and (mostly) publicly available information.

The online encyclopedia allows anyone to make edits, but keeps detailed logs of all these changes. Users who are logged in are tracked only by their user name, but anonymous changes leave a public record of their IP address.
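
For the curious, the cross-referencing Wired describes can be pictured in a few lines of Python. This is only a rough sketch of the general idea, not Griffith's actual code: the ownership table, the organization names, the article titles, and the helper names `owner_of` and `edits_by_org` are all made up for illustration, and the IP addresses come from ranges reserved for documentation.

```python
import ipaddress

# Hypothetical ownership table: IP block -> organization that registered it.
# (Assumption: a real tool would load this from public IP-registration data.)
OWNERS = {
    "192.0.2.0/24": "Example Corp",
    "198.51.100.0/24": "Example Agency",
}

# Hypothetical anonymous edits, as (article title, editor's IP address).
EDITS = [
    ("Voting machines", "192.0.2.17"),
    ("Some war", "198.51.100.200"),
    ("Cheese", "203.0.113.5"),
]


def owner_of(ip, owners=OWNERS):
    """Return the organization whose block contains ip, or None if unknown."""
    addr = ipaddress.ip_address(ip)
    for block, org in owners.items():
        if addr in ipaddress.ip_network(block):
            return org
    return None


def edits_by_org(edits, owners=OWNERS):
    """Group article titles by the organization the editing IP traces to."""
    grouped = {}
    for article, ip in edits:
        org = owner_of(ip, owners)
        if org is not None:
            grouped.setdefault(org, []).append(article)
    return grouped


if __name__ == "__main__":
    for org, articles in edits_by_org(EDITS).items():
        print(org, "->", articles)
```

Note that a match like this says only that an edit came from an organization's network, not which person there made it, which is why Wired hedges with "apparently originated."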


Wired invites readers who've used Wikipedia Scanner and unearthed any companies or government spooks fiddling around with data or rewriting history to submit their finds--and vote on other readers' discoveries--at the magazine's blog.

What's also brilliant about Griffith's brainchild is that it injects a much-needed dose of accountability into the sprawling corpus indicium that is Wikipedia. Corporations and politicians seeking to shape (or outright change) information won't be able to hide their self-interested edits behind anonymity, and, one hopes, knowing that their IP addresses now point Wikipedia Scanner to their organizations will deter them from making mercenary, dishonest, and unscrupulous edits in the first place. One always hopes.

In any case, it's refreshing to learn of an all-too-infrequent case of youth and reason overcoming wealth and tyranny. Cheers to you, Mr. Griffith.

(H/T Lisa in Baltimore)

UPDATE: Wikipedia Scanner is already embarrassing some government agencies, tying numerous unethical edits to computers at the CIA and FBI:

WASHINGTON (Reuters) - People using CIA and FBI computers have edited entries in the online encyclopedia Wikipedia on topics including the Iraq war and the Guantanamo prison, according to a new tracing program.

The changes may violate Wikipedia's conflict-of-interest guidelines, a spokeswoman for the site said on Thursday.
