The Googleplex Blog: Harold Davis's Blog


March 17, 2006

The Commune and the Scholar

Everybody who uses the web, whether for fun, research, or profit, knows that much of the best content on the web is supplied by the community. This content is created in myriad ways, but is communal, usually not for profit (other than AdSense revenue!), and usually posted with the barest minimum of structure, verification and oversight.

The jeremiads of bloggers rise to the heavens but provide some useful insights.

Profiles on MySpace are the kind of superficial self portraits you'd expect of teens on the make, but can also show wonderful creativity and expressiveness.

Photographs posted to Flickr can be insipid not-quite-in-focus family album affairs, but also can rival or surpass the work of the best professional digital photographers.

Closer to the core of the web, open source software initiatives like Linux and Apache and others hosted by SourceForge provide the technical know-how that keeps the engines turning (and prevents private enterprise from consuming the commune).

Communal forums like LinuxWorld, Slashdot and WebmasterWorld provide the discussion and descriptive glosses that make it possible for all the moving parts of the web's technology to work together.

Taxonomies like the Open Directory Project (ODP) provide structure to search engines like Google and Yahoo. (The ODP is not really communal, but it is noncommercial, provides its data to everyone, and works because of the efforts of volunteer editors.)

Wikis are communal by definition. Commune-based wikis, particularly the Wikipedia, provide information repositories that are unmatched in scope (and in the number of contributors) while avoiding any kind of hierarchical information verification.

Everything, however, is not perfect in this paradise of communes. The major problems with information communes are that they are easy to manipulate or corrupt, and that it is hard to evaluate the reliability of the information contained in communal repositories.

These are not new accusations to hurl at demotic levelers of information barriers. No doubt the priests who could write elegant Latin said much the same kinds of things when Gutenberg produced his first printing press. But they are troubling all the same.

It is easy to manipulate ODP listings and Wikipedia articles to improve natural search engine rankings, and these are standard techniques in the SEO (Search Engine Optimization) toolbox. When the stakes become large enough, anything is corruptible, and there have been serious claims that ODP listings are paid for with bribes (because they can be used to enhance a website's status in search engines such as Google).

More interesting philosophically is the accuracy of information found in communal repositories. A recent Op-Ed piece in the New York Times poses this question, asking how the accuracy of the Wikipedia compares with the accuracy of the information found in a vetted publication such as the Encyclopedia Britannica.

The answer, of course, is that comparative accuracy depends on many variables.

Before I discuss some of these variables, I need to point out that the wise consumer regards all received information with some skepticism, suspecting that the inherent bias of the purveyor may well color what is presented even if the bias is not intentional. (Here's some more information about how to evaluate the credibility of web pages.)

It's true that the situation is probably more extreme on the web than off it, but information bias is a universal. Even casual researchers need to understand some of the techniques used to evaluate the veracity of information found on and off the web: context, consistency, professionalism of presentation, plausibility, the reputation of the information provider, the verification process (if any), and the apparent motivation of the information provider.

In my opinion, there are manifest instances of information bias in the Big Red Barn and Dr. Seuss, and techniques for evaluating the veracity of information should be taught starting in first grade. Seriously. And it has some bearing on the situation, and is not entirely trivial, that I found the Wikipedia article about Dr. Seuss the best, most objective, and least commercial site to provide a link for more information about this children's book author (above).

Leaving first grade behind, would you rather read an article about elementary physics prepared by 1,000 anonymous members of the hive on Wikipedia, or one written by Albert Einstein for the Encyclopedia Britannica, and scrupulously edited by professionals?

This is kind of an apples and oranges comparison. I sure have more respect for Einstein's intellect than even for, well, a googol of anonymous intellects that have contributed to the hive. Einstein obviously will know his elementary physics, and furthermore will have insights to impart about how to think about physics.

On the other hand, the 1,000 hive contributors will get elementary physics right, too. By the time these folk have been back and forth over each other's work, the finished article is likely to be as accurate as something subject to the most rigorous professional review. There isn't likely to be much bias left after 1,000 people have been over it. There may not be much flavor either.

This highlights an important point: articles in the Wikipedia that cover a topic of fairly wide general interest are likely to be thorough and unbiased. But more narrowly focused topics are often written by one or two people with an ax to grind, may be biased, and may contain faulty information (or even outright fabrications).

The more technical and arcane you get, the more likely it is that there are only a handful of people who really understand the topic. This line of thinking implies that communal-process information mechanisms like the Wikipedia are less likely to produce good information on cutting-edge scholarly and scientific topics, and more likely to be good sources of information for topics at the general college level (and below).

Going back to Einstein, Einstein is not going to get his facts wrong, and will probably have an interesting viewpoint about physics (even elementary physics). But that self-same "interesting viewpoint" can also be called "bias." In fact, it's common for the very best scientists and researchers to be extremely opinionated in their areas of expertise (and a gosh darn good thing too!).

Jimmy Wales, the founder of Wikipedia, asserts that this issue is not about comparing the accuracy of information derived from a communal process with the accuracy of information from a lone distinguished, professional contributor. Rather, Wales opines, it's about the conflict between two vetting systems. The communal wiki process, according to Wales, involves unending scrutiny whereas a professional review process like that of the Encyclopedia Britannica is flimsy: in the future "people will say, 'This was written by one person? Then looked at by only two or three other people? How can I trust that process?'"

To its great credit, the Wikipedia has been open about problems with accuracy, methodology, fraud, and group process. In fact, these things are discussed ad nauseam as part of the Wikipedia (see the Wikipedia Community Portal for details).

Wales is right that, from a philosophic viewpoint, the stakes are high, and that the process of individual signed contribution is on a collision path with anonymous communal information gathering. He's wrong to assume that the commune is always right.

[Thanks to Martin Davis and Phyllis Davis for reviewing this piece; the opinions and flaws, of course, are mine, all mine.]

Posted by Harold Davis at March 17, 2006 10:20 AM

Search Engine Optimization