How Big Is a Pig, Er, an Index?
"How big is a pig?" asks a well-known children's storybook, which ultimately answers the question: "...this pig is my mom and she's the biggest of us all!"
In a similar spirit, according to a recent account in the New York Times business section, Google has decided to end a tit-for-tat dispute with Yahoo about which company has indexed more pages. Instead, Google will ask users to guess the size of the Google index. (Google also claims to have an index three times the size of its nearest competitor.)
Obviously, the sheer size of an index is not the only thing that matters in web searching, and maybe not even the most important thing. The relevance and freshness of search results tend to matter much more.
For the record, I've tried Google and Yahoo fairly frequently on the same searches, and I slightly prefer Google. Both search engines are amazingly good, except when they are absolutely awful. (The web itself is full of black holes: for example, anything more than a few years old.)
The New York Times article notes that Yahoo and Google have been conducting an "arms war" over the size of their indexes, and quotes Danny Sullivan of Search Engine Watch, who states that there is no objective, third-party way to count the size of an index.
This may not be entirely true. Winter Corp., a consulting outfit that specializes in databases, publishes an annual list of the largest databases. Among them: the largest non-commercial database, at 222.8 terabytes, belongs to the Max Planck Institute for Meteorology; the largest commercial database, at 100 terabytes, is Yahoo's; and the hardest-working database belongs to UPS and processes more than 1 billion SQL statements an hour.
Posted by Harold Davis at September 28, 2005 02:28 PM