Google Archive
An iconoclastic look at Google, research, the Web, the state of the world, and anything at all that interests Harold Davis.
March 27, 2006
Easy Travel to Mars
It can sometimes be problematic finding your way around this turbulent, overcrowded earth. But if you have access to Google, it's now easy to explore Mars.
Google Mars provides elevation maps (showing altitude in relief), satellite photos created using a mosaic of visible light images, and views created with a mosaic of infrared photography. As with Google Earth, you can zoom in and out and navigate across the various views.
Other features are almost too numerous to list. You can browse links that list the regions of Mars, show where spacecraft have landed, or lead to stories about Mars. As an example of a story, here's the so-called face of the man on Mars. Unfortunately, "story" means a scientific account about a feature, not the wonderful Edgar Rice Burroughs Mars fantasies beloved in my youth, nor even Kim Stanley Robinson's Mars trilogy describing a terraformed planet.
Using any of the three view modes, you can search the surface of Mars for mountains, canyons, dunes, plains, ridges, and craters. For example, here's the elevation map of the Burroughs crater, named after my hero, pulp fantasy writer Edgar R. He got a crater named after him that is a whopping 78 miles across. Where's Tarzan, or John Carter of Mars, when you need them?
All this is very cool. Very cool indeed. If you have too much time on your hands, go check it out right away.
Putting together these images, which are credited to NASA, the Jet Propulsion Lab (JPL), Goddard Space Flight Center (GSFC), and Arizona State University, was undoubtedly (as the About Google Mars page states) a great deal of fun, as well as real work.
But Google Mars inevitably raises the question: what is the point? I'm thinking that Google wants to lock in its first-mover advantage for local search ads prior to the colonization of the red planet.
Posted by Harold Davis at 2:59 PM
March 15, 2006
As the Manichean Google Worm Turns
I have always been a Google agnostic. I don't love Google, and I don't hate Google. I think Google is a company with good and bad, like most companies, institutions, and human beings.
My picture of most "companies, institutions, and human beings" - and this is very transparent with my kids - is that an angel sits on one shoulder and a devil on the other. Sometimes the angel wins, and sometimes the devil wins.
Google cooperates with the Chinese government to censor Internet access in China. Chalk up one to the devil. On this story, you may laugh or cry when you read about Bill Gates taking the unusual step of praising the competition and Google's censorship in China as preventing more censorship overall. George Orwell, where are you when we need you?
Google resists turning over search records to the U.S. government when its competitors roll over and play dead - and don't give a fig for the privacy of their customers. This time it's three cheers for the Google angel!
On the good side of the ledger, I am very, very impressed with Google's technology prowess and business acumen. Besides personally using Google's search all the time, as the author of two books about using Google's technology, I am professionally grateful to the company. (The books are Google Advertising Tools and Building Research Tools with Google.)
However, I've been bemused for a long time about the free pass Google has got from the technology community (and media) for behavior that would have been critiqued soundly in any other company. (See, for example, my Do no evil? from August 2005.)
Any big business that enjoins on its Investor Relations pages "don't be evil" is riding for a fall. By the way, Google's famous - or infamous - "don't be evil" motto ties in nicely with my devil and angel Google analogy. Both "don't be evil" as a world view and the devil-angel dichotomy are representative of a black-and-white Manichean outlook. Google's more recent Corporate Philosophy statement has hedged "don't be evil" a bit by replacing the notorious too-good-to-be-true aphorism with the declaration that you can make money without doing evil.
Now the Google worm has begun to turn, and the company that could do no wrong can do little right in some quarters. For example, check out Danny Sullivan's 25 Things I Hate About Google on SearchEngineWatch. Danny also loves Google, so I think he buys into the dichotomous Manichean Google worm world view, too.
Posted by Harold Davis at 10:37 AM
March 14, 2006
Power Tends to Corrupt and Google Power Corrupts Abso-Googly
Google the term google and you won't find any ads containing "google" (or "Google"). In fact, "google" is one of the few terms you can search for on Google that produces absolutely no ad results. (A total aside: finding search queries that yield no ads could become another form of Google whacking.)
As you likely know, the results you see when you do a Google search are divided into "natural" links - the supposedly objective links the search engine comes up with in response to your query - and "sponsored" links, which are paid advertisements.
Sponsored links (does an ad by any other name smell fairer?) are clearly labeled as such (in very small type) and appear both above and to the right of the natural search results.
Google makes its very nice living on these sponsored links, a/k/a ads. These ads are placed using the Google AdWords mechanism. (Besides appearing in search results, Google's ads also show up on websites that have enrolled in the Google AdSense program.)
Anyone with a valid credit card can place an AdWords ad - although, of course, Google makes more money from its large advertisers than from its "two-buck chucks."
When you place an ad via AdWords, the system automatically checks the ad text for violations of the Google rules (click here for the lawyerly text of the terms and conditions).
The automated check includes an attempt to weed out copyrighted or trademarked terms.
If your ad is rejected by the software, you can request an exception - but don't expect a response with anything other than a boilerplate reiteration of the rejection.
Google is particularly strict about reserving the term "google". Oddly enough, "AdSense" and "AdWords" are both fair game. Well, perhaps not oddly, because Google makes mucho dinero off selling ads against these terms.
Case in point: a recent attempt by the publisher of my new book Google Advertising Tools to place an AdWords ad using the book's title.
I think this is pretty clearly a legitimate usage of "Google". (The book's copyright page does include the standard notice that Google is a trademark of Google Technology, Inc.)
The ad was rejected, and the reasoned request for an exemption signed by a marketing manager at the publisher was also rejected in boilerplate, reiterative fashion.
Of course, Google has every right to protect its valuable name. However, Google exercises great power over our virtual lives, sometimes in arbitrary and capricious ways.
It continues to trouble me that this absolute power can be exercised without any effective appellate mechanism.
Posted by Harold Davis at 12:39 PM
January 25, 2006
Google Advertising Tools companion site
Harold's new book, Google Advertising Tools: Cashing in with AdSense, AdWords, and the Google APIs, is hot off the press, and the companion website is now online with examples, resources, links, source code, and more.
The companion site is still under construction, but (which may be important to some readers) the source code and examples from the book have been posted.
The site will contain links to websites created by readers of the book.
Check out this site, and come back again in a bit when it has become "all growned up!"
Posted by Harold Davis at 2:22 PM
December 22, 2005
Google Advertising Tools
The listing for my new book Google Advertising Tools, published by O'Reilly, has just gone up on Amazon. The book will be available very soon.
Google Advertising Tools explains how to optimize sites for high "natural" search engine rankings, and offers a variety of techniques for monetizing content sites. These approaches include targeted affiliate programs and working with CPC and CPM advertising programs such as Google AdSense.
The heart of the book is about Google AdWords. Readers will learn how to cost-effectively use this program to target the audience they require, how to use the AdWords software to their best advantage, how to understand the AdWords concepts, terminology, and tools, and how to measure ROI. Readers with a programming orientation will also learn how to use the AdWords API to create custom applications (the examples in the book are in PHP and C#).
My hope in writing this book is to be useful to you, and to continue the process of leveling the playing field so that small business and web site owners can obtain the same advantage from Google AdWords as the "big guys."
So enjoy! And write me at harold AT bearhome.com to let me know what you think.

Posted by Harold Davis at 10:58 AM
November 2, 2005
Copyright in the Era of Flickr and Google
I need to make some preliminary explanations before I get to the heart of this story.
(1) This story is about digital photography -- but the general issues it raises apply equally well to almost any kind of intellectual property that can be represented digitally--meaning music, video, software programs, and more.
(2) I am an active and enthusiastic member of the flickr community. I use my flickr photostream to display my photographs to other flickr members, and to power the image management behind my Photoblog 2.0.
Within the flickr application, you can assign different access permissions to your photographs (essentially, available to the public, only to friends, or only to family). But in order for anyone to view your photos, and in order to use them in my own blog, access has to be set to public. This means that anyone can display my photos on the web, whether or not I've given them permission to do so.
(3) I've changed the name and identifying details of the person using photos from flickr without permission (which is what this story is about). I've done this for two reasons: it wouldn't be fair to identify the person (they didn't know I'd blog about it), and I'd also like to pursue my flickr addiction without personal acrimony.
(4) A flickr badge is a group of pictures from flickr that can be displayed on your own website. Flickr generates the code for you, using either HTML or Flash. The pictures must be marked for public access, and can be based around the work of everybody on flickr, a single flickr photographer's set, a Flickr group pool, or using tags (to name the most common way badges are generated). Photos can be set to be random or sequential. This page shows a Flash badge using my photos.
Got all that? OK. Here goes.
I am a member of a variety of group pools on flickr. In a group pool, photographers with interests in common all submit their photos, creating a kind of library.
Recently, I noticed on a fairly prominent blog a flickr badge consisting of random photos from one of the group pools I belong to. The blog author is the flickr administrator of this group pool. I will call him X (and the group in question Y).
I wrote X:
I'm writing to express a little concern about the flickr badge from the Y Group that is displayed on your blog. I assume that you are showing a random selection of photos from the group. While most people would be glad and flattered to have you display their photos (I certainly would), some of the photos in the Y Group are "all rights reserved" (mine, for example - which I accompany with a copyright notice).
So I think as a matter of form and respect, you need to ask permission. Perhaps this could be accomplished by starting a discussion thread on the group (and asking if anyone objects) so it wouldn't be a logistical nightmare. Or, as an opt-in mechanism, you could designate a unique tag for people to use if they want to be included in your display - and create your badge using the tag.
I really don't mean to be a pill here, but I think photo rights are quite important...
X responded as follows:
It took me a while to figure out how I was going to respond to your comments. As a professional photographer and designer I make a living selling my work ... [and] I share your concern over proper use and photographers rights. Having been a long time member of Flickr ... (not to mention many personal sites showing my work) I've seen my work stolen and passed off by others as their own work many times. So many times, in fact, that I do not put the majority of my photos ... on the web. If you value your work, and it sounds like you do, then I don't believe Flickr is the place for you to showcase it properly.
Flickr holds no discretion in who is able to view and use photos posted to groups. This is evident through the site flickrlicio.us which routinely republishes copyrighted material on their site without permission. The Flickr Badge which I (and countless others) use allows you to sample photos from a group or from everyone regardless of copyright status.
Out of respect for your wishes I have changed it to show only my photos I have posted on Flickr. I have, on file, permission from all but a few of the members of the Y Group allowing me to use their photos. For this reason I did not perceive there being a problem. For that I apologize. It was not my intent to offend you.
If the situation with the Flickr Badge continues to be a problem for you I urge you, in my official capacity as admin of the Y Group, to pursue this matter with Flickr (Yahoo!). You also might consider marking your photos as "private only available to family and friends" and setting your download permissions similarly so they are not abused.
Have a nice evening.
I wrote back:
Thank you for your email. I, too, have given your email quite a bit of thought. Where I come out is that I think you missed the point of my original email.
I was not asking you to remove the Y group badge from your site. In fact, I think the variety of photos from the group enhances your site, and that group members would be pleased to have their pictures shown in a badge on your site.
I was asking you to get appropriate permissions, which should not be a hard thing to do (you say that you already have these for most members). For one, I would be happy to extend permission for my photos.
My further suggestion was that you add a discussion thread to the group so that members (and potential members) would know the use you were making of the photos.
I also noted that you could use a special tag to generate a badge, which would allow people to opt-in to your badge display. (A private group by invitation would be yet another possibility.)
The fact that others make use of copyrighted materials without getting permission that you mention doesn't seem very relevant to me. As a general principle, if someone else does something wrong, this doesn't make it right for us to do it. The fact that you are a professional photographer (which I did not realize) should make you even more careful about rights issues.
Regarding your more general comments about flickr and the use I make of it, I am a very enthusiastic member of the flickr community, although I understand some of the drawbacks of widespread image dissemination that you mention. I'd be happy to discuss my uses of flickr, why I do so, and my strategies for dealing with these matters in another email if you'd like.
It's important to me that our discussion not turn acrimonious. As I indicated, I am a reader and fan of your Y blog (and have sent traffic to it via links on my sites). I also like the Y group on flickr. So I think you took my comments the wrong way -- I was suggesting a minor procedural fix to what you were doing, not scrapping the whole thing.
All this raises a lot of interesting issues--and they don't have very much to do with flickr. The truth is that it is easy to find images on the web, for example using Google Images.
One way or the other anything you can find and view on the web, you can also copy and use for your own purposes. The only real limitation is that photos displayed on the web are not suitable for high quality reproduction.
Of course, having the ability to do something neither confers the legal right to do so nor makes it OK to do it. I own the rights to my photos, and nobody should be displaying them without my permission (which, by the way, I'm usually pretty happy to give).
Ultimately, there is an inherent conflict between intellectual property lockdown--which means no one gets to see your work--and the desire for dissemination that all intellectual property owners have for practical and emotional reasons. Your intellectual property is only safe if no one sees it, but photos that no one sees do not get appreciated in the marketplace (or otherwise).
By the way, the flickrlicio.us site that X mentions features the "Babes of Flickr"--and is a great deal of fun if you are into this kind of thing.
Posted by Harold Davis at 7:52 PM
September 28, 2005
How Big Is a Pig, Er, an Index?
"How big is a pig?" asks a well-known children's story book that ultimately answers the question "...this pig is my mom and she's the biggest of us all!"
In a similar spirit, according to a recent account in the New York Times business section, Google has decided to end a tit-for-tat dispute with Yahoo about which company has indexed more pages. Instead, Google will ask users to guess the size of the Google index. (Google also claims to have an index three times the size of its nearest competitor.)
Obviously, the sheer size of an index is not the only thing that matters in web searching, and maybe not even the most important thing. The relevance and freshness of search results tend to matter much more.
For the record, I've tried Google and Yahoo fairly frequently on the same searches. I slightly prefer Google. It can be said of both search engines that they are amazingly good - except when they are absolutely awful. (The web itself is full of black holes, for example, anything from more than a few years ago.)
The New York Times article notes that Yahoo and Google have been conducting an "arms war" regarding the size of their indexes, and quotes Danny Sullivan of Search Engine Watch who states that there is no objective third-party way to count the size of an index.
This may not be entirely true. Winter Corp., a consulting outfit that specializes in databases, publishes an annual list of the largest databases. Among them: the largest non-commercial database at 222.8 terabytes belongs to the Max Planck Institute for Meteorology, the largest commercial database is Yahoo's at 100 terabytes, and the hardest-working database belongs to UPS and processes more than 1 billion SQL statements an hour.
Posted by Harold Davis at 2:28 PM
September 27, 2005
Book Review: Winning Results with Google AdWords
New on my bookshelf is search engine marketing guru Andrew Goodman's Winning Results with Google AdWords from McGraw-Hill Osborne.
Several years in the writing, publication of this book has been considerably delayed. (Of course, as I know only too well, writing about any Google program is trying to hit a quickly moving target!)
Author Goodman runs Page Zero Media, a Toronto outfit that specializes in AdWords and pay-per-click (PPC) consulting engagements. True to the "write about what you know" adage, this book focuses pretty narrowly on search advertising with Google AdWords.
The viewpoint is that of a professional advertising campaign manager who urges clients (and readers) that "you must advertise consistently and confidently."
The book explains the size of the paid search market, the mechanics of working with AdWords, keyword selection strategies, and conversion tracking. There is good material on writing winning ads, increasing conversion rates, and the future of online targeting.
There are costs and benefits to this book's relatively narrow focus, but I found it a compelling read. If you are managing, planning, or thinking about an AdWords campaign it should be in your library.
Posted by Harold Davis at 11:11 AM
September 20, 2005
Google Bombs at War
If you don't know the term, here's a definition of google bomb:
Google bomb: to hyperlink a term to a website in order to raise the linked site's ranking in a search return result set on Google.
I've intentionally written an opaque (but accurate) definition of Google bombing. It's easy to understand Google bombing in practice.
The most notorious current example of Google bombing - in fact, of Google bombs at war - relates to a search for the term failure. President George W. Bush's official White House biography comes up first, although film maker Michael Moore's official site is a close second. These sites have been Google bombed and connected to the word failure in web pages.
The Google PageRank algorithm encourages this kind of "voting" by webmasters - both a strength and weakness of the methodology. (Click here for more of my take on the PageRank algorithm.)
I've already voted twice in this piece for George W. Bush as the failure of the two, and now let me stack the votes some more: failure, failure, failure!
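For readers who like to see the "voting" idea spelled out, here is a minimal sketch. It is emphatically not the PageRank algorithm, and it is not how Google actually scores pages; it simply tallies which page a given piece of link text points to most often, which is the intuition behind a Google bomb. The function names and the example.org addresses are made up for illustration.

```python
from collections import Counter, defaultdict


def tally_anchor_votes(links):
    """Count how often each link text points at each target page.

    `links` is an iterable of (anchor_text, target_url) pairs, a
    hypothetical stand-in for hyperlinks harvested from many web pages.
    """
    votes = defaultdict(Counter)
    for anchor_text, target_url in links:
        # Treat "Failure" and "failure" as the same term.
        votes[anchor_text.lower()][target_url] += 1
    return votes


def top_target(votes, term):
    """Return the page that received the most links using `term`, or None."""
    counts = votes.get(term.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Made-up links: three bloggers link the word "failure" to one page,
# and one blogger links it to another.
sample_links = [
    ("failure", "http://example.org/official-biography"),
    ("Failure", "http://example.org/official-biography"),
    ("failure", "http://example.org/official-biography"),
    ("failure", "http://example.org/film-maker"),
]

sample_votes = tally_anchor_votes(sample_links)
print(top_target(sample_votes, "failure"))
```

The more webmasters who use the same link text for the same page, the bigger that page's tally for the term, which is why a coordinated group of bloggers can push a page up the results.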
Posted by Harold Davis at 11:11 AM
September 7, 2005
Google Then and Now
Today marks the seventh anniversary of Google, which was officially founded on September 7, 1998.
Cruising the Wayback machine for Google is instructive. The first search page in the archives for the company is from December 2, 1998. The graphics are a little cruder, but the basic pattern and interface are in place.
On the earliest Google company info page in the Wayback machine (from 1999), you'll find these tidbits:
"Google Inc. is not at present a publicly traded company, and we are currently unable to speculate on whether or when our privately-held status might change." [Comment: ...but I bet they already had a plan...]
"10^100 (a gigantic number) is a googol, but we liked the spelling "Google" better. We picked the name "Google" because our goal is to make huge quantities of information available to everyone." [Comment: ...but not, apparently, all information, such as that about Google execs...Click here for a googol FAQ]
It's fun to see how far Google has come in such a short time, and fun to speculate about how far Google may go in the next seven years!
Posted by Harold Davis at 11:59 AM
August 30, 2005
Privacy: Not
Staff writer Elinor Mills wrote a balanced article about personal privacy in the time of Internet search for online news aggregator CNET. The facet of personal privacy that the article primarily discussed was the ability to keep personal information out of Internet search results. Since Google has the bulk of the Internet search market, Google was the primary focus, but as the article noted, overall, "the issues with Google are not any different from the issues you have with Yahoo, Microsoft and others."
Mills began her article showing some of the results of thirty minutes of googling Google's CEO Eric Schmidt. She came up with an idea of his net worth (north of $1.5 billion), the amount of Google stock he'd sold (at least $140 million), a political affair he'd attended with his wife (a $10,000 per-plate fund raiser for Al Gore), where he lives (affluent Atherton, California), and some of his hobbies (attending the Burning Man festival, flying a private plane).
All this sounds like a pretty good life to me, and information that is not all that sensitive. It's not as if it were social security numbers, children's names and schools, or things of that sort.
If I were offered the job, the loss of privacy that this kind of revelation entails would be a small price. Actually, it was already revealed - in a simple Google search. Mills merely repeated it.
If you or I had nothing worse than these facts about Eric Schmidt to be revealed, we might not fear loss of control of our personal information very much. (Personally, I monkey with algorithms, race fast cars and women, and am wanted in twenty countries, but that's a different story!)
However, Google's rather foolish reaction was fierce and in the tradition of lèse-majesté. The head of Google's public relations department, David Krane, called the editor in chief at CNET to complain, and then announced that all CNET reporters were banned from talking to Google for a year (actually, this is Google's loss more than CNET's). Krane later told the New York Times that he wasn't authorized to discuss the matter at all. Here's the New York Times account of the flap.
What conclusions should be drawn from the affair? I've already noted that Google's do no evil motto is by definition an empty mantra when applied to an aggressive public corporation.
It's also pretty clear that whether you are a prince or a pauper it is most likely that a great deal of information can be found out about you using online research tools. Indeed, this was part of Ms. Mills's reason for using Eric Schmidt as her case-in-point for her privacy article.
Is this loss of control over one's personal information a good or bad thing? It's both - and there's no hiding from the fact that egregious information dispersal about people is a fact of modern life. This is not going away; in fact, there is only going to be more information availability as time goes by. Indeed, this is the premise that Google's business rests upon.
Savvy citizens of the Internet recognize the widespread availability of personal information as an opportunity (although, of course, one can't ignore the potential downsides). They use the opportunity to present themselves the way they want to be seen. They also know that those they work or socialize with can't really expect to keep secrets - not always a bad thing.
When the privacy double-edged sword fell close to home, Google's CEO Schmidt failed the basic test: that of understanding that information, in the old cliche, wants to be free, and that the same rules apply to Google insiders as to the rest of us.
Posted by Harold Davis at 10:40 AM
August 19, 2005
Google's Secondary Offering: Dollars and Sense
Google has filed for a secondary offering of 14,159,265 shares (the number is an expansion of Pi following the initial 3 and decimal point) amid speculation of planned acquisitions or business expansion.
Actually, there probably is nothing grandiose in the works along the acquisitions or new businesses line. Or if there is, it isn't relevant to this offering. (Google would have done it anyhow.)
The people running Google are very smart (whatever else they may be, and cute would certainly seem to apply considering the number of shares in the offering).
This secondary offering is simply a matter of good business sense. If I had a company as richly valued as Google, I'd happily cash a bit out - and exchange equity for dollars and cents.
Posted by Harold Davis at 10:55 AM
August 1, 2005
Do no evil?
In its IPO Prospectus, Google Inc. famously (and, I think, fatuously) promised to "do no evil." Now here's a news account of an employment discrimination lawsuit based on events right around the time of the IPO.
Christina Elwell was a national sales director at Google. I'm not sure whether it is "a national sales director" or "the national sales director".
She told her boss, Timothy Armstrong, Google's vice president of national sales, about her high-risk pregnancy. She was subsequently demoted and harassed, according to the allegations. Ultimately she lost three of the four fetuses she was carrying.
Just as no one except the parties to a marriage gone bad knows the true facts leading up to a divorce, no one except the parties to an employment termination knows what really happened. But it seems at least reasonably clear from the lawsuit that Google did nothing to make Christina's life easier during this incredibly stressful part of her life.
This seems more or less situation normal for corporate America. The question in my mind is: why does Google get a free pass when - in contrast to their protestations about doing no evil - they behave just as badly as most other big businesses?
Posted by Harold Davis at 11:03 AM
July 20, 2005
Google Moon

In honor of the first manned Moon landing, which took place on July 20, 1969, Google Moon is now open and showing NASA imagery. Google Moon uses the Google Maps interface to help you take a close up and personal tour of the moon.
Here's the Google Moon FAQ containing an unusual announcement of Google's future plans: "We usually don’t announce future products in advance, but in this case, yes, we can confirm that on July 20th, 2069, in honor of the 100th anniversary of mankind’s first manned lunar landing, Google will fully integrate Google Local search capabilities into Google Moon, which will allow our users to quickly find lunar business addresses, numbers and hours of operation, among other valuable forms of Moon-oriented local information."
Zoom in too close (closer, that is, than the resolution provided by the NASA maps which are the underlying data for Google Moon), and what do you think you see? Proof that the moon *is* really made of cheese!
Posted by Harold Davis at 8:50 AM
July 17, 2005
Google AdWords API web services
If you want to build an application with the Google AdWords API, you should know that the AdWords API services have the same relationship to each other as the objects you can manipulate from a regular AdWords account. So get to know AdWords before you try to program with the APIs.
Most importantly, an AdWords campaign contains ad groups, which contain keywords and creatives. To modify keywords associated with an ad, you’ll need to start with the Campaign Service (using the client information for the account the campaign is part of), work down to an ad group via the AdGroup Service, and go from there to the Keyword Service.
Here’s some more information about the purpose and role of each of the AdWords API services, and links to each service’s WSDL file:
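To make the containment relationship concrete, here is a rough sketch of the hierarchy as plain Python data classes. These are not the AdWords API types or calls (the real services are SOAP web services described by their WSDL files); the class names, field names, and the find_keywords helper are my own inventions, used only to show why you walk from campaign to ad group to keyword.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Keyword:
    text: str
    max_cpc: float  # hypothetical field: maximum cost-per-click, in dollars


@dataclass
class AdGroup:
    name: str
    keywords: List[Keyword] = field(default_factory=list)
    creatives: List[str] = field(default_factory=list)  # ad text, simplified


@dataclass
class Campaign:
    name: str
    ad_groups: List[AdGroup] = field(default_factory=list)


def find_keywords(campaigns, campaign_name, ad_group_name):
    """Walk campaign -> ad group -> keywords, in the same top-down order
    you would visit the Campaign, AdGroup, and Keyword services."""
    for campaign in campaigns:
        if campaign.name != campaign_name:
            continue
        for group in campaign.ad_groups:
            if group.name == ad_group_name:
                return group.keywords
    return []


# Made-up account contents for illustration.
spring_sale = Campaign(
    name="Spring Sale",
    ad_groups=[
        AdGroup(
            name="Digital Photography Books",
            keywords=[Keyword("photography book", 0.45)],
            creatives=["Learn digital photography today"],
        )
    ],
)

for kw in find_keywords([spring_sale], "Spring Sale", "Digital Photography Books"):
    kw.max_cpc = 0.50  # modify a keyword once you have drilled down to it
```

The point of the sketch is the nesting: you cannot get to a keyword without first knowing which campaign and which ad group it lives in.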
Account Service
The Account Service lets you create and modify information associated with AdWords accounts, such as billing information. WSDL file
AdGroup Service
The AdGroup Service lets you create ad groups, list ad groups, associate ad groups with a campaign, and perform actions. For example, you can set the cost-per-click for all keywords in the ad group. WSDL file
Campaign Service
The Campaign Service lets you create, list, and modify campaigns. For example, you can change the name, set the daily budget, and define the end date of a campaign. This service also lets you perform actions on a campaign, such as pausing the campaign. WSDL file
Creative Service
The Creative Service lets you create and modify creatives, and associate them with an ad group. WSDL file
Info Service
The Info Service lets you get basic information about how much you have used the AdWords API and how many operations you have left. WSDL file
Keyword Service
The Keyword Service lets you get information about keywords. For example, you can get the keywords in an ad group, and create and modify keywords. WSDL file
Report Service
The Report Service lets you generate reports on the performance of your AdWords campaigns. For example, you can get reports on the daily number of impressions, clicks, and clickthrough rate. WSDL file
Traffic Estimator Service
The Traffic Estimator Service lets you estimate the performance of keywords, ad groups, and campaigns. You can estimate data, such as the cost-per-click, clickthrough rate, and average position of your ads. WSDL file
Posted by Harold Davis at 10:11 AM
July 13, 2005
Grokking AdWords Conversion Tracking
"Conversion Tracking" sounds arcane, but really it is a simple mechanism you can use to determine if your Google AdWords campaigns are producing the results you want.
How does the value of a conversion relate to the return on investment of a Google ad campaign? This is pretty straightforward Business School 101. If you understand what a conversion is worth to you, and the percentage of CPC (cost per click) AdWords visitors who do convert (the conversion ratio), then it is easy to calculate your return on investment (ROI) for an AdWords campaign. If the amount each conversion is worth multiplied by the conversion ratio is greater than your average CPC, then your AdWords campaign is producing a positive ROI, and probably makes sense.
You could put this as an equation. For an AdWords campaign to make sense, the following should be true:
Conversion amount * Conversion Ratio > Average CPC
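Here's the same inequality as a minimal Python sketch. The function names and the sample numbers (a $40 conversion, a 2% conversion ratio, a $0.50 average CPC) are made up for illustration; they don't come from any AdWords report.

```python
def campaign_makes_sense(conversion_amount, conversion_ratio, average_cpc):
    """True when the expected value of one click exceeds what the click costs."""
    return conversion_amount * conversion_ratio > average_cpc


def expected_profit_per_click(conversion_amount, conversion_ratio, average_cpc):
    """Expected value of one click minus the average cost of that click."""
    return conversion_amount * conversion_ratio - average_cpc


# Hypothetical numbers: each conversion is worth $40, 2% of clicks convert,
# and the average click costs $0.50.
print(campaign_makes_sense(40.0, 0.02, 0.50))
print(round(expected_profit_per_click(40.0, 0.02, 0.50), 2))
```

With these assumed numbers, each click is worth about 80 cents and costs 50 cents, so the campaign clears the bar.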
Google's underlying conversion-tracking mechanism bears a striking resemblance to the way Google AdSense works (AdSense is the program used to put Google contextual ads on your and my web sites):
• You add some special Google conversion tracking code to a results page on your site.
• You make sure that the results page will be opened when a visitor is converted, for example, by buying something (in the case of a purchase, the results page usually doubles as an order confirmation).
• When a user clicks your AdWords ad, Google adds a cookie to the user’s computer to track the user.
• When a user with the Google AdWords cookie on their computer opens the results page, a conversion is logged, and a special tracking message is displayed to the user.
An interesting, and somewhat controversial, feature of Google AdWords conversion tracking is that as part of the tracking, Google notifies users that they are being tracked. This notification is produced by the Google-supplied code you add to the results page. A tracked user sees a message titled Google Site Stats with a “send feedback” link when the results page is opened.
Google explains that they prefer to be above board about their actions, and that the send feedback link is a chance for users to understand Google’s privacy policies, and indeed to reject the Google tracking cookie if they wish.
However, most major advertising programs do provide conversion tracking options, and other advertising programs that track users and conversions do not “brand” the process. Users who click through ads in these other programs never know they are being tracked.
To summarize, Google tracks users coming through AdWords to your site by giving them a cookie. You decide when a conversion has occurred by opening a page for your visitor (for example, to confirm an order - but the choice is yours!). When the two match (the cookie and the confirmation page) a conversion is recorded and reported in AdWords.
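To make the cookie-and-confirmation-page match concrete, here's a toy model in Python. It is emphatically not Google's code: the class, its method names, and the visitor IDs are all hypothetical, and a set of IDs stands in for the cookies that would really live in visitors' browsers.

```python
class ConversionTracker:
    """Toy model of the matching logic: ad-click cookie + results page = conversion."""

    def __init__(self):
        self.cookied_visitors = set()  # stands in for cookies set on ad clicks
        self.conversions = 0

    def ad_clicked(self, visitor_id):
        """A visitor clicked the ad, so the visitor gets a tracking cookie."""
        self.cookied_visitors.add(visitor_id)

    def results_page_opened(self, visitor_id):
        """Log a conversion only if this visitor carries the cookie."""
        if visitor_id in self.cookied_visitors:
            self.conversions += 1
            return True
        return False


tracker = ConversionTracker()
tracker.ad_clicked("visitor-a")            # arrived through the ad
tracker.results_page_opened("visitor-a")   # counted as a conversion
tracker.results_page_opened("visitor-b")   # no cookie, so not counted
print(tracker.conversions)
```

The sketch counts every visit to the results page by a cookied visitor; how repeat visits are really handled is not something I know, so treat that as an open assumption.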
Cross-channel conversion tracking is a nifty feature within AdWords that also allows you to track conversions for traffic coming in to your web properties from other advertising networks such as Overture. Taking advantage of this feature, if you are deploying ads across multiple advertising venues, means that you can use the powerful AdWords reporting facilities to aggregate your information about conversions in one place.
Posted by Harold Davis at 9:25 AM
July 4, 2005
Independence Day Google

Posted by Harold Davis at 9:40 AM
June 24, 2005
AdSense Site Search Box
As you may know, Google's AdSense program has two major parts: Content and Search.
When one thinks AdSense, one probably thinks content. These are the text ads that appear on sites. But AdSense Search is important, and a real revenue opportunity for content sites as well. With AdSense Search, you put a Google search box on your site. When visitors search using the box, ads are displayed on the search results page. If a visitor clicks through one of the ads, the publisher who displayed the search box makes some money.
There are a couple of neat extensions to this that Google has come up with as part of the AdSense program. You can configure the search box to search sites as well as the web (up to three domains). The caveat here is that you can only search that which Google has indexed, so the site search may not be as good as a true keyword-based site search mechanism.
Also, Google recognizes that the search result return page is in some ways part of the publisher's site, even though it is actually served by Google. As a publisher in the AdSense program, you have the opportunity to choose your own graphics scheme, and add your own logo that clicks back to your site.
By way of example, you can check out the Google site search box I configured for Braintique (note that if you search for something, the results page will have a logo you can click to return to Braintique) on the bottom of this page or on the Braintique home page.
Posted by Harold Davis at 9:14 AM
June 15, 2005
Google and the George W. Bush Fart Doll
Brad Hill in the Google Unofficial Weblog has brought to my attention a story that is circulating that Google exhibits a pro-Clinton (and, by inference, anti-Bush and pro-left-wing) bias.
The story is that Google's AdWords unit declined to run contextual ads for a book bashing Clinton published by World Ahead Publishing. The information comes from a press release put out by World Ahead, a "premier publisher of conservative and Libertarian books" based in Los Angeles.
Now, I've found that AdWords does accept or reject ads in a capricious and silly fashion (this is presumably a function of inadequate software, not political bias), and that there is no effective appeal from an AdWords rejection. That said, some of the claims in the World Ahead "Google Censors Ads for Anti-Clinton Book" press release appear to be false, to wit:
- Google has accepted ads for a George W. Bush Fart Doll. This does not appear to be the case. A Google search for George W. Bush Fart Doll yields Google results pages with plenty of Bush dolls in the AdWords ads, but no flatulence (at least in the ads).
- Claims of "political bias" and "liberal leanings" in ad acceptance policies also are false, in my experience. I've previously noted the creationist Google ads that consistently show up on my blog items blasting intelligent design. And my wife's site about successfully managing high-risk pregnancies was getting ads for anti-abortionists until we took steps to ban these organizations (by listing their URLs with Google) from the site. To me, these ads seem to show a conservative bias - and a true contextual analysis would not have placed them on our sites. (Note to readers of whatever political persuasion: if you see a Google ad for an organization, cause or politician you truly hate, by all means click the link. Each time an AdWords link is clicked, it costs the advertiser!)
Admittedly, there are some anomalies. For example, the first search return link in Google for the miserable failure is famously the official White House biography of George W. Bush. (If you enter "miserable failure" in the Google search box and click the I'm Feeling Lucky button, this is the page that will open.)
Personally, I don't disagree with the equation of President Bush with miserable failure (to wear my politics on my sleeve). However, the result appears to have to do with Google bombing rather than intentional bias on the part of Google. As such, it speaks to the automated contextual analysis and relevance ordering algorithms Google uses hitting a wall - particularly when confronted by perpetual efforts to game the system - rather than intentional bias.
Related entries: Publish the PageRank Algorithm, Humans Tweak Google Rankings
Posted by Harold Davis at 10:34 AM
Addendum to 3-D Mapping @ Google
I didn't make it clear in my story on Google's ground level mapping campaign and truck here and in my ORA blog that Google already has 3-D mapping based on satellite photos (although you likely know this if you've been reading my blog).
To use this feature, go to Google Maps. Enter an address (or don't enter an address - you can do it after you've started 3-D satellite mapping). Click on the Satellite link shown to the far right of the screen capture below (the capture shows a somewhat blurry view roughly centered on my home in Berkeley). Simple. And cool.

Posted by Harold Davis at 9:54 AM
June 14, 2005
Google 3D Mapping Truck Coming Soon to a City Near You!
According to SiliconValleyWatcher, Google is planning to use trucks equipped with lasers and digital mapping software to create realistic 3-D maps from the ground. There's already an experimental truck cruising San Francisco, which is running into some problems with line-of-sight measurements due to pedestrians and vehicles.
Apparently, second and subsequent passes by the trucks through the city could eliminate erroneous data due to moving objects. But Google is looking for a way to 3-D map a city with a single pass.
I've been wondering for a while about the arms race into very cool mapping software - wonderful stuff to play with (interesting anomalies and all), but without a clear path to monetization (at least to me). A glimmer of where this is going is beginning to dawn on me: a very sci-fi world of realistic virtual mapping of local information truly would be a yellow pages killer, and Google has the resources and smarts to maybe pull this off.
Posted by Harold Davis at 10:36 AM
June 12, 2005
More on Google Human Tweaking
There's considerably more commentary on Henk Van Ess's blog about the truth of his disclosures that the Google Eval lab exists, whether or not Google Eval Labs are actually used to impact search results, and the ethics of Van Ess's disclosure of this NDA protected material.
For background on this interesting story, see my earlier posts to the Googleplex Blog and on O'Reilly.
Posted by Harold Davis at 10:48 AM
June 8, 2005
Frank Lloyd Wright Google

Posted by Harold Davis at 9:19 AM
June 2, 2005
Humans Tweak Google Rankings
I've long believed that Google's ranking of responses to search queries--the famous PageRank algorithm, now with more than 100 variables--is manipulated by human editors working for Google under an algorithmic facade. (See related posts on the Googleplex Blog and in my O'Reilly blog).
Now, there's some hard evidence that this is true. Dutch investigative reporter and search expert Henk Van Ess blogs about what he calls Google's Secret Evaluation Lab.
The real name for this secret part of Google is Rater Hub Google. It's staffed, mostly on a temp basis, mostly from international universities. Google calls these hires "international agents" or "quality raters." Here's a help wanted ad for the position from Monster.com.
Quality raters apparently spend their time checking search results, deprecating spam, moving the best results to the top of the search result stack, and (possibly) testing experimental Google features. This sounds like a kind of fun job!
Seriously, it isn't really surprising that Google has found the need to inject human editors into the equation. My objection is to the false pretence that Google's results derive from some purely formulaic (and supposedly objective) measure (likened in my previous posts to the Wizard of Oz hiding behind a screen while he makes a show for Dorothy and the others).
The Henk Van Ess blog item is really worth checking out. He promises more information to come. By all means review the Flash presentation on his site that shows some of the Rater Hub Google software - very interesting indeed!
Posted by Harold Davis at 3:13 PM
May 24, 2005
Google in the Enterprise
Google's consumer initiatives have been getting a great deal of play lately - for example, mapping and the new customizable portal, er, start page. In the long run, Google's efforts aimed at the enterprise may be more interesting - and have more impact on the fate of Google the company than these consumer moves.
Google as a business and institution is now like a shark: it must proceed forward or die. There's too much critical mass created by the mile-high stock price and all the way-smart hires at Google for things to just bumble along. This is the line of thought that inevitably leads to an assault on that IT Everest - the enterprise. You can see the process at work in a long, slow fashion at Microsoft. From its hobby operating system roots, the company is now a bureaucratic octopus engaged with the IT enterprise, pushing .Net and Longhorn, having forsaken its Mom and Pop developer roots.
In a recent eWeek article, Matt Glotzbach, Google's Enterprise product manager, describes the new, and free, Google Enterprise Desktop Search for the Enterprise as a unified way to search information sources including email and instant messaging. But this product lacks the ability to index network drives, and therefore is supposedly not competitive with Google's enterprise search appliances.
These appliances range from the Google Mini, which sells for $3,000, to the Google Search Appliance, which is a $30,000 black box. (Here's some recent coverage in Information Week.) The key indexing logic in these appliances is a closely guarded secret, hence my use of the term "black box."
True, the Google enterprise appliances seem easy to deploy. In some cases, if companies want an easy way to enable search of generic kinds of documents - either for customers or employees - they may just slap in one of the Google widgets.
But wearing my enterprise consultant hat, I know there's some pretty tough competition in enterprise information analysis and data retrieval software from companies like Autonomy, IBM, Microsoft, and Verity. Specialized information areas (for example, medical and legal) have their own highly technical semantic requirements for retrieval, and the Google appliances can't even begin to touch them. So these Google enterprise appliances are actually kind of mid-market: they don't touch the functionality of the higher-end (and more expensive) solutions, and they don't understand the semantic rules and requirements of areas requiring subject-matter expertise. Compared to systems that do take a stab at solving these problems, they are cheap to buy and easy to deploy. But they are a lot more expensive than the free Enterprise Desktop Search product that I described at the beginning of this piece.
How much market is there in the enterprise for this mid-ground? I'm guessing not enough to support the shark in its forward motion. To achieve critical mass in enterprise search, Google will have to develop expertise, tools, and techniques to master the syntax and semantics of specific domains - and do it better than anyone else.
Posted by Harold Davis at 9:56 AM
May 20, 2005
Google Maps Captures UFO
Here it is: hovering somewhere over Florida!
Posted by Harold Davis at 2:40 PM
Google Customizable Home Page Pathetic?
Brad Hill doesn't like the new, customizable Google home page features that I discussed earlier.
Brad says: "I stand by my initial impression, which is a combination of bewilderment and distress. The product is painfully immature, even childish by industry standards, and on my machines does not work properly. (The personalization is not persistent unless I type in a special ”/ig” suffix to the google.com URL, and the “toggle” between the personalized and traditional pages is not a toggle at all, but a one-way switch away from the customized page.)"
The features work for me, though I see what Brad means about the toggle. I agree that the whole thing is a little lightweight, but just wait! More features are coming. But what?
Posted by Harold Davis at 11:33 AM
Google Customizable Home Page
You can now customize your Google home page (the program is in Google "beta" within the Google Labs).
I like what you can do, but this is still only portal-lite. Features are more or less limited: you can add headlines from Google News, Slashdot, the New York Times, and the BBC, add links to your Gmail account, and move these things around on your custom page in a whiz-bang fashion. (I should also mention stock quotes, local weather which is not yet operable, movie reviews, and more.) The whole thing can be toggled on and off - the off position is referred to as "Classic Google"!
As I said, I think the ability to customize your Google home page is pretty cool. Why not be able to use all that white space on the Google search page to provide some quick info? It lets me learn some things that are important to me at a glance every time I open my Google home page (which obviously I do quite a bit).
The new interface is minimalist in the Google UI spirit. But the mainstream media, such as the New York Times (Google Moves to Challenge Web Portals), are portraying the move as a step towards the portalization of Google, and a salvo in the "war" with Yahoo! and MSN.
I see things a bit differently. I already use Google as a kind of portal, meaning that it is my home page (and the Google Toolbar sits always ready for use wherever I surf). I like Google's laser-focused functionality and uncluttered look, and I'm not particularly eager to have Google start offering a la Yahoo everything from soup to nuts to hot dates.
The new customizations strike me as an OK but minor tweak to the interface rather than a paradigm shift toward portal-dom. Where I think it is going is that undoubtedly Google will be adding multiple syndication feeds as viewing options to this page. What a great way to generate more AdSense revenue! Now I understand why Google is putting ads in RSS and Atom feeds.
Posted by Harold Davis at 7:54 AM
May 19, 2005
Mapping Crime in Chicago
Adrian Holovaty and Wilson Miner have put together an application that visually shows crime incidents in Chicago using Google Maps.
Check it out. It's notable for a number of reasons:
• It "munges" together application logic and data from a number of sources (e.g., Google Maps, and a Chicago Police Department database).
• By letting users see the geographic place where different "incidents" took place, it presents information in a new and useful way. For example, you can visually see what blocks arson, rape, or armed robbery take place in (good for avoiding these areas?).
• The application allows the information it presents to be sliced and diced using syndication: RSS feeds are available that show crimes for each police beat, and each city block.
Way cool!
Posted by Harold Davis at 9:25 AM
May 16, 2005
Tagging and community
Here's a great post by Adam Bosworth, Google's CTO, about tagging and community, which are essentially broadly phrased Web 2.0 issues...
Posted by Harold Davis at 3:01 PM
May 8, 2005
Googly Mother's Day

Happy Mother's Day: for Mothers, Fathers, Kids, and even Google!
Posted by Harold Davis at 8:44 AM
May 3, 2005
Google National Teacher Day

May 3 is National Teacher Day, celebrated with a special Google logo...
Posted by Harold Davis at 10:20 AM
May 2, 2005
Potemkin Villages and the Holy Grail of Local Search
In his column today in the Wall Street Journal, Lee Gomes tells the story of looking for reputable home repair contractors using the Web. (I'm not supplying a link to the WSJ article because the WSJ is a pay-only site.)
Gomes bemoans the sorry state of search results, full of what he calls second-generation spam. With a particularly apt metaphor, he notes that "Folk with an historical bent might refer to [this kind of search spam site] as Potemkin Web sites because they are all facades."
Potemkin villages were the fake exteriors of happy peasant settlements erected at the behest of a Count Potemkin to impress the Empress Catherine on her visit to the Crimea. Behind the happy facades were emptiness, squalor, and misery. The term "Potemkin Village" has come to mean a misleading facade, usually erected by a politician, with the intent to deceive casual visitors.
For commentary related to search engine spam, see my early Weblog entries Is Google Painting Itself into a Corner? and Does Google Play Fair?
According to an article in the Economist, this year in a watershed event, Google and Yahoo's advertising revenue will probably surpass the advertising revenue obtained by the big three broadcast networks (ABC, CBS, and NBC). In this watershed year, it's appropriate to have a look at that holy grail of advertising, local search.
Local search has long been the fiefdom of the yellow pages, off-line and to a very limited extent online. The secondary players in local search are local newspapers - people turn to their classified ads to buy cars, rent apartments, search for jobs, and (to a lesser extent) locate restaurants and remodeling contractors.
Today's E-Commerce Report in the New York Times reports that local search institutions, such as newspapers and yellow page vendors (including BellSouth, SBC, and Verizon's Superpages.com) are turning themselves into agents for Google and Yahoo. For these local search vendors, this is a dance with the devil. Continuing down this path will only show local advertisers the future: online search engine advertising.
Going back to the subject of Potemkin villages, these are sites that seem like a guide to a topic, but are actually useless content surrounded by a sea of ads. Gomes, in his WSJ column, opines that a large part of the problem is that someone can make money from advertising just by putting up some pseudo-content site, say a roofing contractor site, that gets lots of traffic.
I think the problem is a bit more insidious, because many of these Potemkin sites are intended to boost search engine rankings, much more than to generate ad revenue. Despite drawing considerable flak (see comments at the bottom of the page), I stand by my opinion that a partial fix for this would be to open the PageRank algorithm for the sanitizing effect of public inspection (obviously, this is a controversial stance).
I also think that the holy grail of local search is Google and Yahoo's to lose, that the biggest single threat is search engine spam, and I agree with Lee Gomes that an important part of the solution is human oversight of search engine placement. According to Gomes, even today "insiders say that Google and the rest use human editors a lot more than they let on." Pay no attention to the person inside that black box!
Posted by Harold Davis at 10:05 AM
May 1, 2005
A Pet Peeve and More about RSS Ads
My pet peeve is any business (or person) who calls their audience (customers) thieves. Egregious sinners in this respect: movie studios who run advertorials in theaters urging people not to copy movies (the viewers of this propaganda are people who paid for movie tickets).
What brings this pet peeve to mind is that Jason Calacanis, quoted in my Web entry yesterday as happy to be able to put Google contextual ads in his syndication feed, wrote "it also means is that people who have been stealing our content are now going to be stealing it with advertisements in it..." In other words, his audience (customers) are thieves. Jason, get a life!
Yesterday's suggestion that ads in RSS and Atom feeds ought to be filtered out got a fair amount of flak when I posted the entry in my O'Reilly blog, as my O'Reilly weblogs seem to. (Are there hordes of people reading ORA looking for something to disagree with?) This well-reasoned side email gives the gist of the disagreement with me over the ads-in-syndication issue:
"Why are RSS ads any different than other ads?
"I took a survey a little while ago on my site and found that people overwhelmingly wanted full text feeds. I could do that, and do, but the lack of ads means I have 0 chance of any revenue from these visitors?
"So either a) eat any potential compensation or b) remove full text feeds and give just partial feeds.
"Now there's a "C"
"Why is this bad and in need of filtering?"
There's some logic here. If one publishes a valuable feed, it is natural to want to make money off it. I agree that feeds that only publish headlines with links are irritating (although sometimes they have utility as a quick way to browse things). I somewhat disagree about the need to publish full excerpts because a partial excerpt can give enough information so a subscriber can see if they want to link to the full content. I also think that even a full item entry can inspire readers to link to the content behind the entry, to get context, related items, discussion, and more.
Apart from the wish to make money from one's content (which is natural), the reason I think that RSS ads are different from other ads (the question posed above) is because RSS and Atom feeds are XML marked with the function of the elements, not with formatting. This is part of what gives syndication its power: knowing only what kind of thing an element of a feed is, subscribers are free to render (or use software that renders) feeds in any way that seems good to them. The Google contextual ads violate this by placing content (their ads) within HTML table tags (along with other formatting).
If ads within syndication feeds were displayed within their own <ad>... </ad> XML tags, I'd have no problem with their inclusion in RSS and Atom: and subscribers and subscriber software could decide how to handle them.
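To show what I mean, here's a minimal Python sketch of a subscriber handling a feed item that carries its ad in its own element. Big caveat: the `<ad>` element is purely hypothetical (it is not part of RSS or Atom, and a real extension would presumably live in its own XML namespace), and the sample item and the function name `split_ads` are invented for the example.

```python
import xml.etree.ElementTree as ET

# A made-up feed item; the <ad> element is hypothetical, not part of RSS or Atom.
ITEM_XML = """<item>
  <title>Example entry</title>
  <description>Summary of the entry.</description>
  <ad>Hypothetical sponsor message</ad>
</item>"""


def split_ads(item_xml):
    """Return the item with its <ad> children removed, plus the ad texts."""
    item = ET.fromstring(item_xml)
    ad_elements = item.findall("ad")      # direct children named <ad>
    ads = [ad.text for ad in ad_elements]
    for ad in ad_elements:
        item.remove(ad)
    return item, ads


item, ads = split_ads(ITEM_XML)
print(item.findtext("title"))
print(ads)
```

Because the ad is labeled by function rather than buried in formatting, the subscriber (or the subscriber's software) gets to decide: show it, restyle it, or drop it.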
Posted by Harold Davis at 9:21 AM
April 30, 2005
Google AdSense in RSS and Atom
In a contextual sea change, Google has announced a beta program that allows publishers to embed AdSense contextual ads in RSS and Atom syndication feeds.
"This is gonna be huge… like HUGE HUGE HUGE!!!" gloats Jason Calacanis, the President of Weblogs, Inc., which - as you might expect - publishes a bunch of syndication feeds. As a publisher of syndication feeds in a small way, I suppose I ought also to be glad. But actually, I think monetizing RSS and Atom feeds in this way in part defeats the purpose of having a feed. Feeds are simply an information stream that point to further information. If they get cluttered, they cease to be useful, and subscribers will cancel. In some sense, the RSS or Atom feed is an advertisement for the full content in and of itself.
Case in point: it is against the policies of Hot Feeds and Syndication Viewer to display feeds that carry ads.
Here's the way Weblogs, Inc.'s unofficial Apple feed, which Calacanis is using to test the syndication AdSense program, looks (with ads) in Syndication Viewer. Each Google AdSense ad is simply an HTML table embedded in feed items like this [identifying numbers and actual link omitted]:
|
]]>
The good news: it ought to be trivial to parse these ads out of incoming feeds, simply by eliminating table tags and their contents from item entries if in no other way. I will certainly do so in Syndication Viewer.
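Here's a rough Python sketch of that table-stripping idea. It is not the code in Syndication Viewer; the function name and the sample item text are invented, and a regular expression is a blunt instrument here (it won't cope with nested tables, for example, where a real HTML parser would be the safer choice).

```python
import re

# Matches an opening <table ...> tag through the first closing </table>.
# Assumption: ad tables are not nested; nested tables would need a real parser.
TABLE_RE = re.compile(r"<table\b.*?</table\s*>", re.IGNORECASE | re.DOTALL)


def strip_tables(item_html):
    """Remove every table element, and its contents, from a feed item body."""
    return TABLE_RE.sub("", item_html)


sample = "Entry text.<table><tr><td>Ad goes here</td></tr></table> More text."
print(strip_tables(sample))
```

The obvious cost is that any legitimate table in an item gets thrown out along with the ads.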
Posted by Harold Davis at 9:49 AM
April 26, 2005
Microsoft is IBM and Google is Microsoft
When I opined over on my O'Reilly blog that Microsoft was the bloated (but rich) IBM of our day, and that smart, nimble Google would overtake Microsoft, I got a lot of flak. For example:
"Is this a troll? IBM remains a much larger company than Microsoft, with 96 billion USD in revenue last year compared to MS's 38 billion. Google had *3* billion in revenue in the same period, but with a price-to-earnings ratio of 137(!), compared to IBM's 15 and Microsoft's 26, whose stock would you buy?
IBM and MS create software for sale. What exactly does Google sell, besides advertising?..."
Now Fortune Magazine comes along with a cover article ("Search and Destroy: Why Google Scares Bill Gates") that more or less says what I said...
Posted by Harold Davis at 1:46 PM
Changes to Google's AdSense
The changes to Google's offerings to advertisers that I wrote about yesterday are of course interesting to publishers who make money with AdSense.
The impact is that Google is offering to advertisers (besides the ability to target specific sites) a new way to pay: by impression (CPM) rather than per-click-through (CPC). The AdWords program will somehow balance the two payment schemes to find the one that pays the most.
From a publisher's viewpoint, it's nice to receive payment every time an ad is displayed on one's site (rather than when the ad is clicked). This is what CPM means. How much these CPM ads will filter down to smaller publishers remains to be seen: to the extent that they do, it becomes another monetization option for publishers with worthwhile content. (And also, one doesn't have to worry about click fraud with CPM!)
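Google hasn't said how the balancing works, so the following is only my guess at the arithmetic, sketched in Python: convert a CPC bid into an effective revenue per thousand impressions using the expected clickthrough rate, then compare it with the CPM bid. The function names and the sample bids are hypothetical.

```python
def effective_cpm_from_cpc(cpc_bid, clickthrough_rate):
    """Expected revenue per 1,000 impressions from a pay-per-click ad."""
    return cpc_bid * clickthrough_rate * 1000


def better_paying(cpm_bid, cpc_bid, clickthrough_rate):
    """Name the payment scheme expected to pay more per 1,000 impressions."""
    if effective_cpm_from_cpc(cpc_bid, clickthrough_rate) > cpm_bid:
        return "CPC"
    return "CPM"


# Hypothetical bids: $4.00 per thousand impressions versus $0.50 per click
# with an assumed 1% clickthrough rate (about $5.00 per thousand impressions).
print(better_paying(4.00, 0.50, 0.01))
```

The comparison hinges entirely on the clickthrough estimate, which is exactly the number a small publisher has the least control over.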
Posted by Harold Davis at 1:36 PM
April 25, 2005
Google Announces Site Ad Placement
In a further departure from its roots in searching, Google has announced a new program that will allow advertisers to choose the sites on which their ads appear.
I've written in the past about Google's transformation (at least looking at revenue) from a search company to an advertising broker. But contextual advertising - Google's other-than-search bread-and-butter - still involves technology that automatically calculates relevancy, just like a searching algorithm, and produces a marketplace for words. Whether the context is evaluated correctly or not by the automated mechanism is another story.
In the new Google order of things, advertisers interested in branding can pick their sites without regard for contextual relevancy. The New York Times bills the changes as a move away from search for Google, and Brad Hill in his blog calls the move "industry shaking."
Advertisers will pay for the new-style ads on a CPM basis, or per ad impression (not per ad click as with contextual ads), although the process of purchasing these ads will be blended with the traditional Google CPC (pay per click) word auction process.
These ads are intended to appeal to big advertisers who are looking for general branding (for example, all kinds of advertisers of luxury goods would probably like to appear on BMW's site, even if the ads were not contextually relevant to cars).
Context-free ads may also work for advertisers who are better able to determine relevance than the automated algorithms - it makes sense to put ads for cheese on an oenophile site, but AdSense probably doesn't think so. Google's revenue stream will be a winner, as will big advertisers and owners of desirable Web content. Possible losers: anybody but Google in the business of brokering ads.
Posted by Harold Davis at 11:34 AM
April 22, 2005
Statistically Improbable Phrases (SIP)
Statistically Improbable Phrases (a/k/a "SIP") is the improbable term Amazon.com uses for a search ranking technique. Here's Amazon's explanation.
In more-or-less plain English, here's how this works. Amazon indexes the "Search Inside" content of the books in its catalog (that is, the books in which publishers provide this content). In many cases, Amazon provides a list of SIPs on the main listing page for the title. For example, Starting an Online Business for Dummies by Greg Holden has a number of linked SIPs listed, including "your online business." These SIPs are phrases that appear with anomalous frequency in the inside content of the cataloged book compared with the rate of occurrence of the SIP in the universe of books in general. This statistical over-occurrence implies that the SIP is a significant representation of the content of the book.
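The over-occurrence idea can be sketched in a few lines of Python. To be clear, this is my own toy version, not Amazon's algorithm: it scores each three-word phrase by how much more often it turns up in one book than in a comparison corpus, and the function names, the smoothing trick, and the tiny sample texts are all assumptions.

```python
from collections import Counter


def phrases(text, n=3):
    """Split text into lowercase n-word phrases."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def improbable_phrases(book_text, corpus_texts, n=3, top=5):
    """Rank a book's phrases by their rate in the book relative to the corpus."""
    book_counts = Counter(phrases(book_text, n))
    corpus_counts = Counter()
    for text in corpus_texts:
        corpus_counts.update(phrases(text, n))
    book_total = sum(book_counts.values()) or 1
    corpus_total = sum(corpus_counts.values()) or 1
    scores = {}
    for phrase, count in book_counts.items():
        book_rate = count / book_total
        # Add-one smoothing so phrases absent from the corpus don't divide by zero.
        corpus_rate = (corpus_counts[phrase] + 1) / (corpus_total + 1)
        scores[phrase] = book_rate / corpus_rate
    return sorted(scores, key=scores.get, reverse=True)[:top]


book = "your online business needs your online business plan"
corpus = ["the quick brown fox jumps over the lazy dog", "a plan for the lazy dog"]
print(improbable_phrases(book, corpus, top=1))
```

A real system would need a far bigger corpus and smarter tokenizing, but the ratio of book rate to corpus rate is the heart of the "statistically improbable" test as I understand it.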
By clicking one of the SIP links, you get other books in which the SIP occurs, sorted from most to least by the number of SIP references. For example, "Web Analytics" and "E-Commerce for Dummies" have the next highest occurrences of the SIP "your online business" after "Starting an Online Business for Dummies."
This is a different and somewhat appealing way to use Amazon's search facilities to find books in which the author uses distinctive phrases. Longer run, the concept has an elegant simplicity (as did the original PageRank algorithm), and may be useful for automated tagging and ranking of content.
Click here for a lively discussion of SIPs in the context of author as phrase maker, and here's a fun discussion and list of adult SIPs on Amazon (over 18 only please click this link).
Posted by Harold Davis at 11:17 AM
French Military Victories
The top result for the Google search French Military Victories is a parody page (also called a "Google bomb"). You can get directly to this Google-look-alike page that comically ignores the Battles of Hastings, Agincourt, Castillon, Hohenlinden (and so on) by entering the "French Military Victories" phrase in the Google search box and clicking the I'm Feeling Lucky button. (The parody page on the Albinoblacksheep domain that looks like it is part of Google is, of course, the most popular search result for the term.)
Erica Sadun, in an O'Reilly Weblog, has a round-up of Google Easter Eggs and other Google anomalies.
Posted by Harold Davis at 8:20 AM
Earth Day Google

Happy Earth Day!
Posted by Harold Davis at 7:58 AM
April 21, 2005
Tracking Your Search History with Google
Google has a new feature that tracks your search history. (Click the link to open the sign-up page for the application, which otherwise can be accessed through Google Labs.) This is another of Google's wonderful tools labeled a "beta" that is not really a beta.
So far, the functionality is pretty straightforward and (at least for me) very useful. When you are logged in, and you can log in of course from any computer, Google keeps track of your searches. You can click on any of the links that represent a saved search to see the full text of a search. You can also retrieve searches by date using the calendar that the Search History Tool provides.
Once you sign up for the Search History Tool, your Google home page changes. Up on the right-hand top, you'll see your sign-in email, a link that takes you to your account history (which is where to find the calendar and search links, and also the ability to remove any or all search items), a link that takes you to your Google account settings, and a link to sign out. If you do sign out, Google's home page will show you a sign-in link.
Keeping track of my search history is a very useful feature for me. I can't tell you how many Google searches I do a day (probably in the three or four digits), although the Search History Tool will in fact tell me this. Many times, I've "lost" information from a search that I thought I didn't need (but actually did!). The Search History Tool will pretty much solve this problem for me, I think.
Down the road, the Search History Tool will probably let Google refine searches for me based on my search history (it remains to be seen how helpful this is).
The Search History Tool may allow customization that is an important weapon in the battle against search spam, because I may be able to "train" my future searches by deploying a "Junk" setting against my Search History results. Other forms of search customization, once I'm logged in to search, are also possible of course.
I also see the Search History Tool as a Trojan horse for the introduction of more Yahoo-like services. Google needs to know its users better to create these services: and what better way to know someone than to keep track of their searches?
Posted by Harold Davis at 12:10 PM
April 20, 2005
Responding to the Response to My Response
A blogger took exception to my post about opening the PageRank algorithm in my O'Reilly blog. I responded to the response here. The blogger responded to my response, and I made the following response to the response to the response (confused yet?). (This was originally posted as a comment to the other blog, but I thought it worth adding here as well.)
As far as I am concerned, this is a much more reasoned comment than your first one (although I still wish you would sign your name as I don't want to spend the time looking you up, and I don't want to refer to someone I am having a dialog with as "il minore" whatever).
The primary thing you said I said that I didn't (and that in fact I don't believe) is that everything should be open. I do not believe this, and never said I did. Some things should, and some things shouldn't -- although I think Linux is a case in point of something that has clearly benefited from being open.
It's both a blessing and a curse to see both sides of an issue. The reason for the "polarity" of my position is, of course, I see the problems with any kind of disclosure of PageRank. Bearing in mind these problems, and the unlikeliness of it ever being disclosed, here are the reasons I think at least some more community discourse regarding the precise nature of PageRank would be helpful:
(1) PageRank, based on my searches, is not working as well as it used to. My impression is that the rate of deterioration is increasing. So it is not a case of "if it isn't broken, don't fix it." Rather, it is a case of "this isn't working," with Google playing catch-up to try to make it work, kludging together something with 100 variables (!). The elegant simplicity of the PageRank concept has clearly been lost.
(2) The time delay built into newer iterations of the Google model really bugs me. I like my information fresh! And as someone who is frequently putting up web sites, I like to be able to get them picked up fast without resorting to chicanery myself.
(3) In fact, Google is the predominant way people find information on the web. Anyone who thinks this is not very important to people, politics, and life is naive. And, Google itself is more of a community effort than may be apparent. Case in point: Google uses the community-run Open Directory Project for major taxonomic information.
(4) It's bad when Microsoft is heavy-handed and secretive, but OK when Google is? Come on, Googlers may be the good guys, but let's hold them to the same standards as everyone else.
(5) No, I do not believe Google has hired all the smart people with something to contribute to search. What baloney! Sometimes the best ideas in fact do come from outside the box.
Posted by Harold Davis at 10:10 AM
April 19, 2005
A Squeeling [sic] Lunatic and the PageRank Algorithm
An item (Delusions of Community) in the otherwise apparently unattributed Pensieri di un lunatico minore ("Thoughts of a minor lunatic") blog attacks my blog entry Publish the PageRank Algorithm Now! for things I said, things I didn't say, and also contributes ad hominem personal attacks on me to this discussion.
My original entry was reposted in my O'Reilly blog.
Here's the opening salvo from the minor lunatic: "squeels Harold Davis on an otherwise reputable O’Reilly site." [Thanks - at least I know how to spell "squeals."] Before I get to some substantive issues, here's the concluding attack: Also, for someone who so admires the “feedback” from open source, he [Harold] doesn’t have comments or trackbacks turned on on his own blog.
Trackbacks and comments are something I've thought about a great deal. I have comments and trackbacks turned off on my blog for two reasons. One is administrative: I simply don't want to deal with the spam that results from keeping them open. (It's my experience with the other blogs that I administer that do allow comments and trackbacks that this is a real problem.) More important, from a conceptual viewpoint, my blog is my blog: I use it to express my opinions. I am not a public utility, and I have no delusions of grandeur that I am comparable in any way to Google (alas!). My blog is not a community forum, it is a bully pulpit, and anyone who wants to comment can do so in their own blog, as the minor lunatic did so trenchantly. (And, at least, I sign my own name to my opinions.)
In my original post I clearly noted "It's probably unreasonable to expect Google to publish how PageRank really works in light of competition from other search engines, and the efforts of SEO Webmasters to game the system." I'm in fact deeply disturbed by the gaming of the system that is going on. I think it is leading to increasingly bad search results. In other words, the search spammers are winning. This is too bad. Is the answer more of the same old same old -- which isn't working? (And keep it a deep, dark secret at the same time?) I think not. In the long run, it is effectively possible to reverse engineer PageRank empirically if in no other way because the results are so self-evident.
I mention all this because the minor lunatic notes (supposedly from friends inside Google) that keeping everything secret is the only way to stay ahead of the search spammers. Since this isn't cryptography, he says, where secrecy means bad design, it is "instead a situation where secrecy is the only option..."
Rhetorically, he asks "Does Mr. Davis really believe everything benefits from a million unexpert eyes? Google has managed to hire just about everyone in the field, so the likelihood of someone solving some huge problem in their equation is pretty small. How many of the big math problems get solved by random people, and they’ve been published for hundreds of years?
...
Google doesn’t stand to benefit one whit from Joe Random Programmer looking at a 100 variable equation. Most programmers suck at understanding equations to start with, otherwise you wouldn’t see so many hair-brained bad ideas that would have been solved with the opening of a volume of Knuth.
Google has become a magnet for the best and brightest in many many fields." [I've omitted some of the argument.]
Well, no, I don't believe that everything benefits from openness. (I never said I did.) I just believe that the mechanisms behind forces that have a huge impact on our lives should be transparent. We should be able to verify the results of elections, and we should understand (at least roughly) how Internet search orders its results. (If you get the idea that I'd dearly love to review the real PageRank algorithm, you are right.) Keeping things secret will not foil the spamsters. I distrust authority even when it is as benign as Google, and I am always mindful of Lord Acton's dictum about absolute power corrupting absolutely.
Yes, I agree with the minor lunatic that I'd rather have smart people who are naturally interested in a field working on it, than everybody under the sun regardless of proclivity or talent. That said, it is my absolute conviction that more transparency, and more community involvement, would benefit the formulation of Google's search algorithm.
Posted by Harold Davis at 3:37 PM
April 18, 2005
Maps and Satellite Photos @ Google
If you haven't tried them, the relatively new mapping capabilities at Google are very cool. I like the maps. You enter an address (or portion of one). The user interface is very sparse, with a widget in the upper left to control zooming in and out and panning across a map. As with Mapquest, you can get driving directions to or from an address. Unlike Mapquest, Google shows no annoying ads, pop-ups, or other distractions. You can use the Google maps to find businesses or services of a specific type in a given locale.
For reasons I can't quite put my finger on, I think the Mapquest maps may actually be a little better for navigating by car than the Google maps. But one feature of the Google mapping application is, in fact, cool beyond belief. If you click the Satellite button on the upper right hand corner of the screen, you can see the aerial, satellite photographic view of any map. The zooming and panning tools work with these satellite pictures.
Start with where you live from above and pinpoint your block and rooftop. You can zoom in and out, see your whole city or state. Kids love this.
Some fine print: Google maps and sat photos are limited to the United States and Canada (more world coverage is promised soon). Coverage in rural areas can be spotty. This, however, corresponds rather well to the areas that are not much sought after (click here for a Google engineer's visualization of frequency of search by locale).
More fine print: the photos seem somewhat dated (for example, big elm trees can be seen in the aerial view of my house, although they came down in winter storms over two years ago). There are the usual sporadic reported glitches in the maps (this is not unique to Google's maps).
Here's a neat application that combines Google Maps and Craig's List so you can view the location of Craig's List real estate listings.
Posted by Harold Davis at 9:56 AM
April 15, 2005
Leonardo Google

Leonardo Da Vinci Google
Posted by Harold Davis at 7:02 AM
April 11, 2005
Delay in Ranking a Feature, Not Bug
According to a recent article in SitePro News, an online publication aimed at Webmasters who want to optimize their sites, Google's delay in ranking sites, and the delay in according credit to inbound links, is a feature, not a bug.
I've written critically about the longer and longer wait times for sites to get indexed as a problem (see Is Google Painting Itself into a Corner?). Now, according to Lawrence Deon, an SEO (Search Engine Optimization) expert and the author of the article titled "Surviving Google's Aging Delay" in SitePro News, it turns out that Google does it on purpose as part of the arms race with those gaming the system. The "probationary period" makes sure that there are no instant returns for "manufacturing" tons of links, but it also makes it harder for newcomers to break into the system. In addition (this may or may not be an unintended side effect) it makes it more worthwhile than it used to be to purchase AdWords slots from Google to draw traffic, at least during the probationary period.
The idea that Google probably intends to slow indexing and ranking down, and that this is (in some ways) to Google's financial benefit, makes me call again for publication of the details of the PageRank algorithm. (See Publish the PageRank Algorithm.) Let the antiseptic of open scrutiny and discussion work its magic on this matter that is so important to the Web!
Posted by Harold Davis at 3:29 PM
April 10, 2005
Google National Library Week

Posted by Harold Davis at 9:32 AM
April 6, 2005
Some Obvious Truths about Click Fraud
If you don't know what click fraud is, here's a quick definition I just coined: clicking on contextual ads with no interest in purchasing the goods or services advertised and the intention of defrauding the advertiser or enriching the contextual publisher.
Click fraud is a major problem for contextual advertising vendors on the Internet like Google and Yahoo. An article about click fraud made the front page of today's Wall Street Journal (link not supplied because WSJ is a pay-only site).
In addition, a somewhat under-reported lawsuit has been filed by Arkansas retailer Lanes Gifts (and others) against Google (and others) alleging systematic overpayment due to click fraud.
So I think it's time to state some - for me - obvious truths about click fraud. To wit, obviously:
- There is some click fraud (nobody really knows how much right now)
- Advertising brokers (Google, Yahoo, etc.) are taking countermeasures (they can probably detect click fraud on a statistically large scale)
- Getting an adequate refund and explanation if you've been a victim of click fraud is not very likely (but then as a publisher on the Web it's unrealistic to expect good customer service from any of the sources of advertising revenue, at least in my experience)
- In the aggregate, contextual advertising does work and delivers targeted prospects much more effectively than any other method (if contextual ads didn't work, advertisers wouldn't be paying the big bucks for them)
So where does this leave us? Advertisers and others need to accept that click fraud will always be with us. It should be understood that the Googles of this world will do their best to keep click fraud to reasonable levels, but that there will always be a fudge factor added to the number of "real" clicks. Get used to it and get over it!
Related links:
Contextual Advertising: Not
Foiling the Click Fraudsters
Words for Sale
Click Detective
Posted by Harold Davis at 9:13 AM
April 5, 2005
Published: Building Research Tools with Google for Dummies
A real thrill: to step out on the porch and bring inside the carton with newly printed copies of my book, Building Research Tools with Google for Dummies. It's finally in print! Yippeee! (I think it looks great, but then again I would! Go out and get a copy and see for yourself if it doesn't help with your Google research conundrums...)
Related Links:
Building Research Tools with Google for Dummies companion Web site
Building Research Tools with Google for Dummies on Amazon
Posted by Harold Davis at 11:25 AM
April 3, 2005
Fired Blogger Formulates Blogging Policy
In an exciting example of the Peter principle in action - rising to the level of one's own incompetence - Mark Jen a/k/a ninetyninezeros, who was fired for his blogging by Google, is now helping to formulate the blogging policy at his new employer, Plaxo.
Here's the syndication feed for his new blog on Syndication Viewer, and (for those of you interested in the back story) the feed for the blog that got him fired. He's no longer updating the old blog, and the new blog is a decidedly more corporate affair, including proper use of lower and upper cases.
Posted by Harold Davis at 2:14 PM
April 1, 2005
The Search of Tomorrow Meets the Chef of Today
A recent article in Information Week in a series about the future of software is called Search For Tomorrow.
As you'd probably expect, the article is about the state of searching software, but more about searching software today than in the future. I'm reminded that science fiction, whenever it is nominally set, is actually about the time it is written in. I'm also vaguely reminded of the classic Ralph Kramden routine from the Honeymooners in which as part of his get rich scheme of the moment he plays the Chef of the Future.
So what's the word from the Chef, er, Search of the Future? According to the Information Week article, it's big business: not only for contextual search-related advertising (who would have thunk it?), but also because of the enterprise need to search "unstructured" information. Federal Homeland Security is pumping R&D funding into the field. Microsoft is investing in basic search research.
According to the article, current search innovations tend to fall into the following areas:
- Tweaking search-results ranking algorithms (e.g., Google with more than 100 variables to calculate PageRank)
- Combining a web-style search with complementary, simultaneous other search sources, such as an encyclopedia (Microsoft)
- Using current in-progress open documents as contextual hints when a search is engaged (Autonomy)
- Tagging unstructured data (text, email, audio, video) with meta-data (Autonomy, IBM)
- Better understanding the semantics of search requests (IBM)
- Creating a unified architecture for managing unstructured and structured data (UIMA from IBM)
The Information Week piece concludes: "The [search] tools we use now work pretty well. But more esoteric ones employed by just a handful of people today [the Chefs of the Future?!] could portend better approaches to come."
I find myself perfectly OK with complex query syntax (one of the supposed bugaboos of users in the article) but quite disappointed with search today. As a researcher, I have a reputation of being able to find anything, and usually I can although it may take a while. But I'd like to see search tools much more along the lines of the Star Trek computer: "Computer, show me an overlay grid of this, that and the other..." and whambo, the information appears correlated and easy to appreciate, so I can see where the enemy vessel is hiding.
To get there, obviously we need searching software that is much more intelligent, and capable of better contextual and syntactical analysis. But until we get something close to the Star Trek model, it is ridiculous to speak of the future of search being here now.
Posted by Harold Davis at 9:11 AM
Have a Drink of Google Gulp!
Have you had a drink of the Kool-aid, er, Google Gulp, yet?

Posted by Harold Davis at 8:03 AM
March 31, 2005
Publish the PageRank Algorithm Now!
Google Enterprise general manager David Girouard is quoted in a recent Information Week article as saying that Google's PageRank algorithm uses more than 100 variables in its calculations.
Google's PageRank algorithm is used for the all-important determination of how search results are ordered. In other words, the higher the PageRank, the more likely you are to find a page using Google. Most people display Google search results ten per page. Studies have shown that there is a huge difference in the number of click-throughs you get if your result is one of the first three top-ranked pages, and also that there is close to 100% fall-off in click-throughs after three pages (or thirty search results). This helps to spell out the importance of PageRank and its gate-keeping function towards the information available on the Web.
If it is true that more than 100 variables are used to calculate a given Web page's PageRank, then PageRank has come a long way from the rather simple mechanism published by Brin and Page in their graduate student papers, and used by Google in the early days.
In the proto-PageRank system published by Brin and Page, a page's PageRank is a fraction calculated recursively by summing the PageRanks of the pages that link to it, and applying a simple damping factor representing how likely it is for anyone to surf away from a given page. In this theoretical Web universe, the sum of all PageRanks is always 1. Here's some material from Building Research Tools with Google for Dummies about how Google works.
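For the curious, here's a minimal Python sketch of that early, published style of calculation. It is emphatically not Google's current algorithm. It uses the commonly taught normalized form, in which each page passes its rank along in equal shares to the pages it links to, so that all the ranks sum to 1. The function name, the damping value of 0.85, the fixed iteration count, and the handling of pages with no outbound links are assumptions made for this illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively approximate PageRank for a small link graph.

    `links` maps each page to the list of pages it links to.
    Every page must appear as a key, even if it links nowhere.
    """
    pages = list(links)
    n = len(pages)
    # Start with rank spread evenly, so the ranks sum to 1.
    ranks = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Baseline share every page receives regardless of links.
        new_ranks = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            # Assumption: a page with no outbound links spreads its
            # rank evenly over all pages, which keeps the total at 1.
            targets = outlinks if outlinks else pages
            share = damping * ranks[page] / len(targets)
            for target in targets:
                new_ranks[target] += share
        ranks = new_ranks
    return ranks

# A tiny four-page Web: "c" is linked to by every other page.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
print(sorted(ranks, key=ranks.get, reverse=True))
```

The recursion in the description shows up here as repetition: each pass recomputes every page's rank from the previous pass's ranks, and the values settle down after enough passes.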
It's amusing to note that the term "PageRank" was probably coined to reflect Larry Page's role as the creator of the concept rather than because it is about ranking pages.
There is something deeply troubling about the complex and opaque nature of the 100+ variable unpublished PageRank algorithm as it stands today. In effect, this means that nobody (except Google insiders) understands how information in this most important of information portals passes the gate keepers.
It's probably unreasonable to expect Google to publish how PageRank really works in light of competition from other search engines, and the efforts of SEO Webmasters to game the system. But not publishing the details of the PageRank algorithm goes against the tenets of open source espoused by many who work at Google, violates the idea that information should be freely available (after all, this is a most important piece of meta information!), and deprives Google of the open-source-like benefits of community scrutiny.
So I say, free the PageRank algorithm now!
Posted by Harold Davis at 9:14 AM
March 30, 2005
Van Gogh Google

Posted by Harold Davis at 8:35 AM
March 27, 2005
Contextual Advertising: Not
An extremely important part of Google's business is famously built upon contextual advertising: Advertisers bid on keywords using Google's AdWords software, and the winners have ads placed "contextually" on web sites whose publishers have elected to affiliate with Google using Google's AdSense software.
But "contextually" is a significant misnomer. Computers are very good at literally matching keywords, but very bad at catching the subtle nuances of context.
As a Web publisher, you find offensive ads placed by Google. For example, on Phyllis's HighRisk.org, a site devoted to helping parents with preemies and high-risk pregnancy conditions, we get ads for thinly disguised anti-abortionists. You can deal with this one by blocking the domains in question (Google allows AdSense publishers to block up to 200 domains as "competitors.")
It's a little harder to deal with what turned up when I wrote a blog entry blasting intelligent design as a euphemism for creationism. Both the blog entry and my main blog page for the month kept getting ads from anti-evolutionists too numerous to block by domain.
Similarly, but a little funnier, when I wrote a blog entry commenting on a business press item comparing Google to Wal-Mart, and coming down hard on Wal-Mart, and another item just blasting Wal-Mart, both my blog items and my monthly page started getting inundated with ads urging readers to shop Wal-Mart.
Further up the black humor scale, today's blog entry comparing Terri Schiavo's fate unfavorably with being buried brain dead and coated in honey in a red ant heap draws lots of AdSense ads for ant pest control services.
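The mismatches above are what you'd expect from literal keyword matching. Here's a deliberately naive Python sketch of that kind of matching. It is an illustration only and not how AdSense actually selects ads; the `pick_ad` function, the ad records, and the bid amounts are all hypothetical.

```python
import re

def pick_ad(page_text, ads):
    """Return the highest-bidding ad whose keyword literally appears
    as a word on the page, or None if no keyword matches."""
    words = set(re.findall(r"[a-z\-]+", page_text.lower()))
    matching = [ad for ad in ads if ad["keyword"] in words]
    return max(matching, key=lambda ad: ad["bid"], default=None)

# Hypothetical advertisers and bids, invented for this example.
ads = [
    {"keyword": "ant", "bid": 0.40, "text": "Ant pest control services"},
    {"keyword": "wal-mart", "bid": 0.75, "text": "Shop Wal-Mart today"},
]

post = "Wal-Mart treats its workers badly, and I will not shop there."
print(pick_ad(post, ads)["text"])
```

The matcher sees the word "Wal-Mart" and nothing else. The tone of the sentence around it never enters into the decision, which is exactly the problem.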
Obviously, these examples are not isolated to my Web content, and are replicated millions of times over across the Web. Obviously, some "contextual" ads do work: people do click on them and end up buying goods or services. (Advertisers can measure the success rates and are not fools.)
Still, the very term "contextual" gives one hope for better, more intelligent, placements that are truly context sensitive. And, as a publisher, these stupid ads make me feel like running out and telling the world: click those ads for creationism, Wal-Mart, and ant control and cost those foolish advertisers some bucks!
Posted by Harold Davis at 4:56 PM
March 22, 2005
Google Code
Google Code is a new Google site that will post open source projects developed by Googlers as well as material related to the various Google APIs.
Why the open source aspect of this hybrid (APIs and open source) resource? According to the FAQ, "We really care about free and open source software (F/OSS) at Google, and this site is one aspect of that affection."
Seems to me that the F/OSS stuff was probably workable as an external Google project and site when combined with the Google-specific APIs (after all, Google is not an eleemosynary institution).
I'll be taking a close look at the open source projects on Code Google and will report back. In the meantime, you may be interested to know that the Code Google site publishes two syndication feeds: one showing exemplary projects created with Google APIs or tools, the other an update feed to Code Google (both shown here in Syndication Viewer).
Posted by Harold Davis at 8:52 AM
All the Google APIs in One Place
Here's a neat page that has links to all the various Google APIs in one place.
The Google APIs are Web services used to build custom applications using methods provided by Google. By now there are a lot of different Google APIs (some, of course, are more interesting and/or important than others), including:
AdWords (the subject of my next book)
Blogger
Deskbar
Desktop Search
Froogle
Gmail
Google Groups (Did you know that each Group has a related syndication feed?)
Keyhole
Web Search (the subject of Building Research Tools with Google for Dummies)
Posted by Harold Davis at 8:37 AM
World Water Day Google
World Water Day Google:

Posted by Harold Davis at 7:33 AM
March 21, 2005
Google UI Designer Does Her Laundry
She does the wash on company time, gets to show her undies to Colin Powell, and lives to blog about it!
Posted by Harold Davis at 4:09 PM
March 19, 2005
Google Is a Verb
The Oxford American Dictionary (OAD), one of the "big five" American dictionaries, will officially anoint "google" as a verb when its new edition is released next month.
I google, you google, we google, oh to google in the spring!
Companies typically resist having their name turned into "real" words beginning with a lower-case letter. When Xerox transmogrifies to the verb to xerox, and Google becomes google, there are negative implications for trademark protection of the word. But this is the kind of problem we all should have!
Related links: New York Times article mentioning the inclusion of "to google" in the upcoming OAD, and noting generational changes in the land of lexicons (the editors are young and use Google to help decide if a word is ready)
Building Research Tools with Google ("Google" is used as a lower-case verb in the first chapter)
Posted by Harold Davis at 8:27 AM
March 18, 2005
Is Google Painting Itself into a Corner?
Search has become vastly less important to Google financially than its role as an advertising broker (see my blog entry about this). But search is still crucial to Google's ambitions to become the information portal to the world. I'm afraid that Google's search is facing three very serious problems, and the problems are only getting worse.
Before I get to the problems, two caveats. Google is still my favorite search engine, and I use it all the time, even though there are reams of other options. Google employs legions of very smart people, many of whom probably spend a lot of time thinking about the problems I am bringing up. They may have thought of some answers that I haven't.
The biggest problems with Google's search, as I see it, are:
(1) Spam search results. These range from paid placement advertorials (which may actually have a bit of decent information) intended to direct surfers to a specific merchant to absolutely heinous junk.
(2) Flaws in the PageRank algorithm, which cause the rich (popular sites) to get richer, but make it hard for newer sites (even those with quality content) to get ranked high enough to draw any traffic.
(3) Longer and longer waits before sites and pages are added to the index. This wait time has become as long as two or three months in some cases. The wait for cross-correlation using incoming linkage to assign a rank can be even longer. This creates a static index, inappropriate for a medium as dynamic as the Web.
Of these problems, only the third, long wait times for indexing, seems solvable to me with a scalable technological solution. (My thinking is that if you throw enough processing power at it, and engage many parallel spider bots, you could probably reduce the wait.)
With the spam and PageRank issues, Google has partially become the victim of its own success. It's so important to get good placement that it is worth thinking up any number of clever tricks to get there, or even to invent spurious content just to improve search placement. Google and the SEO webmasters are engaged in a furious arms race surrounding these techniques, and Google is losing, resulting in the arteriosclerotic condition of your search results.
I don't think that there is any good solution short of hiring human editors to evaluate content. When Google starts hiring people to categorize and evaluate Web pages you'll know they agree with me, and have thrown in the towel on finding a scalable high-tech solution.
Related link: Building Research Tools with Google companion web site
Posted by Harold Davis at 9:00 AM
March 17, 2005
Leprechaun Google

Posted by Harold Davis at 8:45 AM
March 15, 2005
GoogleX
GoogleX is a cool new Google user interface built with JavaScript and DHTML, fresh from Google Labs. Run your mouse over those groovy icons above the search box! Here's more about it from its creator.
Posted by Harold Davis at 9:26 PM