
The Googleplex Blog: Harold Davis's Blog


March 31, 2005

Publish the PageRank Algorithm Now!

Google Enterprise general manager David Girouard is quoted in a recent Information Week article as saying that Google's PageRank algorithm uses more than 100 variables in its calculations.

Google's PageRank algorithm is used for the all-important determination of how search results are ordered. In other words, the higher the PageRank, the more likely you are to find a page using Google. Most people display Google search results ten per page. Studies have shown that there is a huge difference in the number of click-throughs you get if your result is one of the first three top-ranked pages, and also that there is close to a 100% fall-off in click-throughs after three pages (or thirty search results). This helps to spell out the importance of PageRank and its gate-keeping function towards the information available on the Web.

If it is true that more than 100 variables are used to calculate a given Web page's PageRank, then PageRank has come a long way from the rather simple mechanism published by Brin and Page in their graduate student papers, and used by Google in the early days.

In the proto-PageRank system published by Brin and Page, a page's PageRank is a fraction calculated recursively by summing the PageRanks of the pages that link to it (each divided by the number of links on the linking page), and applying a simple damping factor representing how likely it is for anyone to surf away from a given page. In this theoretical Web universe, the sum of all PageRanks is always 1. Here's some material from Building Research Tools with Google for Dummies about how Google works.
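For the curious, here is a minimal Python sketch of that early, published scheme. It is not Google's current algorithm, which is the whole point of this entry. The function name, the four-page toy Web, and the handling of pages with no outgoing links are my own assumptions for illustration; the damping value of 0.85 is the one usually quoted from the original paper, and I've used the normalized form of the formula so that the ranks sum to 1.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively estimate PageRank for a tiny link graph.

    links maps each page to the list of pages it links to.
    Assumption: every linked-to page also appears as a key.
    """
    pages = list(links)
    n = len(pages)
    # Start with rank spread evenly, so the ranks sum to 1.
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page passes its rank on, split evenly among its links.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Assumption: a page with no links spreads its rank to all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page Web: C is linked to by A, B, and D.
toy_web = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

In this toy Web, page C collects the most incoming links and ends up with the highest rank, while D, which nobody links to, is left with only the baseline share.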

It's amusing to note that the term "PageRank" was probably coined to reflect Larry Page's role as the creator of the concept rather than because it is about ranking pages.

There is something deeply troubling about the complex and opaque nature of the 100+ variable unpublished PageRank algorithm as it stands today. In effect, this means that nobody (except Google insiders) understands how information in this most important of information portals passes the gate keepers.

It's probably unreasonable to expect Google to publish how PageRank really works in light of competition from other search engines, and the efforts of SEO Webmasters to game the system. But not publishing the details of the PageRank algorithm goes against the tenets of open source espoused by many who work at Google, violates the idea that information should be freely available (after all, this is a most important piece of meta information!), and deprives Google of the open-source-like benefits of community scrutiny.

So I say, free the PageRank algorithm now!

Posted by Harold Davis at 9:14 AM

March 30, 2005

Who put those ducks adrift...

...on a sea so wide?

ducks in tub


Julian had me take the original photo of ducks in his tub, which was composited with a sky to use as a technique illustration for my forthcoming Digital Photography Digital Field Guide.

Posted by Harold Davis at 8:39 AM

Van Gogh Google

Van Gogh Google

Posted by Harold Davis at 8:35 AM

March 28, 2005

Down to the Sea in Boats!

Julian and I went out to Point Reyes on Saturday! We had fun hiking the Tomales Point trail. The wild flowers were marvelous. Then we went down to McClure Beach, where Julian had fun in the tide pools and I made sure he didn't get swept out to sea (the waves were pretty wild, and the open Pacific stretches straight to Japan). Julian and I hiked back up to the car wet, sandy, tired, and happy - with some really nice shells.

On the way out, we also stopped by this wrecked tug boat parked in a mud bank off Inverness. It's at about a 45 degree angle and easy to climb aboard if you don't mind getting a bit wet, so we had a great time exploring it.
Pt. Reyes tug boat

Posted by Harold Davis at 3:15 PM

March 27, 2005

Contextual Advertising: Not

An extremely important part of Google's business is famously built upon contextual advertising: Advertisers bid on keywords using Google's AdWords software, and the winners have ads placed "contextually" on web sites whose publishers have elected to affiliate with Google using Google's AdSense software.

But "contextually" is a significant misnomer. Computers are very good at literally matching keywords, but very bad at catching the subtle nuances of context.
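To see why literal matching misfires, consider this rough Python sketch. It is only an illustration: I have no knowledge of how AdSense actually selects ads, and the keyword table, ad text, and function name here are all made up.

```python
import re

# Hypothetical keyword-to-ad table; real ad inventories are far larger.
ADS = {
    "wal-mart": "Shop Wal-Mart for everyday low prices!",
    "ant": "Ant pest control services",
    "evolution": "Books questioning evolution",
}


def match_ads(text, ads=ADS):
    """Return every ad whose keyword appears as a word in the text.

    The match is purely literal: nothing here looks at whether the
    text praises or condemns the thing the keyword names.
    """
    words = set(re.findall(r"[a-z][a-z\-]*", text.lower()))
    return [ad for keyword, ad in ads.items() if keyword in words]


post = "Wal-Mart treats its workers badly, and I will not shop there."
print(match_ads(post))
```

The post is hostile to Wal-Mart, but the matcher sees only the word and happily serves up the Wal-Mart ad.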

As a Web publisher, you find offensive ads placed by Google. For example, on Phyllis's HighRisk.org, a site devoted to helping parents with preemies and high-risk pregnancy conditions, we get ads for thinly disguised anti-abortionists. You can deal with this one by blocking the domains in question (Google allows AdSense publishers to block up to 200 domains as "competitors.")

It's a little harder to deal with what turned up when I wrote a blog entry blasting intelligent design as a euphemism for creationism. Both the blog entry and my main blog page for the month kept getting ads from anti-evolutionists too numerous to block by domain.

Similarly, but a little funnier, when I wrote a blog entry commenting on a business press item comparing Google to Wal-Mart, and coming down hard on Wal-Mart, and another item just blasting Wal-Mart, both my blog items and my monthly page started getting inundated with ads urging readers to shop Wal-Mart.

Further up the black humor scale, today's blog entry comparing Terri Schiavo's fate unfavorably with being buried brain dead and coated in honey in a red ant heap draws lots of AdSense ads for ant pest control services.

Obviously, these examples are not isolated to my Web content, and are replicated millions of times over across the Web. Obviously, some "contextual" ads do work: people do click on them and end up buying goods or services. (Advertisers can measure the success rates and are not fools.)

Still, the very term "contextual" gives one hope for better, more intelligent, placements that are truly context sensitive. And, as a publisher, these stupid ads make me feel like running out and telling the world: click those ads for creationism, Wal-Mart, and ant control and cost those foolish advertisers some bucks!

Posted by Harold Davis at 4:56 PM

When I Am Brain Dead...

... my dearest, do not weep for me! Instead:

Hang my witless body in a cage in the town square for birds to peck and gnaw.

Coat it with honey, and bury the technically alive cadaver that was the palace of my soul in a red ants' nest, and let the ants do the rest.

Warehouse my empty body in nursing homes and worse for fifteen years. Start me starving to death, then stop the process, then start it again. Let my awful parents and my slightly creepy spouse fight publicly over me. Let my empty husk of a body become a poster for the worst impulses in pseudo-moral American politics, the embodiment of hypocrisy. Finally, let me die slowly.

Hey, instead of the Terri Schiavo affair, why can't we have - after a decent interval and with due safeguards - a quiet journey to sleep, with drugs to help: death in peace and dignity? It would be the civilized thing.

Posted by Harold Davis at 1:08 PM

March 25, 2005

A Review a Day Keeps the Book Blahs Away!

I've added a neat feed to Feedly.com and the Syndication Viewer. It's from Powell's Book Store, and it features a book review a day from sources such as Salon.com, The New Republic, Esquire, Atlantic Monthly, Christian Science Monitor, and The Times Literary Supplement.

Posted by Harold Davis at 11:08 AM

March 22, 2005

Google Code

Google Code is a new Google site that will post open source projects developed by Googlers as well as material related to the various Google APIs.

Why the open source aspect of this hybrid (APIs and open source) resource? According to the FAQ, "We really care about free and open source software (F/OSS) at Google, and this site is one aspect of that affection."

Seems to me that the F/OSS stuff was probably workable as an external Google project and site when combined with the Google-specific APIs (after all, Google is not an eleemosynary institution).

I'll be taking a close look at the open source projects on Code Google and will report back. In the meantime, you may be interested to know that the Code Google site publishes two syndication feeds: one showing exemplary projects created with Google APIs or tools, the other an update feed to Code Google (both shown here in Syndication Viewer).

Posted by Harold Davis at 8:52 AM

All the Google APIs in One Place

Here's a neat page that has links to all the various Google APIs in one place.

The Google APIs are Web services used to build custom applications using methods provided by Google. By now there are a lot of different Google APIs (some, of course, are more interesting and/or important than others), including:

AdWords (the subject of my next book)
Blogger
Deskbar
Desktop Search
Froogle
Gmail
Google Groups (Did you know that each Group has a related syndication feed?)
Keyhole
Web Search (the subject of Building Research Tools with Google for Dummies)

Posted by Harold Davis at 8:37 AM

World Water Day Google

World Water Day Google:

Posted by Harold Davis at 7:33 AM

March 21, 2005

Google UI Designer Does Her Laundry

She does the wash on company time, gets to show her undies to Colin Powell, and lives to blog about it!

Posted by Harold Davis at 4:09 PM

Fun with Digital Photography

What's a boy need except a tripod, a digital camera, and something to photograph? I'm having fun with taking pictures for my Digital Photography Field Guide (Wiley Publishing). Here's a photo I took yesterday:



Related link: Digital Field Guide companion site (a work in very early progress)

Posted by Harold Davis at 10:07 AM

March 20, 2005

Old Jewish Guys

James Guckert a/k/a Jeff Gannon in a disingenuous but revealing recent interview in the New York Times Magazine attributed the famous Freud quotation "sometimes a cigar is just a cigar" to Einstein and casually dismissed his confusion: "I got my old Jewish men confused."

Leaving aside the dumb and dumber aspect of confusing two of the most important thinkers of the last two centuries, I'm struck by this as an example of the creeping anti-semitism that is little spoken of but pervades portions of the "moral" hard right that controls our country.

Related links:
The Jeff Gannon Affair
Jeff Gannon scandal finally covered in the New York Times

Posted by Harold Davis at 9:17 AM

March 19, 2005

New at Feedly.com

The featured feed of the week at Feedly.com is the best of Craigs List. This is a fun feed. The links are to posts on the list that run a pretty big gamut, but there's plenty of drama and soap opera, and, yes, romance.

A little less obviously, I've changed the architecture under the hood at Feedly.com. The site itself is now generated on the fly from its own RSS feed.
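The general idea of building a page from a feed can be sketched in a few lines of Python. This is not the actual Feedly.com code; the sample feed, the function name, and the HTML layout are invented for illustration, and a real site would fetch the feed rather than embed it.

```python
import xml.etree.ElementTree as ET
from html import escape

# A made-up RSS 2.0 feed standing in for the site's real one.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <link>http://example.com/</link>
    <description>A made-up feed for illustration</description>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
      <description>Hello, world.</description>
    </item>
  </channel>
</rss>"""


def render_feed(rss_text):
    """Turn RSS text into a simple HTML fragment, one heading per item."""
    channel = ET.fromstring(rss_text).find("channel")
    parts = ["<h1>%s</h1>" % escape(channel.findtext("title", default=""))]
    for item in channel.findall("item"):
        title = escape(item.findtext("title", default=""))
        link = escape(item.findtext("link", default=""))
        summary = escape(item.findtext("description", default=""))
        parts.append('<h2><a href="%s">%s</a></h2>' % (link, title))
        parts.append("<p>%s</p>" % summary)
    return "\n".join(parts)


page = render_feed(SAMPLE_RSS)
print(page)
```

The appeal of this design is that the feed becomes the single source of content: update the feed, and both the subscribers and the site itself see the change.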

Related links:

Feedly site
RSS Syndication feed for Feedly (XML)

Posted by Harold Davis at 8:20 PM

Google Is a Verb

The Oxford American Dictionary (OAD), one of the "big five" American dictionaries, will officially anoint "google" as a verb when its new edition is released next month.

I google, you google, we google, oh to google in the spring!

Companies typically resist having their name turned into "real" words beginning with a lower-case letter. When Xerox transmogrifies to the verb to xerox, and Google becomes google, there are negative implications for trademark protection of the word. But this is the kind of problem we all should have!

Related links: New York Times article mentioning the inclusion of "to google" in the upcoming OAD, and noting generational changes in the land of lexicons (the editors are young and use Google to help decide if a word is ready)

Building Research Tools with Google ("Google" is used as a lower-case verb in the first chapter)

Posted by Harold Davis at 8:27 AM

March 18, 2005

Is Google Painting Itself into a Corner?

Search has become vastly less important to Google financially than its role as an advertising broker (see my blog entry about this). But search is still crucial to Google's ambitions to become the information portal to the world. I'm afraid that Google's search is facing three very serious problems, and the problems are only getting worse.

Before I get to the problems, two caveats. Google is still my favorite search engine, and I use it all the time, even though there are reams of other options. Google employs legions of very smart people, many of whom probably spend a lot of time thinking about the problems I am bringing up. They may have thought of some answers that I haven't.

The biggest problems with Google's search, as I see it, are:

(1) Spam search results. These range from paid placement advertorials (which may actually have a bit of decent information) intended to direct surfers to a specific merchant to absolutely heinous junk.

(2) Flaws in the PageRank algorithm, which cause the rich (popular sites) to get richer, but make it hard for newer sites (even those with quality content) to get ranked high enough to draw any traffic.

(3) Longer and longer waits before sites and pages are added to the index. This wait time has become as long as two or three months in some cases. The wait for cross-correlation using incoming linkage to assign a rank can be even longer. This creates a static index, inappropriate for a medium as dynamic as the Web.

Of these problems, only the third, long wait times for indexing, seems solvable to me with a scalable technological solution. (My thinking is that if you throw enough processing power at it, and engage many parallel spider bots, you could probably reduce the wait.)
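Here is a toy Python sketch of what I mean by parallel spider bots. It makes no claim about how Google's crawler works: the fetch function is a stand-in that merely sleeps to simulate network latency, and the URLs, function names, and worker count are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def fetch(url):
    """Stand-in for downloading a page; sleeps to mimic network latency."""
    time.sleep(0.05)
    return url, "contents of " + url


def crawl(urls, workers=8):
    """Fetch many pages at once using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map runs fetch concurrently and yields results in input order.
        return dict(pool.map(fetch, urls))


urls = ["http://example.com/page%d" % i for i in range(16)]
index = crawl(urls)
print(len(index), "pages indexed")
```

Because the workers spend most of their time waiting, the sixteen simulated fetches overlap instead of running one after another, and adding workers (or machines) shortens the wait further.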

With the spam and PageRank issues, Google has partially become the victim of its own success. It's so important to get good placement that it is worth thinking up any number of clever tricks to get there, or even to invent spurious content just to improve search placement. Google and the SEO webmasters are engaged in a furious arms race surrounding these techniques, and Google is losing, resulting in the arteriosclerotic condition of your search results.

I don't think that there is any good solution short of hiring human editors to evaluate content. When Google starts hiring people to categorize and evaluate Web pages you'll know they agree with me, and have thrown in the towel on finding a scalable high-tech solution.

Related link: Building Research Tools with Google companion web site

Posted by Harold Davis at 9:00 AM

March 17, 2005

The Times Are Changing in Computer Book Publishing

Everyone in the computer book industry knows that the times they are a changing, and not for the best. Sales are down, advances are down, and it's increasingly hard for authors to make a living. With a few exceptions, publishers are contracting: cutting back on their advances, taking longer and longer to pay, cutting their lists, and (in some cases) going out of business. In investing terms, these are secular, not cyclical, changes. They are probably here to stay.

In a previous entry, I wrote about how Microsoft lost the legions of Mom and Pop developers by killing VB6 without offering a viable (non-enterprise) replacement. PHP is the closest language to filling this bill, because the price is right (open source), it runs on Linux/Apache, and targets the Web.

This kind of thing is taking place across technologies. As Matt Wagner put it eloquently recently: "There's a very natural sort of ecology here where the increasingly complex challenge of trying to control a platform is balanced against the almost organic evolution of software made possible by open source technologies and the legions of programmers who contribute to them." So the net impact for computer book publishers, authors, and agents is fewer readers at the general level (although specialized, high-priced low-print-run books aimed at the enterprise may be a viable niche).

The next factor hitting this business is, of course, the Web, and the ease of searching it with tools like Google. Most reference information can easily be found on the Internet with no cost, so why should someone buy a book to find it? Back in the mid 1990s, I figured that if I got one useful fact, or one programming technique out of a book, it was worth the purchase price. That kind of logic just does not fly today.

Moving onward, a great big problem is the (to a great extent) dumb and dumber me-too publishers in this industry. (Dear Publishers: If you are reading this, and have published a book of mine, or are thinking about publishing a book of mine, or might sometime publish a book of mine, or you know me personally, I don't mean you. :-))

These publishers have got to get it through their heads (or they will perish like the dinosaurs, to use my seven-year-old's favorite metaphor) that:

- A CD-ROM packaged with a book is so yesterday! Information should be delivered via the Web. Adding a CD simply to boost the price-point is a trick that consumers see through in an instant. They will vote with their dollars and stay away.

- People don't need books anymore about how to use applications like Internet Explorer or Microsoft Word.

- The wide audience for information about a proprietary, closed programming system (like Microsoft's Visual Studio .Net) is gone forever.

- More and more, people will look to the Web as the best place to get technically-oriented information. Publishers (and authors) need to formulate a strategy for success in an environment where you cannot sell content, and instead typically expect to monetize traffic via advertising. This trend is not going away; in fact, it is still in its infancy.

- Publishers need to get it through their heads that their series really do not constitute a long-term viable consumer brand. I know this is a controversial statement, which most publishers will take issue with. But publishers have mistaken the marketing clout that a bit of a budget and a relationship with Barnes & Noble and Borders have given them for true branding. With a few exceptions, nobody I know cares about the series a book is in. They care whether the book has quality and integrity, whether it is written to hold a conversation with them, and whether the author has a distinct viewpoint. (Remember: For straight reference information, people just go to the Web.) When publishers wake up and smell the red ink and get out of the series marketing miasma, they will realize that the only branding that makes any sense is to brand the author -- and they will start pursuing this strategy as other parts of publishing already do.

Related links: See the rather loud discussion about whether computer book agents earn their commission on Joseph Wikert's blog (he is a publisher at Wiley). If you are looking for an agent, I recommend Matt Wagner.

Posted by Harold Davis at 9:02 AM

Leprechaun Google

Posted by Harold Davis at 8:45 AM

March 16, 2005

The Firebox, no I mean the Firefox!

Try typing "Firefox" in the text of the Mozilla Thunderbird email client. When spell check runs, it will think that "Firefox" is an error, and suggest "Firebox" as the replacement.

Hmm...Maybe they should have named the browser Firebox in the first place. Do you prefer a box on fire or a fox on fire, or (if you have kids) a fox with socks?

Posted by Harold Davis at 5:33 PM

The Decline and Fall of VB6

Sometimes there's an event that has little significance on its own, but tells the story of great shifts in the underlying technology protoplasmic ether. Such an event is Microsoft's recent announcement that they would no longer officially support VB6. Although greeted with howls of protest by Microsoft's own MVP team, the announcement simply formalizes what has already happened.

By the way, what the official announcement of non-support means is that no further service packs will be issued for VB6, and that all unpaid technical support is ended as of the end of March. Developers can still get technical assistance from Microsoft if they pay for it through 2008.

Once the programming language with the most programmers in the world, Visual Basic is now a backwater of a language with little to recommend it. In the .Net world, C# is a much more elegant language than VB, and even has a bit more functionality. So there's really no reason anyone sensible would use VB.Net, the .Net version of Visual Basic, even if they were building .Net applications.

De facto, Microsoft killed Visual Basic in 2001 when it introduced .Net without a good way to upgrade old-style VB6 code. Now, admittedly .Net, which provides an abstraction layer with a great deal of functionality between operating environments such as Windows or the Web and a fully object-oriented programming syntax such as VB.Net or C#, is cool and a great development environment. Far better, in fact, than the old VB6. But the problem is that the structures of VB6 and VB.Net are so different that there's no reasonable way to move code from one to the other. For any sizable project, you'd truly be better off rewriting in .Net to do it right and follow OOP best practices rather than attempting some kind of mechanical port. It's also the case that some VB6 code actually compiles under .Net, but produces results that are not what the VB6 developer intended.

These issues are significant. But even more important is just who .Net and VB.Net are intended for. These are enterprise products, with an enterprise price tag, and an enterprise overhead in terms of the knowledge necessary to use the product well, computing power required, the operating system needed, and so on. But it is the mom and pop developer that made VB6 so incredibly popular, and VB.Net has left this core constituency in the dust.

To a very great extent, instead of trying to deal with the move from VB6 to VB.Net (or C#.Net), the mom and pop developer decided to put their applications on the Web, using languages such as JavaScript, Perl, and (most widely and appropriately) PHP. It's unwise to underestimate the intelligence of any computer programmer, even the mom and pop developer, and given the choice of the horrendous and dubiously appropriate upgrade, these people probably made a very smart move. The Web is the closest thing we have to a universal platform.

All these mom and pop developers have left the Microsoft stable forever, and are not coming back. In a classic case of shutting the barn door after the equine inhabitants have fled, Microsoft is attempting to address this issue with the upcoming release of Visual Web Developer Express (part of the Visual Studio 2005 release). It won't work.

Related link: C# Programming Tips and Techniques on Braintique.com

Posted by Harold Davis at 9:43 AM

March 15, 2005

GoogleX

GoogleX is a cool new Google user interface made using JavaScript and DHTML, fresh from Google Labs. Run your mouse over those groovy icons above the search box! Here's more about it from its creator.

Posted by Harold Davis at 9:26 PM

March 14, 2005

Competition Comes to Google's AdSense

I am delighted to report that Yahoo will be opening a program providing contextual ads for small Web sites, like Google's AdSense does now.

Yahoo, like both Google and MSN (part of Microsoft), has long sold contextual advertising on large sites. But Google's AdSense program (together with its AdWords program) functions as a virtual monopoly for the small or niche publisher who wants to earn revenue by displaying contextual advertising. This kind of advertising is "pay-per-click," meaning the publisher gets paid when a bona fide user clicks on the ad link, regardless of whether the user eventually buys anything. In this sense, it differs from affiliate ad programs, which make publishers partners because they only pay when the user actually signs up for the advertised product (but then the payout is greater).

As a small-time Web publisher, I can assure you that monopolies suck (even when the monopoly is run by a company I have as much affection for as I do for Google). If you've signed up for Google, Google is the prosecutor, judge, and jury. There is no appeal from a Google decision, which sometimes isn't even preceded by a warning or notification. As a participant in AdSense, you are forbidden to disclose anything about it, including your revenue and how much an individual ad, or word, is worth. Google also does not disclose to you what percentage they take (most likely, a usurious 60-70%). As a member of the AdSense program, you are not allowed to suggest to readers that they can support you by clicking on ads. (This last is not an entirely unreasonable restriction.)

Google also censors Web page content, excluding "sin" sites, and (more dangerously) has been known to pull controversial sites from the program. Since Google also controls the traffic flow (read: search engine) I also worry that they may game search engine results in favor of their advertising programs.

All this is a huge litany of complaints. In defence of AdSense, I must also say that it has encouraged the spread of cool content on the Web by enabling small Web publishers to monetize their content in a way that was not possible before Google introduced the program. But power corrupts, and absolute power corrupts absolutely. The opening of the small Web publishing world to real AdSense competition, if it really happens, will be a grand and glorious thing. Thanks Google (for coming up with the idea)! Thanks Yahoo! (for opening the field to competition)!

Here are some of the sites I publish (in addition to this blog):

Braintique
Feedly
High Risk Pregnancy
Hot Feeds
Mechanista
Syndication Viewer

Posted by Harold Davis at 7:54 AM

March 12, 2005

De parvis grandis acervus erit

Roughly speaking, the Latin phrase "De parvis grandis acervus erit" translates to "Great big things come from a small start."

It's the motto of the Google Toolbar (see my recent blog entry about the Google Toolbar, and a follow-up). You can see the motto if you have the Google Toolbar installed in Internet Explorer by clicking the Google button and selecting About from the Help menu.

Res ipsa loquitur ("The thing speaks for itself").

Posted by Harold Davis at 3:21 PM

March 11, 2005

The Egregious Defensiveness of a Technology Columnist

In my blog entry yesterday, I discussed the controversial new Google Autolinks feature (part of the "Beta" of version 3 of the Google Toolbar for Internet Explorer).

I wrote about this interesting issue in part because it was featured in Walter S. Mossberg's Personal Technology column in yesterday's Wall Street Journal (03/10/2005) (no link provided because the WSJ is a subscription only site). Mossberg got his facts slightly wrong. In his column he wrote that the "browser actually adds links right into the body of any Web page."

It doesn't do this; it only appears to add the links into the page in the version displayed to the user in their browser. This is an important distinction for a couple of reasons. Most importantly, Google couldn't modify an actual page without violating copyright and ownership in that page (so they would never be able to implement Autolinks if they were actually modifying pages). It's also the case that the mechanism of on-the-fly modification can be blocked by savvy Webmasters (see the links in yesterday's entry for details).

Apart from these pragmatic considerations, I'm a believer in being accurate in communications with end-users, so that the curious among them have a reasonable chance at understanding how things actually work.

I emailed Mossberg as follows:

You state in your rather thoughtful column today about the Autolink feature in the Google Tool bar v.3 that the Tool bar "actually adds links right into the body of any Web Page."

Of course, this isn't quite right: Autolink cannot modify a Web page sitting on my server. The Autolink modifies on the fly so that the version of the page that appears in the IE browser apparently has the links added (this may seem to the average user as the same thing, but it is an important distinction). One could look at it this way: as a Web publisher my original pages are intact. It's just that some viewers (with a little help from Google) have decided to modify (enhance?) my pages when they are viewing them from the privacy of their own homes.
...
He responded:

It's actually a distinction without a difference -- technically correct, but practically irrelevant to users.
Walt

I replied:

Well, I don't agree. One difference is that if Google actually really modified a page, it would clearly violate my copyright ownership in the page as a publisher -- however obnoxious the Toolbar is, it doesn't violate copyright because it is not making an actual modification except on the users system.
Harold

Mossberg got defensive:

When did I ever say it violated copyright, or make any other legal claim or argument?
Walt

I had the last word (at least for now):

Jeez, of course you didn't but you just wrote me that it made no difference to users. It does, because with the copyright/ownership issue Google simply couldn't have done it the way you suggest. Curious readers would have to wonder.
Harold

Now, generally I think that Walter Mossberg does a good job of educating the public about technology issues. But this stuff with Google is incredibly important to the future of the Internet, and I'd like to see better understanding of the issues involved, not worse.

Posted by Harold Davis at 8:18 AM

March 10, 2005

Does Google Play Fair? Part II: Autolinks

With its recent release of the beta of Version 3 of the popular Toolbar for Internet Explorer, Google has walked into a firestorm of negative publicity surrounding its new Autolink feature.

When you download the new version of the Google Toolbar, and enable the Autolink feature, Google will suggest links for addresses (a map), ISBNs (Amazon), VINs (information about the car at CarFax), and tracking numbers (shipping info) on a drop-down list on the Toolbar.

The troubling and controversial part of the program is that when you click the Toolbar button, Google will also "add hyperlinks to the Web page in the browser." I've quoted the phrase "add hyperlinks to the Web page in the browser" because, of course, Google cannot actually add code to a Web page residing on a remote server. (You can verify this by viewing the source of a web page to which Autolinks has done its thing. It will appear unchanged, without the Google-added hyperlinks.) What the Toolbar in fact does is process the original Web page so that Internet Explorer displays a version to you with added links, but it doesn't change any original code. Still, to a casual person surfing the Web, this distinction won't make any difference: the Web page will appear to have had links added.

Here's a page from eVirtus!Net with some good technical information about what is going on, and a JavaScript to block it on your Web pages, and another script from SearchGuild that blocks Google Autolinking.

Google notes that the Autolinking feature is very opt-in (you have to first download the Toolbar, you have to run with Autolinking turned on, and you have to click the Toolbar button to get the links "apparently" inserted). Also, Autolinking is not performed on information that was already linked (for example, if you have an ISBN on your site that is linked to Barnes & Noble, Google won't change the link to Amazon).
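The behavior described above (a displayed copy with added links, an untouched original, and no change to text that is already linked) can be mimicked in a short Python sketch. This is purely an illustration and not how the Toolbar is implemented: the real thing works inside Internet Explorer, and the regular expressions, link target, function name, and made-up ISBN here are all my own assumptions.

```python
import re

# Matches an existing link, so its contents can be left alone.
ANCHOR = re.compile(r"(<a\b.*?</a>)", re.IGNORECASE | re.DOTALL)
# Simplified pattern: the literal text "ISBN" followed by ten digits.
ISBN = re.compile(r"\bISBN (\d{10})\b")


def autolink(page_source):
    """Return a copy of the page for display with bare ISBNs linked.

    The page_source string itself is never modified.
    """
    pieces = ANCHOR.split(page_source)
    displayed = []
    for i, piece in enumerate(pieces):
        if i % 2 == 1:
            # Odd positions are the existing links captured by split.
            displayed.append(piece)
        else:
            displayed.append(
                ISBN.sub(r'<a href="http://example.com/isbn/\1">ISBN \1</a>', piece)
            )
    return "".join(displayed)


source = 'Buy ISBN 0123456789 or <a href="http://shop.example/b">ISBN 0123456789</a>.'
displayed = autolink(source)
print(displayed)
```

The first ISBN picks up a new link in the displayed copy, the second keeps the publisher's own link, and viewing the source still shows the page exactly as it was written.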

As a researcher who views many Web pages, I find Autolinking occasionally a little convenient, but my preference is to view pages as they were written. As a Web publisher, I am concerned by Autolinking: I want my pages to be viewed the way I wrote them. If I link an ISBN to Amazon, I will use my affiliate ID (not a generic Google autolink), and if I don't link it, I probably have a reason.

Like any juggernaut, Google doesn't have to ask permission. The impact of the current Autolink hijacking is small: users have to decide to use the feature via multiple opt-ins, and the information hijacked is relatively minor. You even get to supply an alternate map supplier (MapQuest or Yahoo! Maps) if the Google Maps leave you as underwhelmed as they do me. But as a publisher, I'm concerned that Google has started down a road that will lead to modifying my apparent content and/or redirecting my advertising links. This worries me.

Posted by Harold Davis at 2:41 PM

formerly Rosie

I've added the free association ramblings of Rosie O'Donnell to the Syndication Viewer.

Posted by Harold Davis at 9:45 AM

March 9, 2005

Does Google play fair?

Bouncing around the blogosphere is a raft of accusations that Google is not even-handed when it comes to its policy against "shadowing." (See this piece from ThreadWatch and a piece and follow-up from SearchEngineWatch.)

There is some controversy about how "shadowing" should be defined, and some fairly arcane technical details involved, but the basic idea behind shadowing is simple: a web site looks one way to viewers, and presents an alternative face to the search engine bot with the intention of getting a better search ranking. Google officially frowns upon shadowing. It is a violation of the Google TOS, and theoretically in egregious cases it can lead to exclusion from the index.

The accusations against Google are that they themselves engage in shadowing on behalf of their AdWords context-sensitive ad brokerage, and also practice favoritism towards some "good" practitioners of shadowing. The evidence is a bit murky.

On the whole, I'd be inclined to be surprised if Google really did "play fair," whatever that means. It's not in the nature of successful, dominant software companies (think Microsoft and Oracle) to play Cricket following rules according to Hoyle (to mix my metaphors).

That Google may feel the need to stack the deck towards AdWords is interesting. From the viewpoint of a Webmaster and site owner, I've been looking at the alternatives to AdWords/AdSense and will comment on this more in another entry. I also think that shadowing is a symptom of a really big problem that is out of control (and that Google may not be able to fix), search engine gaming, also called search engine spam. Stay tuned for more on spam of the search engine kind, and what can (and cannot) be done about it.

Posted by Harold Davis at 8:36 PM

March 7, 2005

Is Google hypocritical?

A recent column on TheStreet.com accuses Google of being hypocritical. The general gravamen of the accusation is that Google doesn't disclose enough information, which, according to the column, conflicts with the famous dictum in the Google IPO filing to "do no evil": "Let's name Google's holy trinity for what it is: a bunch of hypocrites. Genius hypocrites. ...Revolutionary hypocrites. But hypocrites all the same."

The column is a little skimpy on specifics, but one is that Google Prez Eric Schmidt couldn't (or wouldn't) answer the question of how much of its ad revenue was coming from old-time Internet advertisers versus "offline advertisers," presumably new to pay-for-click. In addition, according to the column, Google is stingy about disclosing financials, didn't mention a top-level consulting strategist in its SEC filings, and expects investors to buy Google on faith.

Personally, I find Google's unwillingness to put speciously precise numerical values on speculation about the future refreshing. Investing in Google does require a leap of faith, so why not present it that way?

But what lies behind this rather loud, and somewhat ad-hominem, attack? I think it is a continuation of the investment community's upset with Google over the company (successfully) going around the investment banks in the IPO process (they did it their way!), and irritation at Google's continued expression of utopian idealism and big ambitions. Wall Street wants everyone to believe that profits, delivered in the way Wall Street wants them, are all that matters. Google is living proof that sometimes this just ain't so.

Posted by Harold Davis at 5:02 PM

March 6, 2005

The Super-Size Mega-Store Robin Hood

Daniel Akst, a usually intelligent columnist, suggests in an opinion piece in the business section of today's New York Times that Wal-Mart investors are today's Robin Hoods. The argument is that these investors (of whom the Walton family is the largest with almost 40% of Wal-Mart and Barclays Global with 14% is second) have accepted subpar performance, effectively subsidizing Wal-Mart's low prices and benefiting the poor.

Personally, I'd place Wal-Mart more in the category of the evil Prince John or the Sheriff of Nottingham than Robin of Locksley. But, according to Akst, the shabbily treated Wal-Mart workers don't matter: they couldn't find work elsewhere, and would be "in real trouble" if Wal-Mart didn't employ them. (Akst doesn't mention Wal-Mart's role in driving local business employers under, or the human and environmental consequences of its role in China.)

I say, as long as we're going down this road assuming that investors are implicitly subsidizing when they lose money, why only think of a piker like Wal-Mart? Let's go for the real mega superstores of investor subsidizers, all those poor schmoes who lost their shirts in Enron, Global Crossing, and WorldCom to enrich scumbags like Bernie Ebbers, Kenneth Lay, and Jeff Skilling!

Posted by Harold Davis at 5:29 PM

March 5, 2005

Roundup at the RSS Corral

The Electronic Investor column in today's Barron's has a roundup of RSS syndication feed applications that I think mostly gets it right.

The piece mentions the following syndication readers: RSS-Reader as a lightweight standalone RSS reader (it lets you paste feeds in or choose from a list, and is free), FeedDemon ($29.95), Newsgator Outlook Edition ($29, integrates with Outlook), and Pluck (free, either standalone or an Internet Explorer plug-in). Also noted, the Firefox browser and Thunderbird email client feature built-in (albeit a bit clunky) RSS reading.

These are all reasonable software choices for reading syndication feeds (whether RSS or Atom). The piece also mentions the issue of finding feeds, suggests several directories of feeds, and notes My Yahoo provides access to a claimed 150,000 feeds as well as a Web viewer for the feeds.

The conclusion: "RSS is still a bit too, well, pushy for our tastes. Do you really need another set of interruptions added to your day...Many investment and news sites already offer more targeted e-mail alerts for must-have news."

I agree that we don't need any more interruptions in our lives, but the Electronic Investor misses the utility of syndication for delivering certain kinds of information. My Syndication Viewer and Feedly are designed to get around the "too pushy" problem of syndication viewers. No special software is required, just a normal Web browser. You visit the sites when you want. You don't have to search through massive feed directories (or search the Web for feeds), because the best feeds (and only the best feeds) are presented. (Feedly presents six top feeds on a rotating basis; Syndication Viewer provides a moderately extensive categorized list of feeds.)

You can bookmark the feed or feeds you are personally interested in for future reference. For example, the SEC EDGAR feed of insider trading provides a great mechanism for getting insider trading information on a timely basis with both a granularity and an ease of scanning that is hard to duplicate elsewhere.

Posted by Harold Davis at 10:00 AM

March 4, 2005

Wal-Mart and Google slug it out!

I ran a GoogleFight between Google and Wal-Mart, and Google knocked Wal-Mart out by 158 million to a tick under 6 million. Hardly a contest!

What brought this to mind was a Digital Rules column by Rich Karlgaard recently published in Forbes Magazine. Karlgaard, a Forbes publisher (or, for all I know, *the* Forbes publisher), compares Google and Wal-Mart and says they are more alike than people think: both have a simple mission and a simple brand, both companies' products are simple to use, both companies are technology leaders, and both companies exploit the "cheap revolution" -- Wal-Mart by importing from China and (although it is not stated) underpaying employees, and Google by running cheap Linux servers.

Well, no. From a philosophic viewpoint, all "things" have some commonality simply by virtue of being things, and certainly all corporations have some commonality by being corporations. You could take this a bit further and say that all very successful corporations (and both Google and Wal-Mart certainly are that) have a fair amount in common (like money in the bank). This is a bit along the lines of Tolstoy's famous dictum that all happy families are alike. But, in fact, it is hard to imagine two successful corporations with a greater difference than Wal-Mart and Google:

Google hires the smartest people it can find and gives them time for their own projects (except in its technology department in Bentonville, Arkansas, Wal-Mart doesn't care much about employee intelligence, and rigidly structures store employee time with things like penalties for going to the bathroom too often)

Google provides employees with great benefits (Wal-Mart weasels out of paying any benefits at all much of the time; avoidance strategies include using captive illegal labor)

Google's mission is to be an information portal, and it makes its money via digital mechanisms related to information (Wal-Mart trades in physical things)

Google makes more than $500,000 of cash flow per employee (Wal-Mart makes $16,000 per employee, or about 3% of what Google does per worker)

Google makes a conscious effort to care about what it does and the world around it and to "do no evil" (Wal-Mart couldn't care less about what it does to local merchants, employees, or the environment in China as long as it can squeeze a few pennies more profit)

Why the unlikely Forbes equation of these two? I think it is part of a subtle business-world putdown of the idealistic component of Google and the Internet. All that matters, so this goes, is profit and the bottom line, which is in keeping with the insidious logic of our times. Google is really Wal-Mart in this world of Orwellian speak, and yes, we can save social security by looting it and handing the proceeds to the investment community -- because, after all, it is all about profit, and no company is really different from any other.

Posted by Harold Davis at 12:28 PM

March 3, 2005

Featured Feeds

Our new site, www.feedly.com, features six editor's-choice, best-of-the-best syndication feeds that are fun, interesting, and useful. The feeds will be rotated regularly as I find new and exciting feeds.

Posted by Harold Davis at 5:32 PM

Foiling the Click Fraudsters

I've been promising to address the issue of click fraud, and I'm spurred to do so by an article in today's New York Times on the topic. I don't suppose this entry will be the final word on the topic (meaning that you can expect more from me on it) but I'd like to start with the following issues: What is click fraud? Who is harmed by click fraud? What are the motivations behind click fraud? How easy is it to commit click fraud? What is the current state of the art in click fraud prevention and detection? How big a problem is click fraud? Note: I am limiting this discussion to Google, but it applies equally well to all purveyors of contextual pay-for-click ads.

1. What is click fraud?
Click fraud is intentionally following a pay-for-click link in order to gain financial benefit while having no real interest in the good or service advertised in the pay-for-click link. Specifically, Google AdSense policies forbid site owners from clicking AdSense ads on their own sites.

2. Who is harmed by click fraud?
The biggest losers are AdWords advertisers. If advertisers pay for fraudulent clicks, then they are overpaying for contextual ads. To the extent that click fraud is widespread, or perceived as widespread, it creates a problem for the entire pay-for-click contextual ad industry and for content providers on the Internet that depend on this industry for their revenue stream.

3. What motivates click fraud?
I can think of three click-fraud motivations. I've already mentioned that it is a potential way for site owners to increase their revenues. Second, a malicious competitor might attempt to "stick it" to a competitor by fraudulently clicking the competitor's ads. The final motivation is the same as the motivation of the poor lost souls who write viruses: because it is there and a challenge and naughty.

4. How easy is it to commit click fraud?
Very easy on a small scale and almost impossible to do without detection on a large scale. If a site owner clicks a few times on the ads on their own pages from an IP that is different from the one used to register with AdSense, and deletes cookies from the browser following each pay-for-click click, it is impossible to detect. If the site owner's friends around the world each click an ad on the site once a day, no one will ever be the wiser. (Who knows? These people might end up buying something from the links they click through while they are at it.) There are also (unconfirmed) reports of distributed networks of users in India and China committing this kind of click fraud campaign on behalf of clients.

However, any massive click fraud campaign -- say 100 clicks or more on the same ad -- whether automated or manual is certainly statistically detectable.

5. How is click fraud prevented and/or detected?
Click fraud prevention systems typically place their own cookies on a user's system when the user clicks through, and then track the user, issuing discouraging messages to repeat clickers, and possibly denying access to the destination site. This approach has some drawbacks, however, as it may discourage legitimate prospects, and may be defeated by deleting cookies.
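Here is a rough Python sketch of the cookie approach just described. Everything in it is an assumption made for illustration: the `ClickTracker` class, the limit of three clicks per visitor, and the random token that stands in for a browser cookie are hypothetical and are not taken from any real prevention product.

```python
import uuid
from collections import Counter

MAX_CLICKS = 3  # hypothetical limit before a visitor is discouraged


class ClickTracker:
    """Counts ad click-throughs per cookie token (illustrative only)."""

    def __init__(self, max_clicks=MAX_CLICKS):
        self.max_clicks = max_clicks
        self.clicks = Counter()

    def record_click(self, cookie=None):
        """Return (cookie, allowed). A visitor without a cookie gets a fresh token."""
        if cookie is None:
            cookie = uuid.uuid4().hex  # stands in for setting a browser cookie
        self.clicks[cookie] += 1
        allowed = self.clicks[cookie] <= self.max_clicks
        return cookie, allowed


tracker = ClickTracker()
cookie, allowed = tracker.record_click()  # first click: a cookie is issued
results = [tracker.record_click(cookie)[1] for _ in range(3)]  # clicks 2 to 4
# A visitor who deletes the cookie arrives with none and is counted from zero again.
_, allowed_after_delete = tracker.record_click(None)
```

The last line shows the weakness in miniature: once the cookie is gone, the tracker has no way to connect the new click to the earlier ones.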

Detection of fraud is a statistical matter, and Google (and the other pay-for-click vendors) are close-mouthed about how they analyse the data (for obvious reasons). This link from SEO Website Marketing should give you an idea of the raw ingredients that go into a statistical hunt for click fraud.

While Google and the other major companies do put great effort into foiling click fraudsters, if you spend any kind of money using AdWords, you need to monitor for click fraud yourself. This link, also from SEO Web Marketing, gives you some idea of the kinds of things you should look for in your Web Server logs and analysis software. An industry has sprung up around helping AdWords customers detect click fraud; you can find many of the players by searching for "click fraud" in Google. One issue: a pay-for-click vendor like Google is the court of last resort if click fraud is alleged, and some advertisers have been less than overwhelmed by their responsiveness to allegations (e.g., they don't happily give refunds). Ultimately, to address this problem, the industry may need a click fraud referee.
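As a starting point for that kind of do-it-yourself monitoring, here is a minimal Python sketch that counts paid-click requests per IP address in a Web server log. It assumes log lines in the Common Log Format and a hypothetical `src=adwords` tag that you would add to your own ad destination URLs; the function name and the sample data are made up. The default threshold of 100 echoes the "100 clicks or more" figure mentioned earlier, and a per-IP count is only a crude proxy for the statistics a vendor would actually run.

```python
import re
from collections import Counter

# Matches the start of a Common Log Format line: IP, timestamp, request line.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "GET (\S+) [^"]*"')


def suspicious_ips(lines, marker="src=adwords", threshold=100):
    """Return {ip: count} for IPs with at least `threshold` paid-click requests.

    `marker` is a hypothetical query-string tag added to ad destination URLs
    so that paid clicks can be told apart from other traffic in the log.
    """
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and marker in match.group(2):
            counts[match.group(1)] += 1
    return {ip: n for ip, n in counts.items() if n >= threshold}


# Made-up sample: one address hammering the landing page, one ordinary visitor.
sample = (
    ['10.0.0.1 - - [03/Mar/2005:12:00:00 -0800] "GET /landing?src=adwords HTTP/1.1" 200 512'] * 120
    + ['10.0.0.2 - - [03/Mar/2005:12:01:00 -0800] "GET /landing?src=adwords HTTP/1.1" 200 512'] * 2
    + ['10.0.0.1 - - [03/Mar/2005:12:02:00 -0800] "GET /about.html HTTP/1.1" 200 2048']
)
flagged = suspicious_ips(sample)
```

In practice you would read the lines from your server's access log and look at the flagged addresses by hand before raising anything with the ad vendor, since a shared proxy can legitimately produce many clicks from one IP.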

6. How big a problem is click fraud?
A good question. One recent study by the Search Engine Marketing Professional Organization (SEMPO), a trade association, showed only 6% of all segments of advertisers thought it was "a significant problem we are tracking." (In contrast, 31% either had not heard of it or were not worried.) As reported in the study, marketers felt that search spam is a much bigger problem, and I think this is right. I continue to believe that click fraud is an issue like retail "spoilage": it is a cost of doing business on a small scale (and some small scale perps are converted to customers), and detectable on a big scale.

Posted by Harold Davis at 12:44 PM

March 2, 2005

Be the first on the block to know!

I've added a Business and Investing category to the Syndication Viewer including feeds from the New York Times, Wall Street Journal, MarketWatch, and SEC EDGAR filings. I have also added an analysis and update feed from The Economist (in the News category) and Alexa hot search (a feed that tracks hot search terms) in the Internet category.

Here are the details and links for the Business and Investing Feeds:
Aggregation of business and investing feeds
Selection of business and investing feeds
New York Times Business (Breaking business news)
Wall Street Journal (market news)
Wall Street Journal (US business news)
Dow Jones MarketWatch (Stocks to watch)
Smart Money (Stock screener)
Market Wire (Investment opinions)
Forbes.com (Investment news and strategies from Forbes)
EDGAR (SEC EDGAR insider trading filings)
Editorial note: I'm really looking forward to using the EDGAR feed. Syndication seems a really useful and timely way to have this information available. This kind of information can be used to make money trading if you get it early, and syndicating directly from the SEC seems like a great way to be the "first on the block" to know.

Posted by Harold Davis at 12:12 PM

March 1, 2005

Validate those feeds!

It's telling to me that a high proportion (meaning 10-20%) of the feeds I want to add to the Syndication Viewer don't validate. This means that something is technically wrong with these feeds, usually something easy to correct. The consequences in the Syndication Viewer range from fairly benign (for example, the dates associated with items are all wrong) to fatal: the feed won't display.

The moral I draw from this is that RSS and Atom feeds are still way under-subscribed. Otherwise, someone else would have reported the problem to the creator of the feed.

Feed syndicators, please validate your feeds and fix any problems that are pointed out in the validation process. My Submit your feed page provides a good feed validation mechanism. It is in your interest to make sure your feed is working.
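For a sense of what the simplest checks look like, here is a small Python sketch that tests an RSS 2.0 document for the two kinds of failure mentioned above: XML that won't parse at all (the fatal case) and item dates that can't be read (the benign case). The `check_rss` function and the sample feeds are hypothetical, it doesn't handle Atom, and it is nowhere near a substitute for a real feed validator.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def check_rss(text):
    """Return a list of problems found in an RSS 2.0 document (empty if none)."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]  # fatal: the feed cannot display
    if root.tag != "rss":
        return [f"unexpected root element <{root.tag}>"]
    problems = []
    for number, item in enumerate(root.iter("item"), start=1):
        # RSS 2.0 requires each item to carry at least a title or a description.
        if item.findtext("title") is None and item.findtext("description") is None:
            problems.append(f"item {number}: needs a title or a description")
        date = item.findtext("pubDate")
        if date is not None:
            try:
                parsedate_to_datetime(date)  # RSS 2.0 dates follow RFC 822
            except (TypeError, ValueError):
                problems.append(f"item {number}: unparseable pubDate {date!r}")
    return problems


good = """<rss version="2.0"><channel><title>Example</title>
<item><title>Hello</title><pubDate>Tue, 01 Mar 2005 18:26:00 GMT</pubDate></item>
</channel></rss>"""
bad_date = good.replace("Tue, 01 Mar 2005 18:26:00 GMT", "not a date")
truncated = "<rss><channel>"

for name, feed in [("good", good), ("bad_date", bad_date), ("truncated", truncated)]:
    print(name, check_rss(feed))
```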

I've added three new feeds:
eBay General Announcements
Mortgage News Daily
Quicken Loans Mortgage News

Regarding the eBay feed, I want to figure out a good way to generate a syndication feed from ad-hoc eBay queries. If you are puzzled about why I've included two mortgage-related feeds, see my Words for Sale blog entry.

Posted by Harold Davis at 6:26 PM

Words for Sale

Have you ever wondered which are the most expensive words? Expensive words are ones that AdWords advertisers pay more for. The more the advertisers pay, the more sites that host AdSense contextual ads receive (and the more that Google as the broker in the middle of the auction makes). So the question of expensive versus cheap words has very practical consequences not only to advertisers but also to Webmasters who run sites that get AdSense revenue.

Here are some of the most expensive terms (according to a recent report in E-Commerce News):

"Vioxx", worth as much as $16.50 per click to class-action lawyers
"Mesothelioma", worth up to $39.08 per click, also to class-action lawyers
"Car insurance", worth $8.08 every time someone clicks a Progressive Insurance ad

According to one study cited in the E-Commerce News article, the non-brand group of terms that averaged the most money by industry is "mortgage related" at $4.79. The industry group whose terms brought the least is "consumer retail" at $1.70.

From the rational advertiser's viewpoint, the issue has to be how many of the click-throughs can be converted to sales online or through combined channels (someone reaches a destination page and picks up the phone to dial toll free and complete a mortgage application). These advertisers must simply plan to write off click fraud as a cost of doing business, like inventory "spoilage." (Regarding click fraud, see my post about Click Detective. I plan to write more about this.)
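The advertiser's arithmetic can be sketched in a few lines of Python. The conversion rate, the profit per sale, and the 10% fraud share below are invented for illustration; only the $4.79 mortgage-related average comes from the article cited above. The `break_even_cpc` function is my own shorthand, not anything Google provides.

```python
def break_even_cpc(conversion_rate, profit_per_sale, fraud_share=0.0):
    """Highest cost per click at which the ad still pays for itself.

    conversion_rate: fraction of genuine clicks that end in a sale
    profit_per_sale: profit from one sale, before ad costs
    fraud_share: fraction of paid clicks assumed fraudulent (they never convert)
    """
    return (1.0 - fraud_share) * conversion_rate * profit_per_sale


# Hypothetical mortgage advertiser: 1 genuine click in 50 converts, $300 profit each.
clean = break_even_cpc(0.02, 300.0)             # about $6.00 per click
with_fraud = break_even_cpc(0.02, 300.0, 0.10)  # about $5.40 with 10% written off
pays_at_average = 4.79 <= with_fraud            # compare with the cited $4.79 average
```

Under these made-up numbers the ad still pays at the average mortgage price even after the fraud write-off, which is the sense in which click fraud behaves like spoilage: it lowers the ceiling on what a click is worth without necessarily wiping out the margin.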

All this about buying and selling words reminds me of the "Confusion in the Market Place" chapter in Norton Juster's wonderful The Phantom Tollbooth (the book was illustrated by Jules Feiffer), which describes the Word Market in Dictionopolis: "Get your fresh-picked ifs, ands, and buts." "Juicy, tempting words for sale."

Viewing this market, the bored young protagonist Milo "had never thought much about words before, but these looked so good that he longed to have some.

'Look, Tock,' he cried, 'aren't they wonderful?'

'They're fine, if you have something to say,' replied Tock..."

Contextual ads tied to words that use an auction mechanism for purchasing are here to stay, and obviously can be very effective. But let's not get carried away. You need something to say, and you need the right tools for measuring effectiveness and purchasing the right words.

Posted by Harold Davis at 12:09 PM



© Braintique.com. All rights reserved.
