
The Googleplex Blog


Web Pontification Archive

An iconoclastic look at Google, research, the Web, the state of the world, and anything at all that interests Harold Davis.

July 28, 2006

Spam Bites Man

I am the first to admit that I do not really understand email spam.

What kind of blithering idiot really thinks that an illicit fortune in Nigeria will actually be transferred to their bank account, or that some compound will make their sex organs or mammary glands larger, or that something offered in an unsolicited email might actually be a good idea?

Yet these blithering idiots must exist, otherwise spammers wouldn’t bother.

On the other side of the fence, what kind of jerk spends their time, creativity, and energy coming up with novel ways to bombard people with junk solicitations for stuff they don’t want (or are outright scams)?

Yet some jerks spend their time this way, or our inboxes wouldn’t be full of the stuff.

And my final rhetorical query: why all the indignation about spam? It is arguably easier to deal with email spam than junk snail mail (you just hit the delete key instead of having to cart it to the recycle bin) and less intrusive than telemarketing (the call inevitably comes just as dinner is on the table, or you are putting the kids to bed, and then you feel bad for being rude to a real human being).

Since email spam has recently bitten me, I now at least understand the pain that spam can cause (almost as bad as psoriasis). Here’s my sad but true story of spam bites man.

First, for a long time we’ve had our business and personal email forwarded by our web host (IX Webhosting) to our local ISP (Comcast). We then use Comcast’s POP and SMTP servers to pick up and send email.

Comcast’s spam filters have historically done a pretty good job of shielding us from the worst of spam. It’s true that my email address is published all over the place on websites, and that I’m listed as registry contact for various sites, so my email address was spammed frequently.

But Comcast shielded us from the worst of it, and my email client, Thunderbird, had junk mail screening tools that did the rest. Whatever got through I hit delete, and didn’t worry about it too much. (Although I did wonder: if I can tell from a glance at the sender and subject that an email is spam, why can’t an automated system?)

Those were the days!

The first sign of trouble was the report from a couple of friends and business associates that their email to me was bouncing with some kind of message from Comcast about spam. I pretty much assumed that the problem was with the senders, and that somehow they (or their ISPs) were associated with notorious spammers. Oh foolish pride! I should have realized that the problem was me (well, me and the whole spam email situation).

It was a dark and stormy night when our email stopped altogether. I don’t mean that the quantity slowed down or that we lost connectivity. I mean that we stopped getting our email. Our inboxes were empty.

Comcast technical support said sure your email works, and sent me an email to prove it. IX Webhosting said sure your email forwarding works, and had me forward an email to a new address to prove it.

I may not have been the brightest bulb on this issue, but it didn’t take me too long to figure out that Comcast had blocked all email from our domain as spam.

It turned out that they weren’t going to unblock it, either.

What to do? It seemed like the simplest solution was to use the email services for our domain provided by IX Webhosting, rather than forwarding the email to Comcast. I changed our server-side settings, and our email clients, and the email started flowing again. And flowing. And flowing. Simply incredible quantities of spam, hundreds of spam emails in a single hour.

Perhaps Comcast’s blocking my email made some sense, after all.

What to do? It was time consuming and stressful to go through all the spam, and something had to give. Taking a new email address really didn’t seem desirable.

I did a Google search and decided to try a third-party email sanitizing service offered by an outfit called SpamStopHere.com. SpamStopHere auto-generated a new set of MX records, which I was supposed to get IX Webhosting to implement. (I think this meant that I was asking IX to give over our email to this other outfit.)

IX Webhosting refused to comply, and SpamStopHere said, “We are adding IX Webhosting to the list of web hosts we cannot work with.”

Great.

IX Webhosting said that they had their own server-side filtering, and showed me how to write a custom filter in Thunderbird that recognized the spam header their filtering system added. (This, however, didn’t do Phyllis’s Outlook Express client any good, as it doesn’t have a mechanism for recognizing custom headers.)
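
For readers curious what a header-based filter boils down to, here is a minimal sketch in Python. It assumes the server-side filter marks spam with an `X-Spam-Flag: YES` header, which is a common convention; the actual header IX Webhosting adds is not named in this post, and the function names are made up for illustration.

```python
from email import message_from_string

# Assumption: the host's filter tags spam with "X-Spam-Flag: YES".
# Swap in whatever header and value your own host really adds.
SPAM_HEADER = "X-Spam-Flag"
SPAM_VALUE = "yes"


def is_tagged_spam(raw_message: str) -> bool:
    """Return True if the server-side filter tagged this message as spam."""
    msg = message_from_string(raw_message)
    value = msg.get(SPAM_HEADER, "")
    return value.strip().lower() == SPAM_VALUE


def sort_mail(raw_messages):
    """Split raw messages into (inbox, junk) based on the spam header."""
    inbox, junk = [], []
    for raw in raw_messages:
        (junk if is_tagged_spam(raw) else inbox).append(raw)
    return inbox, junk
```

A filter like this is only as good as the server-side tagging behind it, which is presumably why it made so little difference here.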

IX Webhosting also had me implement a client-side open source program, SpamPal.

These tweaks were no doubt well intentioned, but they didn’t really touch my spam problem. One morning I spent an hour cleaning up my spam. (Well, it was complicated by a virus payload and Norton Antivirus. The AV software made me click OK to delete each and every infected file, and OK again after the file had been deleted.)

After a bit more research, I signed up for another service, Spam Arrest. You give Spam Arrest access to your email in-box, and it automatically downloads your email. Your email client connects to the Spam Arrest servers via a secure connection.

Spam Arrest has quite a few features, but mostly it works by maintaining blocked and unblocked (authorized) sender lists. A good starting place for the authorized sender list is your address book, which you can upload from any email client.

If you haven’t authorized the sender, Spam Arrest sends them a challenge-response email. The sender must click a link to authenticate the email, something most spammers can’t do.
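
The sender-list and challenge-response idea described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Spam Arrest's actual implementation: the class name, method names, and status strings are all hypothetical, and a real service would also send the challenge email and expire old entries.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class ChallengeFilter:
    """Hypothetical sender-list filter with challenge-response."""
    authorized: set = field(default_factory=set)  # lowercase addresses
    blocked: set = field(default_factory=set)     # lowercase addresses
    pending: dict = field(default_factory=dict)   # token -> (sender, held mail)

    def receive(self, sender: str, message: str):
        """Return (status, token); token is set only for a new challenge."""
        sender = sender.lower()
        if sender in self.blocked:
            return ("dropped", None)
        if sender in self.authorized:
            return ("delivered", None)
        # Unknown sender already challenged: keep holding their mail.
        for held_sender, held in self.pending.values():
            if held_sender == sender:
                held.append(message)
                return ("held", None)
        # First contact: hold the message and issue a one-time token,
        # which would be embedded in the link sent back to the sender.
        token = secrets.token_urlsafe(16)
        self.pending[token] = (sender, [message])
        return ("challenged", token)

    def confirm(self, token: str):
        """Sender clicked the link: authorize them and release held mail."""
        entry = self.pending.pop(token, None)
        if entry is None:
            return []
        sender, held = entry
        self.authorized.add(sender)
        return held
```

Seeding `authorized` from an address book means known correspondents never see a challenge; only strangers have to click through.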

The service costs $6.00 per month, with a free trial and various discounts. So far, I am really pleased with it. It’s great to open my email in-box and to see only a handful of emails, all of which are real. I also enjoy opening my webmail page on the Spam Arrest site and seeing all those hundreds of spam emails sitting forever lost in cyberspace (they are erased once a week by default). The goal of an intense week of my life was to elude spam, and I think I’ve succeeded!

Posted by Harold Davis at 4:08 PM

April 21, 2006

May the Great eBay, Google, Microsoft, Yahoo! Games Begin!

In a front-page article labeled Behemoths' Dance, the Wall Street Journal reports today that eBay is trying to find ways to lessen its dependence on Google by forming closer alliances with Microsoft and/or Yahoo!

Of course, I've seen the AdSense ads for eBay when I enter a search term such as antiques or Nikon D200. In fact, almost any search term query at all that might possibly be something one might buy brings up (among other AdSense search ads) an ad linking to eBay mentioning the item.

The eBay ads via Google cover so many terms that they sometimes overreach. I'll bet you didn't know you could buy thoughts on eBay. A Google search for thoughts returns this eBay ad:

Thoughts
Whatever you're looking for
you can get it on eBay.
www.eBay.com

Well, I gosh darn sure hope I can't buy what I'm thinking of right now on eBay (and, no, I won't tell you; it's private!).

In another example of overreaching in ad placement, a search for the term cliff returns (along with search results) this ad:

Cliff
Looking for Cliff?
Find exactly what you want today.
www.eBay.com

Now, I know you can find most things on eBay, but I haven't seen that many cliff auctions lately. Maybe I'm missing something. What I also seem to have been missing, at least according to the WSJ article, is the extent of money eBay pays Google (according to the Journal, eBay won't give out numbers, but a big chunk of its $400 million annual online ad budget goes to Google) and the extent to which Google's traffic is vital to eBay. (On a personal note, when I want an eBay auction, I go straight to eBay. I can't remember having clicked through to eBay from a Google AdSense search ad.)

When eBay began placing these huge ad buys with Google in 2001, eBay did not regard Google as a competitive threat, just the vendor with the best search engine technology. In fact, eBay probably felt that as the customer spending the money they had the upper hand in the eBay-Google relationship. (A side note here: isn't it amazing how Google has been able to take advantage of the behemoth internet players to gain its current position, first honing its search engine technology at Yahoo's expense and then gaining literally billions of dollars of easy revenue from eBay since 2001?)

As Google's tentacles began to stretch wider, eBay came to consider whether Google was, in fact, a threat, and to wonder whether eBay's now deep dependence on traffic from Google constituted a giant vulnerability.

Certainly, Google wants to be the leading online entry point for online commerce, although it has a ways to go to achieve this ambition. Google Base, a free online classified service, is at least indirect competition to eBay, and direct competition to Craigslist, part-owned by eBay. Google is also developing an online payment service to compete with eBay's PayPal, although the Google service has yet to manifest itself in a serious way, and PayPal's ubiquity will be hard to rival.

eBay's response to all this is marked by internal confusion. At a summit meeting of eBay executives held in the summer of 2005, role-playing was used to assess the threat. A "green team" thought from eBay's perspective and concluded that there was no threat and that business-as-usual should continue. In contrast, a "red team" thought from Google's perspective, and concluded that Google was planning a move into eBay's primary turf.

Clearly, eBay insiders are divided. As the Journal puts it, "Indecision within eBay will probably delay any conclusion." In the meantime, eBay is holding discussions with Google rivals Microsoft and Yahoo. Also, eBay's new WantItNow site is a shot across the bow of Google as online ecommerce entry point.

Often, I just want to buy something and not deal with the hassle of an online auction. In the past, I could have browsed through eBay looking for a Buy It Now button on the thing I wanted. But this was cumbersome. So usually I just put the item into Google or Yahoo (but mostly Google), and found someone to sell it to me. WantItNow features only items that are immediately available. It is an attempt to counteract this dangerous (to eBay) trend of using Google to find items available for immediate purchase.

It's not clear what will happen in the great eBay-Google-Microsoft-Yahoo games. These games will be great fun to watch, and are significant for the future of the web. The outcome will no doubt be studied in business schools of the future. For the meantime, the only thing that's really clear is that eBay does risk becoming marginalized by Google. In this scenario, eBay becomes simply another backend product supplier, and Google controls the gateway. Most likely, eBay continues to be viable (after all, why should Google choke such a good source of revenue?), but the brilliant future belongs to Google.

Posted by Harold Davis at 10:13 AM

March 21, 2006

Private Wikis As Knowledge Management Systems

Recently in The Commune and the Scholar I wrote about the conflict between communal information repositories (such as the Wikipedia) and the distinctive voice of lone authority.

Several readers have pointed out that wikis are just as useful (if not more so) in private contexts as they are as general sources like the excellent Wikipedia.

Many companies and institutions, from entire enterprises to small workgroups, have replaced complex Knowledge Management Systems with wikis. Wikis can also be used to help share knowledge across organizations. For example, a publisher I work with has organized a wiki to benefit all the contributors to a specific series of books that share the same resources, vocabulary, and ideas.

What are the advantages, and what are the dangers, with private wikis?

Very much on the plus side: Private wikis can cut through bureaucracy, and make it easier for people to share information.

On the downside, with a wiki you may not know who has contributed what, and with what degree of authority, leading to possible confusion and delay. This is essentially the same problem as with public wikis, and can be mitigated in the same fashion that Wikipedia has used: openness about issues and process, and clarity about roles, reviews, and responsibility. Still, a reasonably sane corporate denizen would be wise not to accept private wiki information as gospel without understanding its source, and where the source fits into the institutional zeitgeist.

A related private wiki issue is ease of use. If knowledge workers find it difficult or time consuming to use a wiki, they won't, and it will lose utility as a knowledge management system. This implies that institutions may find it worthwhile to go with licensed wiki software such as Socialtext, or to use a consultancy specializing in wiki knowledge management systems, rather than going it alone with open source wiki infrastructure.

More important than choice of software or software implementer, any institution establishing a private wiki should establish an initial team tasked with clearing potholes out of the way and training users. Otherwise, the private wiki will likely only see marginal use, and the goal of creating less hierarchical knowledge management will fail.

Posted by Harold Davis at 4:01 PM

March 17, 2006

The Commune and the Scholar

Everybody who uses the web, whether for fun, research, or profit, knows that much of the best content on the web is supplied by the community. This content is created in myriad ways, but is communal, usually not for profit (other than AdSense revenue!), and usually posted with the barest minimum of structure, verification and oversight.

The jeremiads of bloggers rise to the heavens but provide some useful insights.

Profiles on MySpace are the kind of superficial self portraits you'd expect of teens on the make, but can also show wonderful creativity and expressiveness.

Photographs posted to Flickr can be insipid not-quite-in-focus family album affairs, but also can rival or surpass the work of the best professional digital photographers.

Closer to the core of the web, open source software initiatives like Linux and Apache and others hosted by SourceForge provide the technical know-how that keeps the engines turning (and prevents private enterprise from consuming the commune).

Communal forums like LinuxWorld, SlashDot and WebMasterWorld provide the discussion and descriptive glosses that make it possible for all the moving parts of the web's technology to work together.

Taxonomies like the Open Directory Project (ODP) provide structure to search engines like Google and Yahoo. (The ODP is not really communal, but it is noncommercial, provides its data to everyone, and works because of the efforts of volunteer editors.)

Wikis are communal by definition. Commune-based wikis, particularly the Wikipedia, provide information repositories that are unmatched in scope (and in the number of contributors) while avoiding any kind of hierarchical information verification.

Everything, however, is not perfect in this paradise of communes. The major problems with information communes are that they are easy to manipulate or corrupt, and that it is hard to evaluate the reliability of the information contained in communal repositories.

These are not new accusations to hurl at demotic levelers of information barriers. No doubt the priests who could write elegant Latin said much the same kinds of things when Gutenberg produced his first printing press. But they are troubling all the same.

It is easy to manipulate ODP listings and Wikipedia articles to improve natural search engine rankings, and these are standard techniques in the SEO (Search Engine Optimization) toolbox. When the stakes become large enough, anything is corruptible, and there have been serious claims that ODP listings are paid for with bribes (because they can be used to enhance a website's status in search engines such as Google).

More interesting philosophically is the accuracy of information found in communal repositories. A recent Op-Ed piece in the New York Times poses this question: how does the accuracy of the Wikipedia compare with the accuracy of the information found in a vetted publication such as the Encyclopedia Britannica?

The answer, of course, is that comparative accuracy depends on many variables.

Before I discuss some of these variables, I need to point out that the wise consumer regards all received information with some skepticism, suspecting that the inherent bias of the purveyor may well color what is presented even if the bias is not intentional. (Here's some more information about how to evaluate the credibility of web pages.)

It's true that the situation is probably more extreme on the web than off it, but information bias is a universal. Even casual researchers need to understand some of the techniques used to evaluate the veracity of information found on and off the web: context, consistency, professionalism of presentation, plausibility, the reputation of the information provider, the verification process (if any), and the apparent motivation of the information provider.

In my opinion, there are manifest instances of information bias in the Big Red Barn and Dr. Seuss, and techniques for evaluating the veracity of information should be taught starting in first grade. Seriously. And it has some bearing on the situation, and is not entirely trivial, that I found the Wikipedia article about Dr. Seuss the best, most objective, and least commercial site to provide a link for more information about this children's book author (above).

Leaving first grade behind, would you rather read an article about elementary physics prepared by 1,000 anonymous members of the hive on Wikipedia, or one written by Albert Einstein for the Encyclopedia Britannica, and scrupulously edited by professionals?

This is kind of an apples and oranges comparison. I sure have more respect for Einstein's intellect than even for, well, a googol of anonymous intellects that have contributed to the hive. Einstein obviously will know his elementary physics, and furthermore will have insights to impart about how to think about physics.

On the other hand, the 1,000 hive contributors will get elementary physics right, too. By the time these folk have been back and forth over each other's work, the finished article is likely to be as accurate as something subject to the most rigorous professional review. There isn't likely to be much bias left after 1,000 people have been over it. There may not be much flavor either.

This highlights an important point: articles in the Wikipedia that cover a topic of fairly wide general interest are likely to be thorough and unbiased. But more narrowly-focused topics are often written by one or two people with an ax to grind, may be biased, and may contain faulty information (or even outright fabrications).

The more technical and arcane you get, the more likely it is that there are only a handful of people who really understand the topic. This line of thinking implies that communal-process information mechanisms like the Wikipedia are less likely to produce good information on cutting-edge scholarly and scientific topics, and more likely to be good sources of information for topics at the general college level (and below).

Going back to Einstein, Einstein is not going to get his facts wrong, and will probably have an interesting viewpoint about physics (even elementary physics). But that self-same "interesting viewpoint" can also be called "bias." In fact, it's common for the very best scientists and researchers to be extremely opinionated in their areas of expertise (and a gosh darn good thing too!).

Jimmy Wales, the founder of Wikipedia, asserts that this issue is not about comparing the accuracy of information derived from a communal process with the accuracy of information from a lone distinguished, professional contributor. Rather, Wales opines, it's about the conflict between two vetting systems. The communal wiki process, according to Wales, involves unending scrutiny whereas a professional review process like that of the Encyclopedia Britannica is flimsy: in the future "people will say, 'This was written by one person? Then looked at by only two or three other people? How can I trust that process?'"

To its great credit, the Wikipedia has been open about problems with accuracy, methodology, fraud, and group process. In fact, these things are discussed ad nauseam as part of the Wikipedia (see the Wikipedia Community Portal for details).

Wales is right that, from a philosophic viewpoint, the stakes are high, and that the process of individual signed contribution is on a collision path with anonymous communal information gathering. He's wrong to assume that the commune is always right.

[Thanks to Martin Davis and Phyllis Davis for reviewing this piece; the opinions and flaws, of course, are mine, all mine.]

Posted by Harold Davis at 10:20 AM

March 13, 2006

Buyer Beware, Indeed!

In a recent story, I noted that Craigslist is being sued to comply with the same Federal Fair Housing regulations that apply to newspaper classified ads.

According to a recent front page New York Times article, real estate transactions in which buyers have never seen the property have become increasingly common on the Internet, particularly on eBay.

It should come as no surprise that a great many of these sales are fraudulent. Any buyer of a house or land who does not at least view the property prior to buying it is an idiot (it is hard to put it any more kindly). That there are buyers foolish enough to proceed in this fashion is, in fact, a symptom of an overheated market in real estate (even with the slight cooling down in the housing market this year, things are still a bit frothy).

That said, these cases of outright fraud raise issues similar to the Craigslist lawsuit. eBay, and others, simply cannot go on abdicating responsibility for online transactions consummated on their sites. Ultimately, there will be a backlash.

If you stick to rummage sale items, it's reasonable to assume the same level of responsibility that you would find at a yard sale: once you walk away with your purchase there is no recourse.

But real estate is not a dusty tchotchke from your parents' attic. High ticket items lead to real damage, and to legal recourse.

Furthermore, most jurisdictions provide fairly thorough legal protections for buyers in real estate transactions in the off-line world. Both from a moral and a legal viewpoint, eBay will have to find a way to incorporate these protections into its workflow when real estate is involved, or surrender this potential portion of the online transactions market to the fraudsters.

Posted by Harold Davis at 9:23 AM



Braintique.com. All rights reserved.
