Wednesday, January 13, 2010

Innovations in comment spam

Comment spam continues its rapid evolution. Despite my reluctant surrender to the Captcha, I'm seeing novel mutations every few months.

A recent technique is to write a reasonably detailed comment about a fairly specific topic, like "junk DNA". A query engine then identifies all blog posts that closely match the comment. An automated posting process, perhaps with tool-assisted, human-powered captcha solvers (via Amazon's Mechanical Turk?), then submits the comment to thousands of blogs.
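
To see why such a comment can look on-topic, here is a minimal sketch of the matching step, assuming the "query engine" does something like a bag-of-words cosine similarity between the comment and each post. I don't know what the spammers actually run; the function names, the sample posts, and the 0.3 threshold are all hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between the word-count vectors of two texts."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca if t in cb)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_matches(comment, posts, threshold=0.3):
    """Return (score, title) pairs for posts scoring at or above the
    threshold, best match first. `posts` maps title -> body text.
    The threshold is an arbitrary illustrative value."""
    scored = [(cosine(comment, body), title) for title, body in posts.items()]
    return sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)


# Hypothetical sample data, invented for illustration only.
posts = {
    "Junk DNA revisited": "junk dna may have regulatory function in the genome",
    "Weeknight pasta": "boil the pasta and add garlic olive oil and basil",
}
comment = "interesting point about junk dna and regulatory function in the genome"
matches = best_matches(comment, posts)
```

With this sample data only the "Junk DNA revisited" post clears the threshold, which is the point: a comment written once can score as relevant against every post on the same narrow topic.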

Even under human review, the comment will be a plausible match for a meaningful number of those blog posts. It gets approved and posted, and the spammers get something of value (link referrals?).

The one I rejected today was clumsily written, so it was fairly easy to spot. It contained an unnecessarily specific reference to a "first post", the author name was a marketing phrase, and the grammar and phrasing could have been better. I've probably missed better ones!

We can expect rapid improvement. In time these comments might evolve into transiently novel insights, statistically placed in the right spot at the right time. At that point, would we not welcome them?

In the meantime, we do need Google to start filtering these comments the same way it filters email. This particular approach lends itself to statistical filters and, of course, to the use of author reputation in filtering algorithms. Alas, Google has forgotten all about poor Blogger ...
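
As a rough illustration of what such a filter might look like, here is a minimal sketch that combines two signals: how many near-duplicates of the comment have been seen on other blogs, and a reputation value for the author. This is not how Google or Blogger filters anything. The function names, the 0-to-1 reputation scale, the weights, and the thresholds are all assumptions made up for the example.

```python
import re


def shingles(text, k=3):
    """Set of overlapping k-word sequences ("shingles") in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two texts' shingle sets (0.0 to 1.0)."""
    sa, sb = shingles(a), shingles(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def spam_score(comment, seen_elsewhere, author_reputation, dup_threshold=0.6):
    """Return a suspicion score in [0, 1]; higher means more suspicious.

    seen_elsewhere: comment texts already submitted to other blogs
        (assumes the filter operator can see across many blogs).
    author_reputation: hypothetical value in [0, 1], 1 = well trusted.
    The weights 0.7 / 0.3 and the cap of 3 duplicates are arbitrary.
    """
    duplicates = sum(
        1 for other in seen_elsewhere if jaccard(comment, other) >= dup_threshold
    )
    dup_signal = min(duplicates / 3.0, 1.0)
    reputation = min(max(author_reputation, 0.0), 1.0)
    return 0.7 * dup_signal + 0.3 * (1.0 - reputation)
```

The duplicate signal is what makes this kind of spam tractable: a single blogger sees one plausible comment, but a service hosting thousands of blogs sees the same text arriving everywhere at once.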