Sunday, August 30, 2009

The evolution of comment spam - from parasite to symbiote?

Lately I've been getting blog comments that blur the spam/non-spam species boundary.

Comment spam used to be pretty clear. It would be unrelated to the post topic, and it would contain a link to a splog or other more or less fraudulent web page. These were easy to block automatically, so spammers dropped the links. Second generation comment spam aimed for search engine "optimization" through reputation-enhancing back links to the author URL. Second generation comment spam was made of strings like "thanks for the great post".

These were harder to machine reject, but easy for human reviewers to spot.
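To make the difference between the first two generations concrete, here's a rough Python sketch of the kind of rule a filter might apply. Everything in it is an assumption made for illustration: the function name, the labels, the phrase list and the eight-word cutoff are hypothetical, not taken from any real blog platform's spam filter.

```python
import re

# Hypothetical heuristics for illustration only; not a real platform's filter.
LINK_PATTERN = re.compile(r"https?://|www\.", re.IGNORECASE)

# Assumed sample of generic praise phrases; a real list would be much longer.
GENERIC_PRAISE = ("great post", "nice post", "thanks for sharing")


def classify_comment(body: str) -> str:
    """Return a rough label for a comment body.

    "gen1-link"    : contains a link in the text (first generation pattern)
    "gen2-generic" : short, linkless, generic praise (second generation pattern)
    "needs-human"  : neither rule fired, so a person has to decide
    """
    # Lowercase and collapse whitespace so phrase matching is predictable.
    text = " ".join(body.lower().split())
    if LINK_PATTERN.search(text):
        return "gen1-link"
    if len(text.split()) <= 8 and any(phrase in text for phrase in GENERIC_PRAISE):
        return "gen2-generic"
    return "needs-human"
```

The link rule is cheap but crude, since it would also flag a legitimate comment that cites a URL. The praise rule only catches phrases already on the list, which is roughly why that kind of spam is easier for a person to spot than for a machine.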

Now I'm seeing third generation comment spam. These have no links in the comment text, and they're actually related to the original post. Sometimes they're almost non sequiturs, but mostly they read like a fourth grade student answering a homework assignment. The grammar suggests either a very young writer or a non-native English speaker. They do still link back to splogs.

So how's the new species of comment spam being authored? It could be AI-based -- maybe calling Wolfram Alpha or Wikipedia to retrieve relevant strings. It's probably human though -- outsourced work being done by low-paid labor churning out comments at high speed.

This third generation spam isn't trivial to reject. Sometimes I have to think about it.

We know where this is going. Fourth generation spam comments will actually make sense. They'll be legitimate comments.

Fifth generation spam comments will be very high quality. Skynet will appreciate them.

Update 9/4/09: Another (funny) take on the theme. Also, see the comment by one of my favorite writers.

Update 1/1/10: Cory Doctorow's excellent 2006 novella I, Row-boat (read it, it's online) tells us how Robbie the row-boat's ancestors became sentient ...
“Back in the net’s prehistory it was mostly universities online, and every September a new cohort of students would come online and make all those noob mistakes. Then this commercial service full of noobs called AOL interconnected with the net and all its users came online at once, faster than the net could absorb them, and they called it Perpetual September.”...

... “AOL is the origin of intelligence?” She laughed, and he couldn’t tell if she thought he was funny or stupid. He wished she would act more like he remembered people acting. Her body-language was no more readable than her facial expressions.

“Spam-filters, actually. Once they became self-modifying, spam-filters and spam-bots got into a war to see which could act more human, and since their failures invoked a human judgement about whether their material were convincingly human, it was like a trillion Turing-tests from which they could learn. From there came the first machine-intelligence algorithms, and then my kind...

2 comments:

Charlie Stross said...

You've just stumbled across the McGuffin behind the plot of the novel I'm currently writing ...

JGF said...

So if you're doing book signings in Minneapolis can I get you to sign my copy? :-)

I'm looking forward to the new novel.

Incidentally, I've read all of your recent science fiction but, as I write this, I have 'The Hidden Family' in front of me. I'd missed out on your fantasy work and am only now going through the Merchant Princes series. Terrific work.

You are insanely productive!