Neopoleon

Comment Spamming Bastard Wiener Bastards

I've been following some of the comment spam discussions going on around the ol' INTARWEB.

- The beautiful, sexy, and talented Scott Hanselman has posted a CAPTCHA solution for DasBlog that doesn't require a recompile (nice)

- Robert McLaws has gone a little nuts, but in a good way

Between the two of them, they've covered a lot of the obvious bases.

Scott has a solution that can get you in shape quickly (provided you're running DasBlog), and Robert has a solution that could potentially lead to resolving issues for a much larger population of bloggers.

I've thought about both of these methods. A couple of months ago, I envisioned a centralized system with a platform-agnostic web services interface (yes, to the WS-inexperienced: you bet your little tushes that there are platform-dependent WS APIs out there - maybe not in a strictly technical sense, but in a "real world" way). It would provide a simple, easy-to-use membership system with some built-in anti-spam measures.

Problems?

Resources, for one, and I'm not talking about people. A centralized solution would require servers, hosting, coding, and so on. In other words, it would require moolah, which is something that I don't exactly have in spades right now.

Robert might succeed here because he definitely has more resources to work with, and I hope he gets something going.

Another issue is adoption. Unless you publish an extremely simple spec, nobody's going to adopt it. Plus, your paranoid types are going to treat a centralized solution with a lack of trust ("So, I sign up for this free service, give this guy all my membership info, rely on him for spam filtering, and then, six months into it, he decides to start charging me - screw that!").

I think this is the best solution for defeating comment spam (aside from hanging the spammers by their intestines in public places), but also the hardest to implement.

Then, on the other side of the fence is Scott's solution (which has been implemented by many bloggers many times over).

I think CAPTCHA is a great solution, but also a great inconvenience. I'd rather sign up once for a service like Robert's and then never have to worry about it. When I write comments, I don't want to have to copy some computer's LSD dream rendering of a string of letters and digits.

That said, it's something that will work, and it will work now.

There are other solutions, though.

A lame solution that also works now

When I thought about the effort required to implement a centralized solution, get the blogosphere to back it, get help to code it, and get the resources to host it, I decided instead to go read a good book and then take a long bath. I already have, like, three careers or something, and I don't need another.

When I thought about implementing a straightforward CAPTCHA solution, I decided against it because I want, first and foremost, for commenting to be easy. Without comments, blogging is, like, stupid and dumb. This is all about arguing, agreeing, insulting, and patting each other on the back. I don't want to do anything to discourage comments.

So, here's what I did:

- Created an Outlook folder called "Blog Comments"

- Created an Outlook folder called "Blog Spam"

- Put together an Outlook rule that routes all comments to the "Blog Comments" folder

- I go through manually (yes: manually) and determine which comments in the "Blog Comments" folder are spam

- Any comment spam that I find, I drag to the "Blog Spam" folder (this is easier than it sounds - spammers nail your blog in sweeps, and you can often just drag ten or more emails at once into the folder)

- Once I've separated the spam from the real comments, I run an app that I wrote. It iterates through all the comments in the "Blog Spam" folder, parses each one for its .Text ID, hooks into one of the .Text web service APIs, and calls the method that deletes the entry associated with that ID. At the end, the app deletes all the emails in the "Blog Spam" folder so that I never have to see their ugly little mugs again
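The parse-and-delete step can be sketched roughly like this in Python. This is not the C# app described above; it assumes, hypothetically, that each notification email carries the comment's ID as a `FeedbackID=<digits>` token, and it takes the blog engine's delete call as a plain function (`delete_comment`, also hypothetical) rather than guessing at the real .Text web service signature.

```python
import re
from typing import Callable, Iterable

# HYPOTHETICAL: assumes each notification email contains "FeedbackID=<digits>".
# The real .Text notification format may differ.
COMMENT_ID_PATTERN = re.compile(r"FeedbackID=(\d+)")


def extract_comment_ids(email_bodies: Iterable[str]) -> list[int]:
    """Pull every comment ID out of the spam-folder emails, skipping duplicates."""
    seen: set[int] = set()
    ids: list[int] = []
    for body in email_bodies:
        for match in COMMENT_ID_PATTERN.finditer(body):
            comment_id = int(match.group(1))
            if comment_id not in seen:
                seen.add(comment_id)
                ids.append(comment_id)
    return ids


def purge_spam(email_bodies: list[str], delete_comment: Callable[[int], bool]) -> list[int]:
    """Delete each parsed comment through the supplied web-service wrapper.

    `delete_comment` is a hypothetical stand-in for the blog engine's delete
    method and should return True on success. Returns the IDs that could not
    be deleted, so their emails can be kept in the folder for a retry.
    """
    failed: list[int] = []
    for comment_id in extract_comment_ids(email_bodies):
        if not delete_comment(comment_id):
            failed.append(comment_id)
    return failed
```

Returning the failures, instead of just clearing the folder, means an email only disappears once its comment is actually gone.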

Sound complicated? Yeah, sure. It's not easy, but it beats doing the whole thing manually, and the truth is that I'm so used to it now that I can delete 50+ spam comments in under a couple of minutes, and that's worth it for me. I don't have to make commenting more inconvenient, and I don't have to wait for a centralized solution.

However...

I'm still working on a CAPTCHA solution, but it's not like your typical CAPTCHA system. I've been watching the spammers and getting a feel for how they do things. I think I can put together a system that will block the spammers 99% of the time, and only inconvenience the commenters about 1% of the time.

I made those stats up, by the way. I don't have any idea about just how often the thing will really work.

But it will work.

And, when it's working, I'll 'splain it to y'alls. It's not complicated. Quite stupid, actually.

But so are the spammers.

For now.

The Strongest Solution

In the end, I think the best approach is to:

- Get a system like Robert's up and running

- Implement different measures on our own

Having a centralized solution will take care of large quantities of spam, but the solution itself is, ultimately, reactive instead of proactive, which means that spam will still get through. Somebody has to get spammed before an IP can get blocked.

For those situations, I think it's best to have a personal system. Scott's using CAPTCHA, and I'm using Outlook/C#. By working like this, we're throwing multiple problems at the spammers. My CAPTCHA system, for example, isn't going to output the same peyote colors that Scott's will, and it won't output them in the same manner.

If we have many per-blog solutions that can be plugged in relatively easily by bloggers, then we're throwing the spammers more than they can handle.

At the very least, we can keep them busy by forcing them to constantly add ridiculous IF statements to their bloody, stinking, vile, stinking, reprehensible, stinking code...

...until the day that they're hanging in public squares by their gollywots.
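The layered setup described above (a shared, reactive blocklist plus independent per-blog checks) can be sketched as a chain of filters. `SharedBlocklist`, `too_many_links`, and the link threshold are hypothetical illustrations. Note that the blocklist only rejects an IP after someone has reported it, which is exactly the reactive gap the per-blog filters are meant to cover.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Comment:
    author_ip: str
    body: str


# A filter returns a rejection reason, or None to let the comment through.
Filter = Callable[[Comment], Optional[str]]


@dataclass
class SharedBlocklist:
    """Hypothetical centralized service: it only knows about an IP after
    some blog has reported it, so it is reactive by design."""
    blocked_ips: set[str] = field(default_factory=set)

    def report(self, ip: str) -> None:
        self.blocked_ips.add(ip)

    def as_filter(self) -> Filter:
        def check(comment: Comment) -> Optional[str]:
            if comment.author_ip in self.blocked_ips:
                return "ip on shared blocklist"
            return None
        return check


def too_many_links(comment: Comment) -> Optional[str]:
    # Hypothetical per-blog rule; the threshold of 3 is arbitrary.
    return "too many links" if comment.body.count("http") > 3 else None


def moderate(comment: Comment, filters: list[Filter]) -> Optional[str]:
    """Run each check in order; the first rejection wins, None means accept."""
    for check in filters:
        reason = check(comment)
        if reason is not None:
            return reason
    return None
```

Each blog can put its own checks in the list, so a spammer who learns to dodge one blog's filters hasn't learned to dodge anyone else's.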

Published Monday, November 15, 2004 9:04 PM by Rory


Comments


Haacked said:

From my experience, it appears that a lot of comment spam isn't necessarily from an automated source, thus rendering the CAPTCHA solution inert.

However, I think a majority of the spam is coming from people who find your site via Google. One option (though extreme) is to disable comments except for CommentAPI.
http://haacked.com/archive/2004/06/05/530.aspx
November 15, 2004 9:27 PM

Scott said:

Why not expand Robert's service to include support for MT-Blacklist? Or create a .Text/dasBlog exportable blacklist that other people can import?
November 15, 2004 9:29 PM

Haacked said:

Another option is to not have comments indexed by Google.
http://haacked.com/archive/2004/07/02/768.aspx
November 15, 2004 9:29 PM

Eric said:

Unfortunately, I don't think most spammers check to see whether your comments are indexed (as opposed to whether your entries are) before spamming. Not being indexed would cut down on their benefit, but might not affect the incoming flow (or the resulting readability of comment threads).

I really think MT-Blacklist is the best of breed at the moment. I don't think it's missed a single spam since I installed it (way back before MT 3.0). It's based on some simple rules (comments to old posts and comments with lots of URLs are likely spam) and regex filters, instead of Bayesian or some other solution, but it does appear to be effective.

One of the ways I find it to be *most* effective is that it does let through most "good" comments, while stopping bad comments. Rory's solution unfortunately sounds like it lets the spam through for whatever time period it takes for him to run his process. I'm entirely too anal for that to work for me. :)
November 15, 2004 10:00 PM
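A rough sketch of the kind of rule-based check Eric describes: flag comments on old posts, comments with lots of URLs, and comments matching regex patterns. The thresholds (30 days, more than 3 URLs) and the patterns are illustrative assumptions, not MT-Blacklist's actual rules or code.

```python
import re
from datetime import datetime, timedelta

URL_RE = re.compile(r"https?://", re.IGNORECASE)

# Illustrative thresholds and patterns only, not MT-Blacklist's configuration.
MAX_POST_AGE = timedelta(days=30)
MAX_URLS = 3
BLACKLIST_PATTERNS = [
    re.compile(p, re.IGNORECASE) for p in (r"debt\s+consolidation", r"cheap\s+pills")
]


def spam_reasons(body: str, post_published: datetime, now: datetime) -> list[str]:
    """Return every rule the comment trips; an empty list means it looks clean."""
    reasons: list[str] = []
    if now - post_published > MAX_POST_AGE:
        reasons.append("comment on old post")
    if len(URL_RE.findall(body)) > MAX_URLS:
        reasons.append("too many urls")
    for pattern in BLACKLIST_PATTERNS:
        if pattern.search(body):
            reasons.append(f"matches {pattern.pattern}")
    return reasons
```

Returning reasons instead of a plain yes/no lets the blog decide whether a hit means moderation or outright rejection.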

George said:

Why wouldn't forcing people to sign up to post comments to a particular blog work? Seems like that would cut out pretty much all your blog spam, or at least give you an excellent filter to cut out those blog spammers that happen to create an account. Do people hate signing in that badly? What am I missing? Am I oversimplifying what must be a super complicated topic that is way over my technologically inferior mind?
November 15, 2004 10:32 PM

Firefox User said:

Oh, and on a side note. Why the heck do your comments disappear when the mouse hovers over them in Firefox? It's really annoying. Is it a Microsoft scheme to make blogging more annoying for Firefox users? Just curious.
November 15, 2004 10:35 PM

miles archer said:

Will Bayesian filtering work? I think that's what Joelonsoftware does on his site.
November 15, 2004 11:30 PM
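For reference, Bayesian filtering in this sense means training a classifier on comments already labeled as spam or legitimate ("ham"), then scoring new comments by their word frequencies. Below is a minimal naive Bayes sketch with add-one smoothing; the tokenizer and smoothing choices are assumptions, and it says nothing about what any particular site actually runs.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text.lower())


class NaiveBayesFilter:
    """Minimal multinomial naive Bayes spam classifier with add-one smoothing."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts: Counter = Counter()

    def train(self, text: str, label: str) -> None:
        if label not in self.word_counts:
            raise ValueError(f"label must be 'spam' or 'ham', got {label!r}")
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def spam_probability(self, text: str) -> float:
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Smoothed log prior plus smoothed log likelihood of each token.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            for token in tokenize(text):
                score += math.log((counts[token] + 1) / (total_words + len(vocab) + 1))
            log_scores[label] = score
        # Turn the two log scores into P(spam | text); clamp so exp() can't overflow.
        diff = log_scores["ham"] - log_scores["spam"]
        diff = max(min(diff, 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))
```

Unlike fixed rules, the filter adapts as you keep labeling, though it needs a decent pile of labeled comments before its scores mean much.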

Geoff Taylor said:

Why not put some weight behind something like the Liberty Alliance (http://www.projectliberty.org/)? That'd allow single sign-on across all the blogs that support it. In conjunction with single sign-on, all we'd need then would be two further mechanisms:

1) A way of marking a user as a spammer, and
2) A way of propagating spammer IDs to blogs.

The second point would mean that if someone puts 200 spams in comments before being marked, once they are caught blogs can then automatically remove comments from that ID.

If we did this, though, how would we stop spammers just signing up for lots of accounts?
November 16, 2004 9:53 AM
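Geoff's two mechanisms, plus the retroactive cleanup, could look roughly like this. The `author_id` field is a stand-in for whatever single sign-on identity would be used, and `merge_flag_feeds` is a hypothetical way of combining spammer lists published by several blogs. None of this reflects an actual Liberty Alliance API.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class StoredComment:
    comment_id: int
    author_id: str  # hypothetical single sign-on identity
    body: str


def merge_flag_feeds(feeds: Iterable[Iterable[str]]) -> set[str]:
    """Combine spammer-ID lists propagated from several blogs into one set."""
    flagged: set[str] = set()
    for feed in feeds:
        flagged.update(feed)
    return flagged


def purge_flagged(
    comments: list[StoredComment], flagged_ids: set[str]
) -> tuple[list[StoredComment], list[StoredComment]]:
    """Split stored comments into (kept, removed) by flagged author ID.

    This is the retroactive step: comments posted before the author was
    flagged are removed too.
    """
    kept = [c for c in comments if c.author_id not in flagged_ids]
    removed = [c for c in comments if c.author_id in flagged_ids]
    return kept, removed
```

It doesn't answer Geoff's closing question, though: nothing here stops a spammer from registering a fresh ID.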

John said:

I don't like where this is going.

I don't see why, when I post to neopoleon.com, you should turn around and report that fact, along with my IP address (which is static), to some uber database that some 'business man' can mine at his leisure. I like the idea that I could lie about my URL, or leave it out if I wanted, and only have to trust the one web site not to dereference a URL or cookie to witch-hunt me.

Squiggly obfuscated graphics, pattern matching filters, etc. are a much better idea, and less oppressive (in the sense that power is decentralized). Plus they don't introduce another networked dependency, which will impact your responsiveness and availability.

I know it's futile. Just wanted my objection noted. Perverts.
November 16, 2004 1:48 PM

George Clingerman said:

Of course, some people might be like me and not mind the blog spam. I kind of treat it like I treat telemarketers. I get really excited because I'm finally getting a phone call from someone! So, when I check the comments to one of my posts and I see 3 new comments, it's really exciting!! Plus I find the randomly generated comments like "8199 ya know eredclips" by some guy called "debt consolidation" just fascinating and riddled with a deeper meaning. Am I the only one that feels this way?
November 16, 2004 5:05 PM

Kevin Daly said:

I've been thinking about this problem (having just switched comments off again on my own blog, probably for the last time), and we basically have 2 varieties of comment spam:
1) The random moron unknowingly proclaiming to the world "I am a dickhead. Ha ha ha".
2) The now far more common industrialised spam, obviously commissioned by crappy commercial interests. Aside from the nature of the associated links, these are obvious by the fact that you typically get half a page of links instead of just one. The spammers like to bless us with their whole portfolio in one shot.
Morons will always be with us, but commercial spam is overwhelmingly targeting Google and trying to steal some unearned link love. It strikes me therefore that there is scope for cooperation between Google and the writers of Blog engines - a simple item of metadata added to a comment link as a matter of course could be used to mark it as something that Google should ignore in all cases.
While harder to administer, I'd also favour a black list so that those who commission comment spammers would find the urls they used permanently excluded from page ranking (of course they'd get new ones, but that would cost them money and be inconvenient, so just maybe they'd eventually get the hint). I know that has lots of problems, though, so I'd be interested in people's thoughts on the first suggestion.
November 16, 2004 6:46 PM
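Kevin's first suggestion is essentially what search engines later adopted as the `rel="nofollow"` link attribute, which Google announced in January 2005. A blog engine could apply it to comment links roughly like this. The regex approach is a simplification that assumes reasonably well-formed `<a ...>` tags; a real implementation would be better off using an HTML parser.

```python
import re

# Matches an opening <a ...> tag (not </a>, not <abbr>) and captures its attributes.
ANCHOR_OPEN_RE = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)
REL_ATTR_RE = re.compile(r"\brel\s*=", re.IGNORECASE)


def add_nofollow(comment_html: str) -> str:
    """Mark every link in a comment so search engines don't pass rank through it.

    Tags that already carry a rel attribute are left untouched rather than
    having their values merged, to keep the sketch short.
    """
    def rewrite(match: re.Match) -> str:
        attrs = match.group(1)
        if REL_ATTR_RE.search(attrs):
            return match.group(0)
        return f'<a{attrs} rel="nofollow">'

    return ANCHOR_OPEN_RE.sub(rewrite, comment_html)
```

The spam still lands on the page, but it no longer earns the "unearned link love" that makes commissioning it worthwhile.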

Geoff Taylor said:

I understand your concerns, John, but if comment spam is becoming a big problem the question is:

Would you prefer to be able to lie about your name or URL in fewer and fewer blogs, as more and more blogs switch off comments, or would you prefer to give up that anonymity (to some extent) in order to continue posting to a majority of blogs.

If comment spam is a big enough problem (and since I don't have comments on my blog I can't know), I don't see the current anonymous posting continuing.

Geoff
November 17, 2004 9:20 AM

John said:

Geoff,

I don't see it as an "either, or" proposition.

There are heaps of things you can do to stop comment spam that can be administered from *your network* by *you*.

I just think people are being sold a dummy based on FUD. And the people offering 'black list services' etc. will actually have a commercial interest in the data they collect, beyond simply administering the blacklist (which is bad enough), although they're unlikely to be open about that fact. Also, IP-address-based blacklists are stupid, considering how many users can be behind a single address.

You can moderate spam as it comes through with a single "yes/no", you can show random images with obfuscated text, you can have smart filters, etc.

If I was investing in technology to manage comment spamming, I'd be looking to add better UIs and management tools for users / administrators, not big collaborative blacklist databases. For example: URLs in email messages that the site administrator could use for single-click navigation to a page where they could approve or reject the message with minimal fuss, and features that you could easily switch on from your blog software, such as obfuscated images, pattern-matching filters, etc.
November 18, 2004 5:22 AM
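John's single-click moderation links need to be unguessable, or anyone could approve their own spam by crafting the URL. One way to do that, sketched here under assumptions, is to sign each link with an HMAC of the comment ID and the action. `SECRET_KEY`, `BASE_URL`, and the query-parameter names are hypothetical.

```python
import hashlib
import hmac
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical; keep server-side only
BASE_URL = "https://blog.example.com/moderate"  # hypothetical endpoint
ACTIONS = ("approve", "reject")


def _sign(comment_id: int, action: str) -> str:
    message = f"{comment_id}:{action}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def moderation_link(comment_id: int, action: str) -> str:
    """Build the one-click URL that goes into the notification email."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    query = urlencode({"id": comment_id, "action": action, "sig": _sign(comment_id, action)})
    return f"{BASE_URL}?{query}"


def verify_link(url: str) -> Optional[tuple[int, str]]:
    """Return (comment_id, action) if the link is genuine, else None."""
    params = parse_qs(urlparse(url).query)
    try:
        comment_id = int(params["id"][0])
        action = params["action"][0]
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return None
    if action not in ACTIONS:
        return None
    # Constant-time comparison so the signature can't be guessed byte by byte.
    if not hmac.compare_digest(sig, _sign(comment_id, action)):
        return None
    return comment_id, action
```

A production version would probably also put an expiry timestamp into the signed message, so an old email can't be replayed forever.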

TrackBack said:

Comment Spamming Bastard Wiener Bastards !!
January 24, 2005 10:32 PM

About Rory

I *own* this site, you loser.