After a less than auspicious start, I’ve been having a very thoughtful conversation with Jay Allen, who is spearheading a new effort to combat weblog spam (announcement). He apologized for the email I received, and his heart really is in the right place. But I still have grave concerns about the project.
Weblog spam is not the sort of thing that’s going to hit the front page of CNN anytime soon, but it has been a minor annoyance for some webloggers, and it has been my experience that these things can escalate from minor annoyance to Usenet-level catastrophe in record time. So.
The first thing that the anti-weblogging-spam advocates need to realize is that the spam landscape has changed. For all that we laugh at the spam that SpamAssassin catches, ha ha aren’t those spammers stupid, I can’t believe anybody falls for that… yes, the messages are stupid, but spam works, and spammers aren’t as stupid as you think. Spam works and it is big business, and spammers are increasingly organized and increasingly business-savvy. It’s not some guy in the garage who bought a CD of email addresses from MicroWarehouse (yes, they used to sell them, I have old MacWarehouse catalogs to prove it) who thought it would be cool to tell a million people about his Beanie Baby collection. It’s organized crime rings who hire programmers to automate everything they possibly can (domain registration, ISP registration, free email account registration) and hire menial workers for pennies an hour halfway around the world to do all the manual things they can’t automate (like get past image-based login systems). They hire virus writers to write extremely sophisticated viruses that exploit all known holes in everything, install spyware, malware, adware, and remote control programs with which they can both send more spam and launch distributed denial-of-service attacks… against anti-spam advocates.
Which brings me to the second thing. THIS IS NOT A HOBBY. If you want to be an anti-spam advocate, if you want to write software or maintain a list or provide a service that identifies spam or blocks spam or targets spam in any way, you will be attacked. You will be attacked by professionals who have more money than you, more resources than you, better programmers than you, and no scruples at all. They want to make money, this is how they have decided to make money, they really can make a lot of money, and you’re getting in their way.
This is old hat to anyone who’s been involved in anti-spam efforts in other domains (Usenet and email spring to mind), but just like everything else, the weblogging community seems intent on (a) thinking they’re special and unique and nobody has ever had their problems before, and then (b) ignoring all the work that has come before and reinventing the wheel.
Now, certainly some adaptation of code and algorithms will be necessary. Existing tools probably can’t be used as-is. Email spam fighting relies a lot on the structure of an email, the chain of headers that give away so much information to the trained eye, and none of that information is available in weblog spam. But I see from Jay’s Comment Spam Clearinghouse that the latest and greatest tool available to us is a master list of domain names and a few regular expressions. No offense to Jay or all the people who have contributed to the list so far, but how quaint! I mean really. Savor this moment, folks. You can tell your children stories of how, back in the early days of weblogging, you could print out the entire spam blacklist on a single sheet of paper. Maybe with two or three columns and a smallish font, but still. Boy, those were the days.
And they won’t last. They absolutely won’t last. They won’t last a month. The domain list will grow so unwieldy so quickly, you won’t know what hit you. It’ll get so big that it will take real bandwidth just to host it. Keeping it a free download will make you go broke. Code is free, but bandwidth never will be. Do you have a business plan? You’ll need one within 6 months.
And then people will start complaining because a regex matches their site. Or spammers will set up fake identities to report real sites and try to poison the list. Are you manually screening new contributions? That won’t scale. Are you not manually screening new contributions? That won’t work either. Weighing contributions with a distributed Whuffie system? Yeah, that’s possible, but it’s a tricky balance, and still open to manipulation.
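To make the false-positive problem concrete, here is a minimal sketch, assuming a blacklist made of a few regular expressions. The patterns and URLs are invented for illustration and are not taken from Jay’s actual list; the function names are hypothetical too.

```python
import re

# Hypothetical blacklist entries, invented for illustration only.
BLACKLIST_PATTERNS = [
    r"cheap-pills\.example",  # a specific spam domain
    r"casino",                # a broad keyword pattern
]

COMPILED = [re.compile(p, re.IGNORECASE) for p in BLACKLIST_PATTERNS]


def matching_patterns(url):
    """Return every blacklist pattern that matches anywhere in the URL."""
    return [p.pattern for p in COMPILED if p.search(url)]


def is_blocked(url):
    """A URL is blocked if at least one blacklist pattern matches it."""
    return bool(matching_patterns(url))


if __name__ == "__main__":
    urls = [
        "http://cheap-pills.example/buy",         # the intended target
        "http://casino-history-museum.example/",  # an innocent bystander
        "http://example.org/weblog",              # unrelated site
    ]
    for url in urls:
        print(url, is_blocked(url), matching_patterns(url))
```

The broad keyword pattern catches the spammer and the museum alike, and the only fixes are narrower patterns (which means more of them) or a whitelist (which means another list to maintain and another process to abuse).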
And then the spammers will strike back. They’ll complain to your ISP that you’re spamming, and your ISP will kick you off without so much as a by-your-leave. They’ll hire lawyers and bring you down with bogus DMCA injunctions. They’ll own a million Windows boxes and direct them all at your server. They’ll track you down, find your social security number and date of birth, steal your identity, and ruin you personally. Just to distract you temporarily.
It’s all been done. It’s all been done before, and it was completely all-consuming, and it still didn’t work. Spammers register dozens of new domains each day; you can’t possibly keep up with them. They’re bigger and smarter and faster than you. It’s an arms race, and you’ll lose, and along the way there will be casualties, massive casualties as innocent bystanders start getting blacklisted. (You do have a process for people to object to their inclusion, right? Yeah, except the spammers will abuse that too.)
And that will lead to the backlash. People who object to the management of the list. People who object to the submission process. People who object just out of principle. People who contributed a domain to the list and it didn’t get included and now they’re pissed. People who spin wild conspiracy theories about how you’re in league with a certain group of spammers who are secretly using this list to drive out their competition.
It’s a full-time job, and everyone will hate you, and it still won’t work. Spammers are smart and determined, and people are numerous and stupid, and spam pays. You can’t make it not pay. Going after their ISPs won’t help; they’ll auto-register somewhere else. (Already happening.) Going after their upstream provider won’t help; they’ll cut deals with the backbone providers and keep going. (Already happening.) Going after them in court won’t help; they’re already living under friendly governments. (Already happening.) You can’t stop them with Turing tests; they’ll hire child workers to read your images and manually register/post/ping/trackback/whatever. (Already happening.) Then they’ll attack you with the power of 100 million owned Windows boxes and knock you off the Internet. (Already happening.) They will keep coming and coming and coming until you give up, go home, cry uncle, take Prozac, get a regular day job to replace the one you quit when being an anti-spammer became your full-time job.
Weblogs may turn out to be The Next Thing for spammers, the next vector to exploit. And if that’s true, then things are going to get really ugly really quickly. If you’re up for that fight, then take them on, Godspeed. But prepare yourself for the worst, and then imagine something worse than that, and then accept that your imagination is too limited, because it will be so much worse than that.
Update: as I was posting this, I received a response from Jay Allen that addresses some of my concerns. At the very least, he seems well-versed in the challenges that lie ahead for him, and he is going into this with guns blazing and with no misconceptions about how difficult it will be. He has some fantastic ideas in store, and I look forward to seeing them in action.
I’ve also gotten several other responses to this piece. I have been called a variety of things, including pessimistic, cynical, and a naysayer with no positive solutions to offer. All of these things are true, and none of them negate the basic fact that we have collectively been living in a fantasy world where the slimy kudzu that covers the rest of the hinternet has somehow escaped us. Someone challenged me, “Well, how am I supposed to continue hosting these low-barrier discussions?”
I’m sorry, but I don’t know. To quote Bruce Schneier, “I feel rather like the physicist who just explained relativity to a group of would-be interstellar travelers, only to be asked, ‘How do you expect us to get to the stars, then?’ I’m sorry, but I don’t know that, either.”
The low barrier is exactly the problem here. We got away with it (please, come post random links on my site which is well indexed, poorly managed, and open to unlimited anonymous contributions!) because we were collectively very young and naive and thought no one could hurt us. Now it’s like we’re turning 30 and being told we need to go on a diet and asking, “Well, when can I go back to my old eating habits?”
Um, you can’t. Your old eating habits don’t work anymore.
Weblogging is growing up. Oh wait, you thought that would be a good thing? You must still be young.
§
© 2001–9 Mark Pilgrim