Why can't on-line news sites filter spam comments?

I have a question of fact and, since I’m posting it on a subsidiary of the Chicago Tribune newspaper, I thought I’d invite a response from the Editor-in-Chief, or whoever is delegated the role of on-line Comments Editor or On-Line Technologies Programmer, or whoever is in the decision-making capacity most closely related to this matter.

A news site puts up (publishes?) an article for viewing. It’s not controversial, momentous, salacious, heartwarming, or groundbreaking, maybe not even a current-events issue; not much of a big deal – something like “Unpublished manuscript from Edgar Allan Poe discovered in Baltimore attic.”

There’s really nothing much to say. Nevertheless, the first – and often the only – comment posted is someone using the space to say “My step-father’s mistress’s third cousin’s aunt-in-law makes $16 billion a week on-line. Follow this link to learn how: www.iamasleazyspammer.biz”

They’re not even contributing to a discussion of the article. I mean, they’re not all that easy to detect when they’re buried within the discussion of a hot topic, but they should be easy enough to weed out of a mundane news item. In fact, those mundane items where they show up as the sole faux comment could really be the harvesting grounds for name-gathering software to identify the spam-commenters. Why, oh why, aren’t these guys put on a shared blacklist? Why aren’t these posts detected and deleted within minutes (or at least within hours) of being posted?

The bottom of this news article includes a perfect example of the kind of garbage I’m talking about: http://www.rr.com/articles/2014/02/20/g/gov-t-looking-into-atf-operations-in-4-cities

–G!

I doubt there is a hard-and-fast factual answer. We know the technology exists, but we also know the filtering software is far from perfect. The best solution right now is to use humans, as the SDMB does in the guise of moderators, to filter and police. I can only assume that the paper doesn’t think the labor cost would justify the rewards; papers aren’t exactly flush with cash right now.

I police my own YouTube comment areas and wipe out the worst spam as soon as I can, which takes time. Some YT posters have disabled comments entirely for just this reason, but the paper may think that’s too harsh and user-unfriendly.

It looks like your news article does have such a filter, because there don’t seem to be any comments at all.

Which, for all we know, could be due to a human filter rather than an automated one.

Having an automated system try to filter things is much tougher than having a human look things over and spot the “obvious” spam. I mean, I certainly know that breast cancer awareness websites are not porn, but we constantly hear about automated porn filters that have trouble telling the difference.
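To make that concrete, here’s a toy sketch (made-up word list, not any real filter’s logic) of how a naive keyword filter produces exactly that false positive:

```python
# Toy illustration of keyword-based filtering and its false positives.
# The word list and examples are invented for this sketch.

BLOCKED_WORDS = {"breast", "xxx", "nude"}

def naive_filter(text: str) -> bool:
    """Return True if the text should be blocked."""
    words = set(text.lower().split())
    return bool(words & BLOCKED_WORDS)

print(naive_filter("Join our breast cancer awareness walk"))  # True: false positive
print(naive_filter("Hot singles want to meet you"))           # False: spam slips through
```

A human reviewer resolves in a glance what the keyword match cannot: context.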

Spammers are not stupid. They know exactly what automated detection methods are currently being used, and construct their spam carefully to avoid them.

For example, back in the old Usenet days, anti-spam filters started by blocking floods of posts with identical content. So the spammers started posting messages whose first couple of lines carried the ad, followed by a block of random characters. The filters adjusted to catch that, but then the spammers changed tactics so the padding was just text pulled from other messages, or novels, or whatever. Some filters tried to adjust by ignoring everything but the first couple of lines, but that caught legitimate posts (for example, people posting short stories that always started with an identical copyright notice).
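Here’s a rough, hypothetical sketch of that arms race; the fingerprinting idea and both failure modes are as described above, but the code itself is only an illustration:

```python
import hashlib

def fingerprint_whole(body: str) -> str:
    # Generation 1: hash the entire message to catch identical mass postings.
    return hashlib.sha256(body.encode()).hexdigest()

def fingerprint_prefix(body: str, lines: int = 2) -> str:
    # Generation 2: hash only the first couple of lines, to catch spam
    # that appends random padding after the ad.
    prefix = "\n".join(body.splitlines()[:lines])
    return hashlib.sha256(prefix.encode()).hexdigest()

spam_a = "BUY NOW at example.biz\nAmazing deal!\nxkq93 zzv 18hqn"
spam_b = "BUY NOW at example.biz\nAmazing deal!\npelv 02jw qqa7"

# Whole-message hashes differ, so generation 1 misses the padded copies:
print(fingerprint_whole(spam_a) == fingerprint_whole(spam_b))    # False
# Prefix hashes match, so generation 2 catches them:
print(fingerprint_prefix(spam_a) == fingerprint_prefix(spam_b))  # True

# ...but two legitimate stories opening with the same copyright notice
# collide too, which is the false positive described above.
story_a = "Copyright 1994. All rights reserved.\nChapter 1\nOnce upon a time..."
story_b = "Copyright 1994. All rights reserved.\nChapter 1\nIt was a dark night..."
print(fingerprint_prefix(story_a) == fingerprint_prefix(story_b))  # True: false positive
```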

Which is not to say that all is lost; it’s just that it’s harder than it looks.

Human moderation is better at identifying spam, but it generally requires a report system, and the spam sits there until a moderator can take action, so users will see it for a while even if it does eventually get caught and deleted.
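A minimal sketch of that workflow (a hypothetical structure, not any site’s actual system) shows why reported spam stays visible until a human gets to it:

```python
from collections import deque

published = {}          # comment_id -> text; visible to readers immediately
report_queue = deque()  # comment ids waiting for a human moderator

def post_comment(comment_id: int, text: str) -> None:
    published[comment_id] = text  # no pre-screening: visible right away

def report(comment_id: int) -> None:
    report_queue.append(comment_id)  # still visible until a mod acts

def moderate_next() -> None:
    if report_queue:
        comment_id = report_queue.popleft()
        # A human decides; for the sketch we just delete it.
        published.pop(comment_id, None)

post_comment(1, "Make $16 billion a week on-line! Follow this link...")
report(1)
print(1 in published)  # True: readers still see the spam
moderate_next()
print(1 in published)  # False: gone only after a moderator acts
```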

They do. They successfully filter lots of spam, and not just the obvious stuff.

Spammers know this, and they get feedback whenever they see their content failing to get through. So they change until it does. The spam that you see is, by definition, the stuff that they managed to slip past.

It looks obvious to you when that’s all you see. But in the context of all of the candidate comments that come through, most of which you don’t see, it’s much harder than you’d think to find those few needles in the haystack.
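Some made-up numbers illustrate the base-rate problem: even a filter with a very high catch rate leaves a visible residue, and that residue is all you ever see.

```python
# Illustrative volumes only; the real numbers aren't public.
spam_submitted = 20_000  # spam comments attempted per day
catch_rate = 0.995       # fraction the automated filter blocks

visible_spam = spam_submitted * (1 - catch_rate)
print(round(visible_spam))  # 100 spam comments a day still get through
```

Blocking 19,900 comments a day is invisible work; the 100 that slip through are the only evidence readers ever see.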

The particular spammer you’re referring to is among the worst of the worst. They employ (or more likely deceive) humans into posting the spam for them. Those humans use a variety of tools to multiply and alter their comments in ways that are specifically designed to make filtering (both automated and manual) difficult. And those ways change frequently.

In my experience, it’s because a lot of them don’t have a way to report a spam post. So any spam that gets through any possible spam filters just sits there. It’s not like moderators want to have to go back and visit the articles over and over.

The sites with the least spam actually close comments after a certain amount of time, so the moderators can move on.

Moderators don’t usually examine comments one article at a time. They see the stream of comments coming in, regardless of what articles those comments were posted on.

“Report as spam” buttons and upvote systems have their own problems. People spam those buttons on comments they disagree with. Spammers upvote their own comments. And it’s harder to filter and moderate button clicks than it is to filter comments.
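One common mitigation, sketched below with made-up weights rather than any particular site’s scheme, is to weight each report by the reporter’s track record, so bad-faith clicks count for little:

```python
# Hypothetical report-weighting scheme: a user's reports count in
# proportion to how often their past reports were upheld by moderators.

reporter_accuracy = {
    "longtime_user": 0.9,   # most past reports were valid
    "grudge_holder": 0.1,   # mostly reports comments they disagree with
    "spammer_sock": 0.0,    # the spammer's own accounts
}

def weighted_report_score(reporters: list[str]) -> float:
    # Unknown users get a neutral default weight of 0.5.
    return sum(reporter_accuracy.get(user, 0.5) for user in reporters)

# Ten reports from a bad-faith account barely move the needle...
print(round(weighted_report_score(["grudge_holder"] * 10), 2))  # 1.0
# ...while two reports from trusted users carry nearly double the weight.
print(round(weighted_report_score(["longtime_user"] * 2), 2))   # 1.8
```

Of course, the weighting itself now has to be maintained and defended against gaming, which is part of why filtering button clicks is its own moderation problem.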

One useful fact would be that the Chicago Tribune does not own the Chicago Sun-Times.