What is the point of this gibberish spam to my web site?

CookingWithGas · August 4, 2024, 12:06pm

What possible purpose is there to web spam that is just nonsense content?

I have a web site for my jazz trio with a Contact Us page. It takes the user’s message and emails it to me. I manage to filter out most of the spam (I have not set up CAPTCHA yet) but sometimes something gets through and it’s usually just some random characters. Here’s the latest. The actual message is about 20 times the size of this sample but it just goes on like this. The user gave a Gmail address as their address. I’m assuming it was generated by a bot. The ones I filter out are those that have hyperlinks in them but at least those were intelligible.

Joshuazew has sent you the following message:

Chronos · August 4, 2024, 12:16pm

I’m guessing that that’s some non-Latin script (possibly Chinese) that got garbled somewhere (most likely, by the bytes being offset by one). Old text encodings such as ASCII needed only one byte per letter, but they could only be used for the Latin alphabet (plus a handful of other characters). Most text nowadays is Unicode, which in its common encodings needs two or more bytes for most non-Latin characters, and so if those bytes get mismatched, you get garbled nonsense that doesn’t make sense in any language.
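As an illustration of a byte mismatch, here is a minimal Python sketch of one common way it happens: text is sent as UTF-8, but the receiving side reads each byte as a separate Latin-1 character. The choice of encodings and the helper names `garble` and `ungarble` are assumptions for the example; the thread does not establish which encodings were actually involved.

```python
def garble(text: str) -> str:
    """Encode as UTF-8, then misread every byte as one Latin-1 character."""
    return text.encode("utf-8").decode("latin-1")


def ungarble(garbled: str) -> str:
    """Undo the misreading, assuming it happened exactly as in garble()."""
    return garbled.encode("latin-1").decode("utf-8")


original = "你好"             # two Chinese characters, three UTF-8 bytes each
mangled = garble(original)    # six unrelated Latin-1 characters

print(len(original), len(mangled))
print(ungarble(mangled) == original)
```

Plain ASCII text passes through unchanged, because its bytes mean the same thing in both encodings. That may be why an English-only form never shows the problem.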

Whack-a-Mole · August 4, 2024, 12:22pm

Or Russian:

LSLGuy · August 4, 2024, 3:09pm

If anyone wants the technological details this is a good intro:

I would bet that the OP’s website, or at least the contact page and the server-side code that creates and sends the email, are all ancient and US-English-centric.

The info being posted by the spammer is probably well-formed modern Unicode Chinese or Russian. It’s the OP’s website that’s mangling it.

HeyHomie · August 4, 2024, 3:11pm

And that website is … ???

CookingWithGas · August 4, 2024, 3:38pm

Well, certainly it’s US centric. It’s not ancient but I am not a professional web developer, and this is intended for a local audience, not international. And even if I supported languages in other character sets the content would be useless to me.

The server side is PHP, and I am just using an HTML form to collect whatever they enter into the textbox and copying the content to the body of an email. Offhand I don’t know which of those steps are hostile to other character sets.

It just occurred to me to View Source and this is what I get:

Joshuazew has sent you the following message:

Цены на монтаж водонагревателя Стоимость услуг VENCON оправдана качеством и высоким уровнем обслуживания наших Клиентов. Стандартный монтаж водонагревателя Вид работ Цена услуги Цена при покупке товара у нас Установка бойлера от 5 до 35 литров 1 999 грн 1 399 грн Установка бойлера от 40 до 80 литров 1 999 грн 1 399 грн Установка бойлера от 100 до 160 литров 1 999 грн 1 399 грн Установка бойлера от 200 до 300 литров 1 999 грн 1 399 грн Замена бойлера 2 999 грн 2 099 грн Демонтаж бойлера 999 грн 999 грн Скачать полный прайс материалов и работ Стандартная установка включает следующие работы Согласование даты и времени монтажа с менеджером. Выезд нашего специалиста на

and so on. Google Translate tells me it’s a Russian promo for water heater installation.
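Since the page source shows intact Russian, one plausible explanation (an assumption, not something confirmed in the thread) is that the email is sent without declaring its character set, so the mail client guesses wrong when it displays the text. The OP's code is PHP and is not shown, so this is only a Python sketch of the idea. The function `build_contact_email` and both addresses are hypothetical.

```python
from email.message import EmailMessage


def build_contact_email(sender_name: str, body: str) -> EmailMessage:
    """Build a plain-text email whose body is explicitly labeled as UTF-8."""
    msg = EmailMessage()
    msg["From"] = "contact-form@example.com"   # placeholder address
    msg["To"] = "band@example.com"             # placeholder address
    msg["Subject"] = f"{sender_name} has sent you the following message"
    # Declaring the charset tells the mail client how to decode the bytes.
    msg.set_content(body, charset="utf-8")
    return msg


message = build_contact_email("Joshuazew", "Цены на монтаж водонагревателя")
print(message["Content-Type"])
```

A PHP form handler would need to achieve the same thing by sending a `Content-Type` header that includes a UTF-8 charset.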

Jackmannii · August 5, 2024, 6:05pm

It sounds like a great deal, but how long does it take from Magnitogorsk to respond to service calls?

Cugel · August 6, 2024, 10:10am

I also have a web form. Just the simplest of challenges, e.g. “What comes next: 1, 2, ?”, will do the trick.

Chronos · August 6, 2024, 11:55am

Super simple captchas like that will work if you’re the only one using it, and your site isn’t very popular. As soon as it’s worth it for anyone to make a bot for it, though, it becomes completely ineffective.

DPRK · August 6, 2024, 12:12pm

That is true as far as it goes. However, there is a foolproof technique: if you add a simple checkbox certifying that “I am human”, no bot will ever be able to click it. It would violate the Laws of Robotics, or something.

Al128 · August 6, 2024, 12:15pm

(bolding mine)

… added bonus:

You could easily identify fellow SDMB members by their nitpicking/bitching about the underdetermined nature of this sequence, and about the fact that, prima facie, at least “3” and “4” seem to be reasonably good answers.

May I suggest a string like:

1-4-8-9-27-16-64-?

should separate the wheat from the chaff

Chronos · August 6, 2024, 12:17pm

There’s a lot more behind that checkbox than you think. It probably measures things like precisely how the mouse moves to the checkbox, and apparently it’s difficult for a bot to emulate a human in that way. And it’s never the only line of defense: It only gives the checkbox if it’s already pretty sure that you’re a human (from lots of prior human interactions from that computer), and if it’s unsure, it gives you one of the “click every box that contains a traffic light” ones.

markn_1 · August 6, 2024, 6:01pm

That’s easy: 4.
It’s the powers of 2.

Oh, no, sorry, it’s 5.
It’s the Catalan numbers.

No, wait, it’s 10.
It’s the integers in base 3.

Wait, what I meant was 12.
It’s the superfactorials.

Well, one of those must be correct.

Coriolanus · August 6, 2024, 8:24pm

On several of my websites I use a function that counts the number of non-Latin characters and flags as spam any message where they make up more than about a third of the text. I guess you could try to post more than 1024 letters, but it won’t even store more than that in the DB; it just marks the message as spam if it’s bigger.
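A filter along those lines could be sketched in Python as follows. This is not the poster's actual function: the names, the use of Unicode character names to decide what counts as Latin, and the exact handling of the one-third and 1024-character limits are all assumptions.

```python
import unicodedata

MAX_LENGTH = 1024             # longer messages are flagged outright
NON_LATIN_THRESHOLD = 1 / 3   # "about a third", per the post


def is_latin_letter(ch: str) -> bool:
    """True if the Unicode name of the character marks it as a Latin letter."""
    return "LATIN" in unicodedata.name(ch, "")


def non_latin_fraction(text: str) -> float:
    """Fraction of alphabetic characters that are not Latin letters."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    non_latin = sum(1 for ch in letters if not is_latin_letter(ch))
    return non_latin / len(letters)


def looks_like_spam(text: str) -> bool:
    """Flag messages that are too long or mostly non-Latin."""
    if len(text) > MAX_LENGTH:
        return True
    return non_latin_fraction(text) > NON_LATIN_THRESHOLD


print(looks_like_spam("Цены на монтаж водонагревателя"))
print(looks_like_spam("Can you play at our wedding in June?"))
```

Only letters are counted, so digits, punctuation and spaces do not dilute the ratio. Accented Latin letters such as “é” still count as Latin.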

CookingWithGas · August 6, 2024, 9:16pm

I have a simple challenge: “What type of music do we play?” A surprising number of spammers know that it’s jazz.
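The server-side check for a challenge question like this can be very small. Here is a minimal Python sketch; the function name, the set of accepted answers, and the whitespace and case normalization are assumptions, and the OP's real handler is PHP.

```python
# Accepted answers, stored lowercase; changing the question means changing this set.
EXPECTED_ANSWERS = {"jazz"}


def passes_challenge(answer: str) -> bool:
    """Accept the answer regardless of surrounding spaces or capitalization."""
    return answer.strip().casefold() in EXPECTED_ANSWERS


print(passes_challenge("  Jazz "))
print(passes_challenge("rock"))
```

As the post notes, a check like this cannot tell a human from a bot that supplies the right word. It only stops submissions that ignore the question.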

Cugel · August 6, 2024, 9:25pm

Well, if it’s an actual human repeatedly filling out the form, that’s a different beast altogether.

Cugel · August 6, 2024, 9:27pm

“Making a bot” for one form seems labour intensive, especially when simply changing the question renders it useless.

DPRK · August 6, 2024, 10:32pm

Bots do know how to read what the question says. You have to give them that.

CookingWithGas · August 7, 2024, 3:05am

Reading the question and knowing how to answer it are two different things… I am investigating CAPTCHA currently. It’s a free service but there are multiple ways to implement it and I’m trying to digest the documentation.

Cugel · August 7, 2024, 4:49am

First it has to know there is a question, then divine what the question is, and lastly know the answer. IMO, all that is happening here is some crawling of the web searching for the <form> tag, then auto-submitting with their crap inserted. There’s no analysing the content, just the usual scattergun approach. That is easily defeated; in my experience the question has been 100% effective since I put it in place.
