Random Russian website copies my forum: WTF?

Despite being out of my depth with regard to the technical side of things, I run a forum on the Montreal metro at www.metrodemontreal.com .

I just came across another website reproducing some or all of the posts on the forum: http://xkdv.pp.ru/ .

  1. Why would they do this?
  2. How did they do this?
  3. Is it dangerous?
  4. How do I make them stop?

I understand the SDMB underwent something like this some time ago. Any information would be appreciated.

Can’t help, sorry, matt, but here’s a link to stuff the SDMB and Straight Dope went through, as you said.

Basically, they’re trying to make the forum look active so people will register, and get their e-mails harvested for spam.

Now the weird thing is, this doesn’t appear to be an actual working forum, just some sort of weird repository for the texts of the posts.

Good news: It likely isn’t dangerous. The Russian site is just having a program called a ‘spider’ look at every page of your forum and copy the contents into their forum. It’s no worse than a really eager bunch of people exploring your pages.
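To make the "spider" idea concrete, here is a minimal sketch of what such a program does: start at one page, copy it, pull out every link, and repeat for every link that stays on the same site. The `fetch` argument is a hypothetical stand-in for "download this URL" (a real spider would make an HTTP request there); nothing here is taken from the Russian site itself.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl of one site; returns {url: html} for each page copied.

    `fetch` is a hypothetical function mapping a URL to its HTML text.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    copied = {}
    while queue and len(copied) < max_pages:
        url = queue.popleft()
        if url in copied:
            continue
        html = fetch(url)
        copied[url] = html  # the "copy the contents" step
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href).split("#")[0]
            # Only follow links that stay on the site being copied.
            if urlparse(absolute).netloc == host and absolute not in copied:
                queue.append(absolute)
    return copied
```

Every thread page links to the next one, so a loop this simple is enough to walk an entire forum, which is why nobody needed to sit there with a browser.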

Bad news: Stopping them legally would be difficult at best. Russia doesn’t play by the same copyright rules as America and Canada, and even the big organizations in the field are having a hard time getting them to go along.

Possibly good news: Stopping them technically shouldn’t be difficult unless they’re being very clever. Look through your web server’s logs and find likely domains the spider could be operating out of, then configure your server to deny access to anyone coming from that domain (or, at your option, serve them up a page full of rude images ;)). Contact your local bit gods if you don’t have access to the server or don’t feel comfortable changing its configuration.
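For the "look through your logs" step, a quick first pass is simply to count how many requests each client made; a spider copying a whole forum tends to float to the top. This sketch assumes the server writes the usual Apache Common/Combined Log Format, where the first field of each line is the client's IP (or hostname, if name lookups are switched on); the `access.log` filename in the usage line is hypothetical.

```python
from collections import Counter


def top_clients(log_lines, n=10):
    """Count requests per client in an Apache-style access log.

    Assumes Common/Combined Log Format: the first space-separated field
    is the client IP (or hostname). Blank lines are ignored.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.strip().split(" ", 1)
        if fields[0]:
            counts[fields[0]] += 1
    return counts.most_common(n)
```

Usage would be something like `with open("access.log") as f: print(top_clients(f))`. Expect the big search engines near the top too; the one to worry about is a busy client you can't account for.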

Snopes has fairly recently disabled any copying of their pages (I know this because I used to quote small bits from the page along with the link). Don’t know how that works, but it might help stop the copying if you do something similar.

Snopes is using simple JavaScript to block copying. Simply disable JavaScript and you can copy it.

If you don’t want to go to that trouble, do this. The JavaScript on Snopes simply disables the mouse.

What you do to get around it is this:

Let’s say you want to copy a section that starts…

“Now is the time for every good man to come to the aid of his party.”

  1. Hit “CTRL+F”. This brings up the browser’s Find box.
  2. Type in a word or two (in this case “Now is the time”) and hit Enter. The words you typed in will be highlighted.
  3. Hit “Shift+Arrow right” (or “Arrow up”, “Arrow down” or “Arrow left”) to extend the highlight over the text you want to copy.
  4. Once you’ve highlighted what you want, hit “CTRL+C” to copy the text, then “CTRL+V” to paste it wherever you want it.

It’s pretty easy to get around anything on the web.

Do you have any idea why they would like to do this?

As I said, it’s not even a forum, it’s just a bunch of static pages as far as I can see.

I don’t know why it’s happening, but on a related note, you might want to check for a security hole in your board. I know for a fact that you have a dormant Admin who should be removed, especially as they (meaning, myself) now seem to be getting board Admin messages sent to their e-mail account. I don’t know if that’s related to the recent copying, but in any event you might want to check that all the Admin accounts belong to people who should be Admins.

The chances of them using a browser to scrape that much content approach the Planck scale. Most likely it’s just being done to get them $$$ from serving ads. I note that some of the ads are Google-based; at the very least you could contact Google and see if they’d cut them off.

Depending on how much configuration / access you have on the box serving your web pages, you may or may not be able to block the perpetrators. Checking your log files might give you an IP address range they’re working from… this might allow you to drop packets with that source. Going this route means you either know how to do that already, or are willing to spend some time conversing with your ISP.
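Once you have a range from the logs, "drop everything from that range" boils down to a membership test like the one below. It's only a sketch: the `203.0.113.0/24` range is a documentation placeholder, not the scraper's real address, and in practice the check belongs in the web server or firewall rather than in a script. On Apache 2.4, for instance, the rough equivalent is a `Require not ip 203.0.113.0/24` line inside a `<RequireAll>` block alongside `Require all granted`.

```python
import ipaddress

# Placeholder block list (documentation range); substitute the range(s)
# you actually find in your own logs.
BLOCKED_RANGES = [ipaddress.ip_network("203.0.113.0/24")]


def is_blocked(client_ip, blocked=BLOCKED_RANGES):
    """Return True if client_ip falls inside any blocked network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in blocked)
```

Blocking a whole range rather than a single address helps when the spider hops between machines on the same network, at the cost of possibly shutting out innocent neighbours on that range.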

I agree with this. If you have access to log files, look for anything that seems weird, like one IP accessing a bunch of your pages in a row, very quickly (note: you should see Google, MSN and Yahoo spiders too). Use http://www.dnsstuff.com/ (the “IP Information” tool) to find out more about that IP. Give your ISP a call if you can and explain that you need to have an IP blocked from accessing your site, and why. They shouldn’t mind.
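The "one IP, lots of pages, very quickly" pattern can be checked mechanically with a sliding time window per client. This sketch assumes Apache Common/Combined Log Format timestamps (e.g. `[10/Oct/2005:13:55:36 -0400]`) and log lines in chronological order; the thresholds of 30 hits per 60 seconds are arbitrary examples, not recommended values. The search engine spiders will likely trip it as well, so look each flagged IP up before asking for a block.

```python
import re
from collections import defaultdict, deque
from datetime import datetime

# Client IP and bracketed timestamp at the start of a Common/Combined Log line.
LOG_RE = re.compile(r"^(\S+) \S+ \S+ \[([^\]]+)\]")


def fast_clients(log_lines, max_hits=30, window_seconds=60):
    """Return clients with more than max_hits requests inside any
    window_seconds-long stretch. Assumes lines are in time order."""
    recent = defaultdict(deque)  # client -> timestamps within the current window
    flagged = set()
    for line in log_lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that don't look like access-log entries
        client = m.group(1)
        ts = datetime.strptime(m.group(2), "%d/%b/%Y:%H:%M:%S %z").timestamp()
        window = recent[client]
        window.append(ts)
        # Discard this client's hits that are older than the window.
        while ts - window[0] > window_seconds:
            window.popleft()
        if len(window) > max_hits:
            flagged.add(client)
    return flagged
```

A human reading threads rarely loads more than a page every few seconds, so a client sustaining dozens of hits a minute is a good candidate for the "IP Information" lookup.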

I don’t know. It isn’t listed by anyone as a known spam house. I can’t even find people bitching about the domain online.