Hidden source web pages

If it’s that sensitive, it shouldn’t be on the net. Playing tricks to make it hard to copy won’t protect you if you publish something that shouldn’t be public in the first place.

The thing is, all I’d have to do is hit the Print Screen button on my keyboard, and BAM, instant cut-n-paste of whatever I can see on my monitor (albeit as an image of the text instead of as the text itself).

I just don’t see any method of preventing copying as being at all useful against someone who actually wants to copy the data, and you don’t really need to worry about someone who doesn’t want to copy the data.

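Just for information, the transparent .gif "shield" trick looks something like this (I just tried a quickie test):

<body>
<!-- Lower layer: the text to be shielded -->
<div style="position:absolute; z-index:1">
<p>Here is my text.</p>
<p>You cannot highlight it.</p>
</div>
<!-- Upper layer: a transparent image that intercepts mouse selection -->
<div style="position:absolute; z-index:2">
<img src="transparent.gif" alt="">
</div>
</body>
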
transparent.gif is a 1600x1200 empty .gif. This is a relatively easy way to defeat casual cut-and-paste, and it doesn’t take much extra work when you change your content. A determined user will get around it, of course, and somebody using an ancient pre-CSS browser might still be able to cut and paste anyway. You sometimes find online newspapers playing this game.

I would think that in this example the barrier has already been crossed, because the information is visible in the first place. Whether someone can cut-and-paste is immaterial, since the information is already there; they can just write it down. I wouldn’t want this sort of information on a public web server at all, and no amount of jiggery-pokery would convince me otherwise. It should be on a secure server that allows access only to those who need it. About the only thing you have going for you in the example you give in the OP is the sheer volume of material you want to protect, but even here automation will make recovering the information trivial. “As difficult as possible” is, unfortunately, not very difficult at all.

This is what people are trying to say: you need to think about why you’re publishing this information on a globally available website at all, if you then have to go to such lengths to prevent it being accessed. Are there really certain people who you want to be able to read the site, but not be able to save it in any form? Or is it that only a limited number of people need access at all, and those people can be trusted? If it’s the latter, then you should simply restrict access to the pages to the relevant people. If it’s the former, cut-and-paste restrictions aren’t going to be worth the hassle they involve. As quite a few people have said, you’re never going to make it more than a couple of hours’ work to get the text out in some form, no matter how hard you try. And in the process, you’re almost certainly going to make the site deeply impractical for legitimate users. I would seriously hate to have to navigate a 1 million word site without being able to search for key words, and this is the sort of thing that you’re going to end up disabling. Is it really worth it?

Sometimes it really helps to step back and look at what you’re trying to achieve, rather than focusing too much on the first solution. I realise we don’t know exactly what you’re trying to do here, but it certainly sounds like there are far more elegant and practical ways of going about it.

As yabob pointed out, this can be defeated simply by using a browser which ignores style sheets. As with most copy protection schemes, this keeps honest people honest but does nothing to deter the people who are the real threat.

You don’t even have to ignore stylesheets. While I can’t use the mouse to highlight, Opera (which does support CSS) lets me just hit Ctrl-A, and all the text is selected as normal. I can’t do this in IE (the picture is selected instead), but I can still choose “View Source” and have the entire text at my disposal.
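
To make the “view source” point concrete, here is a minimal sketch of pulling the text straight out of the markup without a browser at all. It uses only Python’s standard `html.parser`; the `PAGE` string is modeled on the quickie test earlier in the thread, and the names `TextCollector` and `extract_text` are made up for this illustration.

```python
from html.parser import HTMLParser

# Markup modeled on the quick test earlier in the thread (an assumption:
# a real page would be fetched or saved from the browser instead).
PAGE = """<body>
<div style="position:absolute; z-index:1">
<p>Here is my text.</p>
<p>You cannot highlight it.</p>
</div>
<div style="position:absolute; z-index:2">
<img src="transparent.gif" alt="">
</div>
</body>"""


class TextCollector(HTMLParser):
    """Collects every non-empty run of text, ignoring tags and styling."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def extract_text(html):
    """Return all visible text in the markup, joined by single spaces."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


print(extract_text(PAGE))
```

The z-index layering never enters into it: the overlay only exists once a browser renders the page, and the parser never renders anything.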

Which is precisely how I get around it when I find it being used (for instance, to clip a fair-use length snippet out of a news article to post it here). Hiding the HTML source would then be a separate question (the stock solution being the “load encoded stuff with a Java applet or JavaScript” thing, as discussed).
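
As a rough illustration of why the “load encoded stuff” approach is only a speed bump: whatever decoding the page’s script performs, the browser has to be handed both the payload and the decoder, so anyone can repeat the same step. The sketch below assumes base64 purely as a stand-in, since the thread doesn’t name a specific encoding, and both function names are hypothetical.

```python
import base64


def encode_for_page(text):
    """What the publisher might do before embedding text in the page
    (base64 is an assumed stand-in for whatever encoding is used)."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_like_the_page_script(payload):
    """What the page's own script must do before display, and therefore
    what any visitor can do with the same payload."""
    return base64.b64decode(payload).decode("utf-8")


secret = "Here is my text."
payload = encode_for_page(secret)

print(payload)                               # unreadable at a glance
print(decode_like_the_page_script(payload))  # the original text again
```

A fancier encoding changes the decode function, not the outcome, because the decode logic still ships to every visitor.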

I think we’re all in agreement that the efforts are rather futile, and the usability of the page degrades in proportion to the degree of “bomb proofing”. Turning your text into “not text” imposes huge usability problems, in addition to possibly being a hassle for production if you change your content often.

Technical solutions to enforce intellectual property rights often just get you into an “arms race”, where it becomes a game to see what you can come up with, and how somebody else can break it. Ultimately, the answer lies in legal and social constraints rather than technological ones.

Privacy issues are better handled by restrictions that ensure people can only see data they ought to be allowed to see, which ultimately has more to do with authentication and a permission scheme on your server end than with how you present the information. Beyond that, you have to trust that privileged users will be responsible if you tell them to be. Legal and social constraints again.
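
For what a server-side permission scheme amounts to at its simplest, here is a minimal sketch. Everything in it is hypothetical (the page paths, the user names, the `PAGE_READERS` table and the `can_view` function); a real site would use whatever authentication and access control its web server or framework provides.

```python
# Hypothetical table: which authenticated users may read which pages.
# None means the page is public.
PAGE_READERS = {
    "/reports/q3": {"alice", "bob"},
    "/public/about": None,
}


def can_view(user, page):
    """Return True only if the page exists and this user may read it.

    `user` is the authenticated user name, or None for an anonymous visitor.
    """
    if page not in PAGE_READERS:
        return False  # deny by default for anything not listed
    readers = PAGE_READERS[page]
    if readers is None:
        return True  # public page
    return user is not None and user in readers


print(can_view("alice", "/reports/q3"))    # an allowed reader
print(can_view("mallory", "/reports/q3"))  # authenticated but not permitted
print(can_view(None, "/public/about"))     # anonymous visitor, public page
```

The point of doing the check on the server is that a user who fails it is never sent the text, so there is nothing on their screen to copy.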

I suspect I could also just print the page to PDF, then open the PDF in Acrobat and select the text there. Or print to a text file, for that matter (I have a print driver that does that).

If you have a link to a site that’s “protected” in this fashion I’ll try it out and let you know.

What about a Flash movie pulling data from XML?

If you download the movie to a hard drive, it should come up empty, since it can’t make the call for the XML from the user’s computer.

That’s basically the same as using a Java applet. It’s a speed bump but does nothing to actually secure your content. The only thing you gain using Flash instead of Java is that existing support in browsers might be a little wider. Or not. I haven’t looked at Flash stats lately.
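
Here is a sketch of why the XML route doesn’t help. The movie has to request the XML from the visitor’s machine, so the same document can be requested and parsed without the movie. The element names in `SAMPLE_XML` are invented for illustration (the thread doesn’t describe the real layout), and the sketch parses a local string rather than fetching anything.

```python
import xml.etree.ElementTree as ET

# Invented stand-in for whatever XML the movie would request.
SAMPLE_XML = """<article>
  <para>Here is my text.</para>
  <para>You cannot highlight it.</para>
</article>"""


def text_from_xml(xml_source):
    """Return the text of every <para> element (a hypothetical tag name)."""
    root = ET.fromstring(xml_source)
    return [p.text.strip() for p in root.iter("para") if p.text]


for line in text_from_xml(SAMPLE_XML):
    print(line)
```

Against a real site you would first save the XML response the movie requests, then feed it to the same function.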

The short answer to any “what about X” question on this topic is that if the text appears on my screen, I can copy it. The only way to keep me from copying it is to not show it to me. The long answer will differ for any specific method you suggest, but worst case I can OCR screenshots, and most methods will be simple enough to defeat that I won’t have to resort to that.

So I guess in the long run, the best thing to do is to make the front door as secure as possible and use SSL for transmitting the text, accepting that once the text is displayed on the client PC there’s not much you can do to protect it.