Why do web hosting services insist that all files be HTML-linked?

I am in the business of finding, then switching web hosts. Reading thru the terms of service on hostexcellence.com, I came across this:

This is similar to the wording on ixwebhosting.com, where I talked to a person by phone.

My problem is that I sometimes transfer print files of 10-60MB to magazines, commercial printers, etc. and the easiest way I have found to do this is to create a temporary folder on my domain, FTP the file to that location, then email the URL to the recipient. He can then download thru a standard browser, and I erase the file in a few days or weeks. Since no one else should be accessing this file, there is no link to it in any HTML page.

(Why not email the files? Because the email box or size limits are typically too small and after sending large files to the wrong inter-company mailbox a few times, I decided I wanted to upload them only once. Sending a URL again is faster than sending a large file. And FTP access has been too tricky for some of my contacts; it involves permissions, etc., while everyone can easily use a browser.)

But the hosts seem to have a problem with un-linked files. The spokesperson for ixwebhosting.com said that they would prohibit such storage and they had a crawler that would seek out such files and delete them or take disciplinary action up to disabling the account. His reason? “We are in the business of web hosting, not data storage.” Even after explaining this seemingly trivial matter to him, he wouldn’t budge from the standard policy, except to suggest that for every file I wanted to transfer like that, I create a fake HTML file with a link to the other file.

WTF? What difference does it make? If they guarantee X GB of storage, Y GB of bandwidth, why would they care if someone downloads a linked file or a non-linked file as long as the maximums are not exceeded?

I also worry about “orphaned” files. I readily admit when I change HTML pages, I don’t reliably delete old files no longer referenced, especially if they are very small and sometimes, since they may be used seasonally, I don’t want to delete them when I will use them again next season. While I wouldn’t mind a crawler that deletes orphaned files, I certainly don’t want to get in trouble for forgetting about some.

Aren’t there some services that specifically advertise they can be used for backup data storage thru Internet access? Why would web hosts be any different? And what if a file on my domain isn’t linked from my domain, but from some other domain?

Web hosting companies are in the business of hosting web pages. They make money from ads they place on your website. If you have files in storage that aren’t referenced on a website, they aren’t making any money off of that storage space.

There are online data storage companies that will allow you to store data without needing to have it accessible on an advertising-furnishable page. They do, however, usually charge. X-Drive is one.

As they’re not insisting the link page is stored on their own server, there’s no practical method by which they would be able to tell the difference between a file that you have not linked and a file that has simply never been downloaded.

It sounds more like they want to restrict downloads to the HTTP protocol, rather than FTP; perhaps because of the way their servers are structured - for example, there might be multiple HTTP servers in a load-balanced array, but only one FTP server, intended for handling incoming files (i.e. uploads).

Actually, I’m not sure about the specific companies that the OP was referring to, but not all web hosting companies make their money from putting ads on your website. Some do, obviously… especially the free ones. Others make their money by charging you for putting your content up on the web.

Only if you choose a cheap service. :smiley:

Perhaps they are afraid you will store files that might get them in trouble. You will note that hostexcellence.com refers to “or as a provisioning service for third party email or FTP hosts.” In essence they don’t want to be a party to your email spam program and/or to host files which may be illegal, i.e., pornography and/or copyrighted files.

First, realize that this is their policy and you can’t argue about policy. This is roughly equivalent to going to McDonald’s, buying a cup of coffee, then filling your pockets with as many ketchup packets as you can carry.

“Sir, those are for putting on our hamburgers and fries.”
“But they’re free! Unlimited ketchup! Why shouldn’t I be able to take as much as I want?”

They don’t really care if a file is linked per se, they want it to be part of the web site that they’re hosting for you.

Why? What difference does it make? They don’t have anything to do with my content (except to prohibit porn, etc., but that’s not an issue here).

I realize I won’t change their policy, but I expect there to be a good reason for it and not an arbitrary rule pulled out of a hat, and I’m curious as to that reason.

So if I put a link to porn files that makes it OK? How does a link to a bad file make it a good file? How does adding a link modify the copyright status?

That might be a loophole they’ve overlooked. Or maybe there’s finer print somewhere that insists the link must be internal and I overlooked that.

But all files I will store on the site, whether linked or not, will be uploaded thru FTP. And all files downloaded by whoever will be by HTTP, so what’s the diff?

But hmmm…maybe you’re on to something. Could it be that they assume that any file without a URL on a HTML page somewhere in the world would never be downloaded, and therefore wasting space on their server? But is every URL in the world clicked on from HTML, not typed in or clicked on in an email? Even then, why do they care if I’m not exceeding their maximums?

They’d better not put any ads on my pages, or I walk. All content is to be provided by me or the deal is off. I’m paying for the hosting service, I get to control the content. You may be thinking about free pages, but the services I am investigating, for the traffic & storage I plan to use, are roughly $100/year, my cost.

Ah. I had misunderstood you; I thought you were sending links to download the files via anonymous FTP rather than HTTP.

I suspect this is what the restriction is about; after all, if you’ve emailed someone an HTTP link to your files, then you have complied with the directive “All your downloadable files or files stored on the server have to be available for download via a HTML document stored on the Internet in a publicly or privately accessible area.”

Better solution: use http://www.mailbigfile.com. You can upload files up to 1 GB (2 GB if you pay for it). The recipient gets an e-mail, clicks on the link, and downloads the file. The file is removed after a week.

No muss, no fuss, none of your webspace.

For larger files, there’s http://www.dropload.com. They go up to 100 Gigs.

While that would certainly satisfy a strictly legal interpretation, they would have a hard time verifying it using only a crawler, and again, why do they care??

If they want to prohibit storage for backup’s sake (again, why do they care?) I could write a program to link to every one of the files in a backup directory from a fake HTML page (as per their instructions) and fool the crawler, so that seems like a useless prohibition. I would think unlinked files might generate LESS traffic on average, not more!
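To make the "fake HTML page" idea concrete, here is a minimal sketch of such a program in Python. It is only an illustration under my own assumptions: the function name `build_index` and the page name `index.html` are made up for the example, and nothing here comes from the host or its tech. It walks a directory and writes one HTML page with a link to every other file under it.

```python
import html
from pathlib import Path
from urllib.parse import quote


def build_index(directory: Path, index_name: str = "index.html") -> Path:
    """Write an HTML page in `directory` linking to every other file under it.

    `build_index` and `index_name` are hypothetical names for this sketch.
    """
    index_path = directory / index_name
    links = []
    for path in sorted(directory.rglob("*")):
        # Skip subdirectories and the index page itself.
        if path.is_file() and path != index_path:
            rel = path.relative_to(directory).as_posix()
            # Percent-encode the URL; HTML-escape the visible link text.
            links.append(
                f'<li><a href="{quote(rel)}">{html.escape(rel)}</a></li>'
            )
    page = (
        "<!DOCTYPE html>\n<html><head><title>Files</title></head><body>\n<ul>\n"
        + "\n".join(links)
        + "\n</ul>\n</body></html>\n"
    )
    index_path.write_text(page, encoding="utf-8")
    return index_path
```

Run locally against the folder before the FTP upload, e.g. `build_index(Path("transfer"))`, and upload the generated page along with the files. Whether a page like that would actually satisfy the host's crawler is a guess on my part.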

Good alternate solutions, and thanks for the suggestions. But why should I have to use another service and learn another procedure WHEN I’M PAYING FOR THE STORAGE AND BANDWIDTH already and it is more than adequate?

These webhosts often oversell their available resources, knowing that not everyone is going to use up the maximum. Kinda like airline seats.

They advertise unlimited bandwidth and storage but they can only afford that if, say, 5% to 10% of their clients use up a lot of storage and bandwidth. Most people don’t use that much or the company would go bankrupt. By forbidding the space to be used as storage, they’re discouraging people from consuming more resources. Even if it’s not strictly enforced, it gives them a way to control their abusive customers. If you’re uploading an occasional 10 MB file, neither they nor their crawlers are likely to care. But if you upload daily 100 GB backups of your hard drive, you can bet your ass they’re going to find you and politely inform you of that little section in your contract.

This is most common with webhosts who offer “unlimited” storage or bandwidth. Go with one that offers you a set number and they’re less likely to care about what you use it for, since you’ll pay exorbitant fees once you go over that limit.

But I have to say, the terms more commonly say something along the lines of “you can’t use an unreasonable amount of space” or “you can’t use so much that it affects our other customers” – with them being the sole judge of what is “reasonable”, of course. It IS a bit strange to use HTML links as the criterion, but I imagine the underlying reasoning is the same.

Do what I do (or perhaps you do this already and I didn’t understand your procedure clearly).

Do everything you’ve done before. In addition, create a temporary HTML page that links to the file for download. So, instead of sending your clients the direct URL to the file, send them the link to the HTML page for download. It would take a few extra seconds, but would meet all the requirements of your hosting company.

That’s one more step than necessary. Once I create the temp HTML page, they don’t require that it be used, just posted. Then I can send the link of the actual file URL to the end party and ignore the HTML file.

Still doesn’t answer the question of “why?”, especially since this exact technique was suggested by their tech as a workaround. Unless he didn’t know why either.

But I’m paying for the freakin’ 1GB storage. Apparently I get 1GB storage, but only if all files are linked to an HTML file. If all files are not linked, they cancel my account. What freakin’ difference does it make – 1GB is 1GB either way?

If I am paying for 100GB storage and 100GB daily bandwidth, shouldn’t I be able to use it? What little section in the contract? The one that says I can’t upload 100GB to my 100GB paid-for storage area?

I’m not subscribing to “unlimited” bandwidth, since I could never fill that up. :slight_smile:

The question here is not <amount of storage> or <bandwidth> or even <content>, but whether a link exists to each stored file somewhere in a HTML file.

You mean if I contract for and pay for 100GB, it is unreasonable to actually use it?

No, like I said, it’s less of an issue when you’re paying for a specific amount of space. They’re probably not going to care. It’s the abusive “unlimited” customers they’re afraid of… from their TOS, bolding mine:

“All your downloadable files or files stored on the server have to be available for download via a HTML document stored on the Internet in a publicly or privately accessible area. This policy only applies to web sites who are considered to be abusive in service, disk space or resource consumption and where it is evident that the “fair-use” of resources among customers has been breached, particularly in regards to disk space utilization. Web sites that are found to contain either/or no html documents, a large number of unlinked files are subject to warning, suspension or cancellation at the sole discretion of IX Web Hosting.”

Because the other service is designed to accomplish what you need to accomplish; the web page is a workaround (and is more complicated).

You want to use a hammer to drive in a screw. While it’s possible, it’s still not the right tool for the job.

As for their policy, they’re probably protecting themselves against people putting illegal programs and files on their servers. Without a link, the file can remain hidden more easily, and it could possibly expose them to liability if they don’t have a policy of removing unlinked files.

With a nod to Reply’s last reply, here’s the best explanation I can come up with.

1. Only the unlimited storage accounts are affected by the policy of “no unlinked files”, not because they really care about what kind of files you store, but because it is thought that unlinked files are more likely to be a backup dump of a non-web server and therefore humongous, and are therefore defined as excessive. They can’t really offer unlimited storage, since that would violate the known laws of the universe, so they have to draw the line somewhere.

2. The policy is written poorly and doesn’t sufficiently differentiate between accounts that have paid for a specific space and bandwidth and those that are “unlimited”. The tech I spoke to didn’t make or understand the distinction, either.

Nonsense. I’ve been doing it this way for 10 years and it works perfectly. You’re telling me that downloading a file from a web site is the wrong way to download a file from a website? Isn’t that what all browsers do?

FTP a file to a URL, email the URL, recipient clicks on the link and downloads thru a browser, I delete the file when I get around to it or when the server logs show it has been downloaded. What could be simpler?

So if I don’t want them to discover an illegal file I have uploaded, all I have to do is to link to it? I don’t think so. Ever heard of a DIR or LS command? Exposes all files, linked or not, and even easier than using a crawler to look for file names in HTML code. And if the workaround is to make a list of fake links, “bad” files will be ignored by being linked. If you want to examine or determine the content of a file, it doesn’t matter if it is linked or not, and linking won’t make it more visible to someone or some program who has root access to the server directory.
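For what it's worth, here is a rough sketch in Python of the kind of check being described: list every file under the site directory (the DIR/LS part), collect the `href` and `src` targets found in the HTML pages, and report whatever is left over. This is not the host's actual crawler, which I know nothing about. The names `HrefCollector` and `find_unlinked` are made up for the example, and I am assuming that HTML pages themselves count as fine and that only links within the same site are considered.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urlparse

HTML_SUFFIXES = (".html", ".htm")


class HrefCollector(HTMLParser):
    """Collects the values of href and src attributes from one page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.hrefs.append(value)


def find_unlinked(site_root: Path) -> set:
    """Return relative paths of non-HTML files that no local HTML page links to."""
    root = site_root.resolve()
    # The "DIR or LS" step: every file is visible, linked or not.
    all_files = {
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    }
    pages = {f for f in all_files if f.lower().endswith(HTML_SUFFIXES)}
    linked = set()
    for rel in pages:
        page = root / rel
        parser = HrefCollector()
        parser.feed(page.read_text(encoding="utf-8", errors="replace"))
        for href in parser.hrefs:
            parsed = urlparse(href)
            if parsed.scheme or parsed.netloc or not parsed.path:
                continue  # external link or bare fragment: ignored in this sketch
            path = unquote(parsed.path)
            # Root-relative links start at the site root, others at the page's folder.
            base = root if path.startswith("/") else page.parent
            target = (base / path.lstrip("/")).resolve()
            try:
                linked.add(target.relative_to(root).as_posix())
            except ValueError:
                pass  # link points outside the site directory
    return all_files - linked - pages
```

Note that a check like this only says whether a link exists, not what the file contains, which is the whole point: a page full of fake links makes every file pass.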

I think the best explanation is my previous post.