Can you tell if an online file has been downloaded?

Relating to the Minnesota Senate Election & Recount, but this is a specific technical question – no political sniping, please. (There’s a Pit site for that.)

As I understand it, the Coleman campaign, back in January, had a spreadsheet file online at its website, containing the names, addresses, contribution amounts, credit card numbers, and credit card security codes of donors to the campaign. This was online with no security on it, except that the web address was not published – you had to know (or guess) the URL where it was located. (Sometimes called security by obscurity.)

Apparently it was not that hard to guess, because at least one person, a consultant, noticed this and posted an online article about the vulnerability. She says she noted the contents but did not download the file. The next day, a local reporter heard about this, downloaded the file to verify the contents, and then notified the Coleman campaign. The file was taken offline a day or two after that.

Then nothing happened for 2 months, until an online security reporting site reported this leak, and posted the contributions file online (with parts of the credit card numbers removed) to prove the leak. Now there is a big fuss about it.

The Coleman lawyer said “We immediately contacted the feds and the state. They did a forensic examination of the server. They had a ‘virtual certainty’ that there had been no download of the data and that none had been taken.”

This is my technical question.
How can ‘the Feds’ be certain that no one out there in the Internet downloaded this contribution file?
Are there records kept on the server of every time someone downloads a web page from that server? Is that automatic, or does it have to be turned on for specific files (and was it turned on for this file, which appears to have been online ‘accidentally’)? And would such log records still be available 2 months later?

It just sounds to me like ‘the Feds’ reassuring us that ‘everything is under control’. At least 2 sources have reported that they DID download it. The local reporter says he did so before notifying the Coleman campaign. And the WikiLeaks website has published a partially redacted version of the file that they apparently downloaded.

Depends on the server software & the configuration.

As an example, the default settings for a Windows web server (IIS) will log 100% of the contents it serves. Every page, every icon, every script, and yes, every download.

IIRC, by default these logs are kept indefinitely, 1 file per day per website.
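For the curious, here is a minimal sketch of how someone examining IIS logs might pull out requests for one file. It assumes the logs are in the W3C extended format, where a `#Fields:` line declares the column order (so the parser reads the columns from that line instead of hard-coding them). The sample lines, the IP addresses, and the path `/files/database.tar.gz` are all made up for illustration.

```python
import io


def parse_w3c_log(lines):
    """Yield one dict per request from a W3C extended-format log.

    The column order is not fixed: it is declared by a '#Fields:' directive,
    so that line is used as the header for the records that follow it.
    """
    fields = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            # Directives such as #Software, #Version, #Date, #Fields.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        # Values are space-separated; spaces inside a value (e.g. the
        # user agent) are written as '+', so a plain split works.
        values = line.split(" ")
        if fields and len(values) == len(fields):
            yield dict(zip(fields, values))


# Hypothetical sample: one ordinary page view and one archive download.
SAMPLE = """#Software: Microsoft Internet Information Services 6.0
#Version: 1.0
#Date: 2009-01-28 00:00:00
#Fields: date time c-ip cs-method cs-uri-stem sc-status sc-bytes cs(User-Agent)
2009-01-28 14:02:11 203.0.113.7 GET /index.html 200 5120 Mozilla/4.0+(compatible;+MSIE+7.0)
2009-01-28 14:05:43 198.51.100.23 GET /files/database.tar.gz 200 4718592 Wget/1.11
"""

for rec in parse_w3c_log(io.StringIO(SAMPLE)):
    if rec["cs-uri-stem"].endswith(".tar.gz"):
        print(rec["date"], rec["time"], rec["c-ip"], rec["sc-status"], rec["sc-bytes"])
```

Which columns actually appear depends on how logging was configured on that server, which is why the header line matters.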

It’s certainly not uncommon for IT staff to set up something which automatically purges them after, say, 90 days to avoid filling up disks.
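A purge job like that can be very simple, which is one reason logs from two months back may or may not still exist. Below is a minimal sketch, assuming the logs are `*.log` files in one directory and that "age" means the file's last-modified time; the 90-day default and the file names (styled after IIS's daily `exYYMMDD.log` naming) are illustrative only.

```python
import os
import tempfile
import time
from pathlib import Path


def expired_logs(log_dir, max_age_days=90, now=None):
    """Return *.log files whose last-modified time is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 24 * 60 * 60
    return sorted(
        p for p in Path(log_dir).glob("*.log")
        if p.is_file() and p.stat().st_mtime < cutoff
    )


def purge_logs(log_dir, max_age_days=90, dry_run=True):
    """Delete expired log files, or with dry_run=True just list them."""
    victims = expired_logs(log_dir, max_age_days)
    for path in victims:
        if dry_run:
            print("would delete", path)
        else:
            path.unlink()
    return victims


# Demo: two fake daily logs, one backdated by 120 days.
with tempfile.TemporaryDirectory() as d:
    old = Path(d, "ex081201.log")
    new = Path(d, "ex090328.log")
    old.write_text("old entries\n")
    new.write_text("new entries\n")
    t = time.time() - 120 * 24 * 60 * 60
    os.utime(old, (t, t))  # set access and modification time in the past
    removed = [p.name for p in purge_logs(d, max_age_days=90, dry_run=False)]
    remaining = sorted(p.name for p in Path(d).iterdir())

print("removed:", removed, "remaining:", remaining)
```

With a 90-day window, a January download would still be on disk at the end of March, but not much later than that.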

Without knowing the details of where the website is/was hosted, what software was used, what organization with what skills was responsible for maintaining it, there’s no way to be sure.

But if the site was maintained by folks as clueless as the folks who put that file up unprotected, it’s certainly possible the logs were still intact for the Feds to examine. Not that they couldn’t have been tampered with to remove evidence of the downloads before the Feds got there, but clueless IT folks wouldn’t / couldn’t do that successfully either.

The usual types of web servers (Apache, IIS, etc.) have logging turned on by default, i.e. every retrieval of a page using HTTP is logged.

What my web server (which has a pretty typical setup) logs is:

[ul]
[li]IP address of the downloading computer (i.e. the user’s computer)[/li][li]date/time[/li][li]the URL of the page[/li][li]number of bytes downloaded[/li][li]HTTP code returned (200 for OK, 404 for file not found, etc).[/li][li]URL of the referring page (unless turned off at the user’s browser)[/li][li]user-agent name (i.e. the name by which the web browser identifies itself).[/li][/ul]
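As an illustration, those fields correspond to Apache's "combined" log format. Here is a minimal sketch that splits one such line into the items listed above; the sample line, IP address, and path are made up, and a server with a customized log format would need a different pattern.

```python
import re
from datetime import datetime

# Apache "combined" format: host, ident, user, [time], "request",
# status, bytes, "referer", "user-agent".
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_combined(line):
    """Return a dict of the logged fields, or None if the line doesn't match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    parts = rec["request"].split()  # e.g. ["GET", "/path", "HTTP/1.1"]
    rec["url"] = parts[1] if len(parts) >= 2 else rec["request"]
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    rec["bytes"] = 0 if rec["bytes"] == "-" else int(rec["bytes"])  # "-" means no body
    return rec


line = ('203.0.113.7 - - [28/Jan/2009:14:05:43 -0600] '
        '"GET /files/database.tar.gz HTTP/1.1" 200 4718592 '
        '"-" "Wget/1.11"')
rec = parse_combined(line)
print(rec["ip"], rec["time"].isoformat(), rec["url"],
      rec["status"], rec["bytes"], rec["agent"])
```

A `-` in the referer slot is what you typically see when the user typed or pasted the URL directly, or when the browser didn't send one.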

That’s for HTTP, but the usual FTP servers (if the file was available through FTP rather than HTTP) will usually also be configured to log every download.
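For FTP, many servers can write the traditional wu-ftpd "xferlog" format, one line per transfer. The sketch below assumes that format (a 5-token timestamp, then transfer time, remote host, byte count, filename, and 9 fixed trailing fields) and parses the filename from the middle so that names containing spaces survive. The sample line is invented.

```python
def parse_xferlog(line):
    """Parse one line of the traditional xferlog transfer log format."""
    tokens = line.split()
    # 5 timestamp tokens + time, host, size + filename (>= 1) + 9 trailing fields.
    if len(tokens) < 18:
        return None
    tail = tokens[-9:]
    return {
        "time": " ".join(tokens[0:5]),          # e.g. "Wed Jan 28 14:05:43 2009"
        "transfer_secs": int(tokens[5]),
        "remote_host": tokens[6],
        "bytes": int(tokens[7]),
        "filename": " ".join(tokens[8:-9]),     # runs of spaces collapse to one
        "transfer_type": tail[0],               # a = ascii, b = binary
        "direction": tail[2],                   # o = outgoing (download), i = incoming
        "access_mode": tail[3],                 # a = anonymous, g = guest, r = real user
        "username": tail[4],                    # for anonymous logins, the e-mail given
        "completion": tail[8],                  # c = complete, i = incomplete
    }


def is_completed_download(rec):
    """True if the record is a finished transfer from server to client."""
    return rec["direction"] == "o" and rec["completion"] == "c"


sample = ("Wed Jan 28 14:05:43 2009 12 203.0.113.7 4718592 "
          "/pub/database.tar.gz b _ o a guest@example.com ftp 0 * c")
rec = parse_xferlog(sample)
print(rec["remote_host"], rec["filename"], rec["bytes"], is_completed_download(rec))
```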

In this case, the Feds could certainly be lying, since there is already proof that the file was downloaded: it is available elsewhere. If the server logs every page it serves and the IP address it was served to, there would be a record of who downloaded the file and when.
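Checking the logs for that record is mostly a matter of filtering. Here is a rough sketch, assuming Apache-style combined log lines and that the examiner knows the file's full size: it groups requests for one path by client IP and flags whether each transfer delivered the whole file (status 200 and a byte count equal to the file size). The log lines, path, and size are hypothetical. In Apache the logged byte count is the response body only, excluding headers, which is what makes the size comparison meaningful.

```python
import re
from collections import defaultdict

# Only the fields needed here: client IP, timestamp, URL, status, bytes.
LINE = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "\S+ (\S+)[^"]*" (\d{3}) (\d+|-)')


def downloads_of(lines, path, full_size):
    """Group requests for `path` by client IP, flagging full vs. partial transfers."""
    hits = defaultdict(list)
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue
        ip, when, url, status, nbytes = m.groups()
        if url.split("?")[0] != path:  # ignore any query string
            continue
        nbytes = 0 if nbytes == "-" else int(nbytes)
        complete = status == "200" and nbytes == full_size
        hits[ip].append((when, int(status), nbytes, complete))
    return dict(hits)


log = [
    '203.0.113.7 - - [28/Jan/2009:14:05:43 -0600] "GET /files/database.tar.gz HTTP/1.1" 200 4718592 "-" "Wget/1.11"',
    '198.51.100.23 - - [28/Jan/2009:15:10:02 -0600] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/4.0"',
    '192.0.2.44 - - [29/Jan/2009:09:00:17 -0600] "GET /files/database.tar.gz HTTP/1.1" 200 65536 "-" "Mozilla/5.0"',
]

result = downloads_of(log, "/files/database.tar.gz", 4718592)
for ip, requests in result.items():
    for when, status, nbytes, complete in requests:
        print(ip, when, status, nbytes, "complete" if complete else "partial")
```

An empty result over the whole period the file was online is roughly what a "no download" finding would rest on; it says nothing, of course, if the relevant log files were missing or altered.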

The original consultant reported it as a “database.tar.gz file”.

I would think that would be an FTP download rather than just HTTP, right?

Five or ten years ago, I would say so. But either protocol can be used for any file type, and enough people have forgotten about FTP that most things are just HTTP nowadays.

On another note, if you serve the download through a PHP script, you can probably track it in the script itself.
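The idea is the same in any server-side language: the script, not the web server, hands out the file, so it can record each request first. To keep one language across these examples, here is a rough sketch of that pattern as a Python WSGI app rather than PHP (a swapped-in technique, not what the poster described). The directory, file name, and log line layout are assumptions for illustration; `REMOTE_ADDR` and `HTTP_USER_AGENT` are the standard CGI/WSGI environment keys.

```python
import logging
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from wsgiref.util import FileWrapper

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("downloads")


def make_download_app(download_dir):
    """Return a WSGI app that serves files from download_dir and logs each one."""
    base = Path(download_dir).resolve()

    def app(environ, start_response):
        name = environ.get("PATH_INFO", "").lstrip("/")
        path = (base / name).resolve()
        # Refuse anything outside the download directory (e.g. "../secret").
        if base not in path.parents or not path.is_file():
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"not found\n"]
        # Record when, who (IP), what, and with which client, before sending.
        log.info("%s %s %s %s",
                 datetime.now(timezone.utc).isoformat(timespec="seconds"),
                 environ.get("REMOTE_ADDR", "-"), name,
                 environ.get("HTTP_USER_AGENT", "-"))
        start_response("200 OK", [
            ("Content-Type", "application/octet-stream"),
            ("Content-Length", str(path.stat().st_size)),
        ])
        return FileWrapper(path.open("rb"))

    return app


def fake_request(app, path, ip="203.0.113.7"):
    """Call a WSGI app directly, without a server; return (status, body)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    environ = {"PATH_INFO": path, "REMOTE_ADDR": ip, "HTTP_USER_AGENT": "Wget/1.11"}
    result = app(environ, start_response)
    try:
        body = b"".join(result)
    finally:
        if hasattr(result, "close"):
            result.close()  # closes the underlying file for FileWrapper
    return captured["status"], body


# Demo against a temporary directory holding one hypothetical archive.
with tempfile.TemporaryDirectory() as d:
    Path(d, "database.tar.gz").write_bytes(b"\x1f\x8b fake archive")
    app = make_download_app(d)
    ok = fake_request(app, "/database.tar.gz")
    traversal = fake_request(app, "/../secret.txt")
    missing = fake_request(app, "/nope.zip")

print(ok[0], traversal[0], missing[0])
```

To run it for real, something like `wsgiref.simple_server.make_server("", 8000, make_download_app("downloads")).serve_forever()` would do for testing, though a production setup would sit behind a proper web server.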

In general, only the IP address or host name of the destination machine is recorded in the HTTP/FTP logs. The user ID is not recorded, and isn’t even available to the server, unless it was supplied explicitly from the other end — and for a public download that needed no authentication, it wouldn’t be supplied.

That IP address in the log can be a proxy server (an “anonymizer”), or a dynamically assigned address, so it isn’t necessarily very helpful in pinning down the exact person who downloaded the file. It might tell you that a Comcast or Verizon subscriber downloaded it, but that would be the end of it — until you subpoena the company for their records, I suppose.