Need software recommendation - file transfer

I’m volunteering for a non-profit organization based in Chicago that collects community support for global initiatives through digital signatures and related supporter-created profiles and data. We often have to transfer large files, several gigabytes or more, to organizations located throughout the world. The problem is that some of these organizations are in countries with weaker IT infrastructure, and the large file transfers can take a very long time or fail completely on an unstable network. Mailing hard drives is too slow, and most free transfer products are out because we need to ensure our supporters’ data and privacy are protected. Does anyone know of a software product that could solve this issue? As a non-profit, we focus our spending on our ground game and don’t have extensive funds to commit towards software. Any suggestions or advice would be greatly appreciated.

Using the sneakernet is perfectly viable if you use end-to-end encryption. Look at PGP for your security concerns. How many downloads are you expecting for each upload?

Transfers over BitTorrent could also help with the instability problems. Once again, PGP can be used to solve your security concerns.

Of course, efficient compression should be your first step.
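If you go the PGP route, the mechanics are simple enough to hand to any volunteer. A minimal sketch using GnuPG (gpg), with hypothetical file and recipient names: compress first, then encrypt to the recipient’s public key so only their private key can open it.

    # Bundle and compress the export
    tar czf supporters.tar.gz supporters/

    # Encrypt to the recipient's public key (their key must be imported);
    # this writes supporters.tar.gz.gpg
    gpg --encrypt --recipient field-office@example.org supporters.tar.gz

    # Recipient side: decrypt and unpack
    gpg --decrypt supporters.tar.gz.gpg > supporters.tar.gz
    tar xzf supporters.tar.gz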

This seems like too specific a topic for this forum, but feel free to PM me.

Rsync

Free, can be tunnelled over SSH or a VPN for encryption, and is designed for reliable transfer and incremental updates of large files. Available for a wide variety of operating systems.
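To give a flavour, here is a typical invocation over SSH (paths and hostname are hypothetical). The --partial flag keeps a half-transferred file on disk, so re-running the same command after a dropped connection resumes roughly where it left off instead of starting over:

    # Archive mode (-a), compress in transit (-z), keep partial files,
    # show progress; transport defaults to SSH for remote targets
    rsync -avz --partial --progress \
        /data/exports/ volunteer@field-office.example.org:/incoming/exports/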

There are “sync” programmes that scan through the data for changed sections and only transmit the new sections. This means you should avoid compressing everything into a single zip file first, since one small change can reshuffle the whole archive and defeat the delta detection; the sync program probably has compression built in anyway.
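To illustrate the point (hypothetical paths), compare syncing a directory of files directly against zipping it up first:

    # Delta-friendly: unchanged files and unchanged blocks are skipped
    rsync -avz supporters/ volunteer@remote.example.org:/data/supporters/

    # Delta-hostile: one small edit reshuffles the whole archive,
    # so most of the zip is re-transferred every time
    zip -r supporters.zip supporters/
    rsync -avz supporters.zip volunteer@remote.example.org:/data/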

As si_blakely notes, rsync is one of the default programs that does what you want. Its ubiquity means that you can reasonably safely assume you can run it on any computer you will come across.

It seems as if you are sending around multi-gigabyte files. That implies a single file, or a few files - which is rather at odds with what you say the organisation is doing. OTOH, I guess someone may have decided to distribute a single standalone database. That may cause difficulties in the future, if not already, and it also makes life significantly harder for any program that synchronises differences between files.

If, OTOH, it is a matter of sending around gigabytes of files, or the files are actually archives (ie zip files) of many smaller bits, it will be much easier to manage.
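If you are stuck with one huge file, a low-tech workaround (file names hypothetical) is to split it into fixed-size chunks with checksums, transfer the chunks individually so a failure only costs one chunk, and reassemble at the far end:

    # Split a big database dump into 100 MB chunks (suffixes aa, ab, ...)
    split -b 100M supporters.db supporters.db.part_

    # Record checksums so the receiver can verify each chunk
    sha256sum supporters.db.part_* > supporters.db.sha256

    # Receiving end, after the chunks have arrived:
    sha256sum -c supporters.db.sha256
    cat supporters.db.part_* > supporters.db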

Would be useful to know more precisely what the requirements are.

Consider making a private BitTorrent tracker.

Advantages:

  1. Free and open source, both trackers and clients
  2. As a file gets shared with more and more clients, they can all become seeds and help distribute chunks of the file to each other. (Peer-to-peer file transfers)
  3. The BitTorrent protocol has built-in integrity checking: every downloaded chunk is verified on receipt, and any chunk that fails the check is simply redownloaded. This means that even if you have to leave it running for several days over a pitifully slow connection, it will eventually assemble all the pieces into a complete whole. This happens automatically.
  4. Clients are available for every operating system and platform out there, from computers to phones, etc.

Disadvantages:

  1. You will need significant technical expertise to set it up at first (hire a coder or find a volunteer college student - tell them it’s for a good cause); see the torrent-creation sketch further down this post
  2. You will have to be aware of security implications (making sure the tracker is private, everything is encrypted, etc.)
  3. Unless at least one of your clients has a good upload speed, the initial spread of a new file can be slow compared to a normal HTTP server. You can mitigate this by adding an HTTP seed on a fast server, but then you’d have to manage that too. This is not the fault of the BitTorrent protocol; it’s just that most web servers have “fat pipes” (a lot of bandwidth), whereas a torrent seeded from an ordinary office computer is unlikely to have a fast upload. Using a regular web server with a good connection as the initial (or permanent) seed will fix that.

BitTorrent is different from rsync (which is excellent and amazing in its own right) in that:

  1. Multiple clients can share with each other (one computer shares a file, now two have it and can share to 3 more, and the 5 can share to 10… etc.)
  2. As long as the tracker is active (which doesn’t require much bandwidth or CPU power), the computer(s) actually hosting the file can come and go as they please. Eventually, over a long enough time, downloads will complete. If a client suddenly loses connectivity for two days, no worries: when it comes back online it’ll ask the tracker for other computers with the chunks it missed and just keep going from there. (rsync will only do that with one source, not multiple sources like BitTorrent.)
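If you do go this route, creating a private torrent is a one-liner once the tracker exists. Here’s a sketch using transmission-create (from the Transmission package); the tracker URL and file names are hypothetical:

    # Create a private .torrent that announces only to your own tracker
    # (-p sets the private flag, -t names the tracker, -o the output file)
    transmission-create -p \
        -t http://tracker.example.org:6969/announce \
        -o supporters-export.torrent \
        supporters-export.tar.gz.gpg

The private flag tells compliant clients to announce only to your tracker and skip DHT/PEX peer discovery, which is what keeps the swarm limited to people you’ve invited.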

Similar alternatives:
Resilio Sync is a commercial product (not terribly expensive) built on the BitTorrent protocol, with encryption and resilient peer-to-peer transfers: https://getsync.com/

Syncthing is a similar open source product: https://syncthing.net/

TLDR: rsync for one-to-one transfers. BitTorrent is like rsync for many-to-many transfers. Use your own encryption on top of them and limit access to only people you trust.

I should just add this explicitly because it’s relevant: in case it wasn’t obvious, the mesh network topology of BitTorrent (or similar) has a big advantage in your specific case, in that clients will prefer peers that are closer to them (or faster relative to them) over the further/slower ones.

So if you start sharing a file (the initial seed) from America and one of your locations in Africa downloads it, from then on all the other African users can get it from the African location and spread it among themselves. BitTorrent gets exponentially more effective the more users/regions are sharing a file.

If mailing a disk (whether a hard disk or a USB flash drive), you can use TrueCrypt to put your files in an encrypted volume on that disk. If anyone were to steal it in transit, they would see one large file that looks like random garbage, with no way to decrypt unless they had the password.

TrueCrypt is free.

TrueCrypt was abandoned and there’s a big message saying IT’S NOT SECURE on its homepage: http://truecrypt.sourceforge.net/

The (unproven) public assumption is that it was infiltrated by state actors.

If you’re just worried about casual hackers, TrueCrypt 7.1 is probably fine (last version before the mysterious warning, publicly audited, although the audit was by a British corporation subject to US-UK surveillance and national security rules):
https://www.grc.com/misc/truecrypt/truecrypt.htm

If you’re worried about states, well, stop what you’re doing now and never touch electricity again. Good luck.

Take a look at VeraCrypt. It is the successor to TrueCrypt and seems to have addressed a lot of the security concerns. https://veracrypt.codeplex.com/
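For the mail-a-disk route, VeraCrypt also has a scriptable command line. A sketch with hypothetical names - the exact flags have shifted between releases, so check them against veracrypt --help on your version:

    # Create an encrypted container on the drive to be mailed
    # (prompts for the password; exFAT allows files larger than 4 GB)
    veracrypt --text --create /media/usb/supporters.hc \
        --size 64G --volume-type normal \
        --encryption AES --hash SHA-512 --filesystem exFAT

    # Mount it, copy the data in, then dismount before mailing
    veracrypt --text /media/usb/supporters.hc /mnt/secure
    cp -r /data/exports/. /mnt/secure/
    veracrypt --text --dismount /mnt/secure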

I realise it may not be amenable to change, but why are the files so big? Is there a possible design issue here? For example, are you distributing a full catalog of data to consumers who will typically only require some small part of it? That could perhaps be better implemented as a web service where consumers just fetch the bit they want.

How about a shared folder using Dropbox or one of the equivalents (OneDrive etc.)? I’m pretty sure I read about an open-source equivalent where you would control the central server yourself.
