Must preserve all file metadata, e.g. hard links, symlinks, and permissions.
Easily automated without having to set up public-key authentication for root.
Bonus:
Point-in-time snapshotting, so that I can easily retrieve the state of the file system at any given time.
Client-side compression to reduce disk-space usage of backups.
Right now I’m running rsync manually, because the point about automation and public-key authentication is important to me; public-key authentication is fine by itself, but I don’t want to allow it for the root user, because the backup user should not have write permission to anything.
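For illustration only (the paths and host name are placeholders, and --link-dest is just one way to get the point-in-time bonus), the kind of invocation in question looks like this, run by hand with password authentication each time:

    # -a preserves permissions, ownership, timestamps and symlinks,
    # -H preserves hard links, -z compresses data on the wire.
    # /backups/server/last is assumed to be a symlink to the latest snapshot.
    rsync -aHz --numeric-ids --link-dest=/backups/server/last \
        root@server:/ /backups/server/$(date +%F)/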
That point about authentication is the test that all the tools I have looked at – rdiff-backup, rsnapshot, etc. – fail, because they all rely on the SSH infrastructure for authentication.
Somebody suggested to me that I use the BSD tool dump to somehow mount the file system for read access by a specific user. To me that sounds too much like a possibly insecure kludge.
In order to back up everything, you’re either going to need to run something as root or set up a user with read access to everything (not a good idea).
But Unix backups are not typically done by duplicating the entire filesystem. Rather, you back up the partitions and directories where non-recoverable data live (e.g. configuration files and everything that isn’t installed by the OS or package manager). This can be done by a non-privileged user with a public key, or with write access to an NFS mount.
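As a sketch only (the package manager, paths and account names here are assumptions, not a recipe), that strategy comes down to recording what the OS can reinstall and copying just the rest:

    # Record what the package manager can restore for us.
    dpkg --get-selections > /var/backups/package-selections
    # Copy only the non-recoverable data to the repository over SSH,
    # run as whatever account can actually read these paths.
    rsync -aHz /etc /home /var/backups backup@repository:/backups/$(hostname)/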
So how is a user with read access to everything worse than running as root, who has read and write access to everything?
I’m not mirroring entire file systems; I’m mirroring home directories, logs, configuration files, databases, etc.
How does a non-privileged user get access to those files? It can’t be a member of all the groups on the box; and even if it were, there’s no guarantee that the files have the group-read bit set, since many of the files, being the product of users and programs on the server, are beyond my control.
Are you suggesting that there should be two backup processes here, one that (running as root) first harvests all the data into a special backup directory, and another that (running as some dedicated backup user) reads it, e.g. from a remote location?
While such a system would work, I don’t have the disk space to dedicate to it, nor do I want to waste the extra CPU cycles or the I/O incurred by all the copying.
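Just to make the objection concrete (a sketch only; the paths and the dedicated "backup" account are hypothetical), that design would look roughly like this, paying for every byte twice:

    # Stage 1, root's cron job: harvest everything into a staging archive
    # that the unprivileged backup account is allowed to read.
    tar czf /srv/staging/nightly.tar.gz /etc /home /var/log
    chown backup /srv/staging/nightly.tar.gz
    # Stage 2, run from the backup host as the unprivileged backup account:
    scp backup@server:/srv/staging/nightly.tar.gz /backups/server/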
So doing a dump of the filesystems and then using scp to move the archive off-host isn’t good enough?
If I understand correctly, you are looking for a way to do all of this as a user other than root. You need role-based access control, or the equivalent.
In Solaris, it’s RBAC. You define a role (like a user, but unable to log in from the console or over the network; someone must log in and assume the role) and assign to that role only the privileges, permissions and commands it needs to perform a function. In this way, you can give a normally unprivileged user root privileges for a limited set of tasks. It may sound like sudo, but it’s more secure and configurable.
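As a rough sketch (the profile name and exact options vary by Solaris release, so treat this as illustrative rather than a recipe), you create the role, attach a backup-related rights profile to it, and assign it to the administrator who may then assume it:

    # Create a role that cannot log in directly and give it the stock
    # "Media Backup" rights profile (name assumed; check prof_attr).
    roleadd -m -d /export/home/backup -P "Media Backup" backup
    passwd backup
    # Let an ordinary admin account assume the role with "su backup".
    usermod -R backup youradmin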
In Linux, there is an effort called SELinux. Red Hat distributes it with their installation media, but you have to find it and turn it on. I believe it’s also its own distribution under some name. Anyway, SELinux has role-based access control.
If you can’t/won’t use RBAC, then it may be possible to configure sudo for your needs.
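One common pattern (illustrative only; verify the exact server-side arguments your rsync version sends before locking the rule down) is to let a dedicated backup account run nothing but rsync’s read-only server side as root:

    # /etc/sudoers, edited with visudo: the backup account may run rsync
    # as root, but only in --server --sender (read-only) mode, no password.
    backup ALL = NOPASSWD: /usr/bin/rsync --server --sender *

    # From the backup host, as the unprivileged backup account:
    rsync -aHz -e ssh --rsync-path="sudo rsync" backup@server:/etc/ /backups/server/etc/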
Or, do a dump of the file system and move the archive off-host with scp to a known repository. By setting up keys between your clients and only the repository, you can have an automated, trusted backup system.
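A sketch of that, with placeholder names (classic dump syntax assumed; substitute whatever your filesystem’s dump tool expects):

    # Level-0 dump of the root filesystem to a local archive file.
    dump -0u -f /var/tmp/$(hostname)-root.dump /
    # Push the archive to the repository; the backup key is accepted
    # only by the repository host, nowhere else.
    scp /var/tmp/$(hostname)-root.dump backup@repository:/backups/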