Implementing backups for NeCTAR virtuals
This page is a work in progress. Currently, I am just documenting "the options" that are available for backing up your NeCTAR virtuals. In the future, I will add specific instructions / recipes for setting up some of the recommended approaches.
Note: In the following, I am talking about "backup" in the broad sense. Different ways of implementing backups give you different properties in terms of:
- whether or not you can easily find and recover individual files, or particular versions of individual files,
- frequency of backups,
- how long backups can reasonably be kept.
By contrast, RAIDing or replication schemes implemented in the NeCTAR infrastructure protect against loss-of-media or loss-of-access disasters, but they give you no guarantee (and little chance) that you can go back to a previous version of a file or volume. This is probably no help if a rogue script or careless user deletes files and you need to get them back.
Out of the box
A NeCTAR virtual has no formal backup arrangements.
- Ephemeral disc storage is not backed up or replicated at all. This is a matter of policy.
- Persistent volumes and Object store may be replicated, but they are not advertised as such.
- The VM images and snapshots may be replicated or backed up, but they are not advertised as such.
Beyond that, some types of NeCTAR storage are implemented on RAID storage; see above.
Reference: NeCTAR Support Wiki: ResearchCloudStorage
If you need specific information on exactly what is guaranteed, you need to ask specific questions of the operators of the Nodes that you rely on. The details are Node dependent, and may change.
Issues that are relevant to disaster recovery planning (for your NeCTAR instance and / or your data) include:
- whether a particular storage type is replicated or backed up,
- whether the replica / backup is online or offline (e.g. magnetic tape),
- where the replica / backup is held,
- when the replica / backup is going to be created, and
- the frequency with which the replicas / backups are refreshed (if that is applicable).
Implementing your own backups
Backing up to where ...
I have had some success in doing coarse-grained backups of an application's database and volatile files on "volatile" disc to both NeCTAR Object Store and a mounted RDSI collection. You could also use a NeCTAR Persistent Volume. Of course, none of these is offline, and the location (data centre, etc) for these backups is not specified.
The other option is to send your backups "off" the NeCTAR / RDSI cloud infrastructure; e.g.
- store them on a university / departmental / institutional server,
- store them on AARNET's CloudStor+ service, or
- store them on Amazon S3 or Glacier, or using some other commercial cloud storage provider.
Backup using OpenStack Snapshots
OpenStack provides two mechanisms for creating snapshots that can be used as crude backups. However, there are important caveats(!!) and other drawbacks to this approach.
- An Instance snapshot gives you a copy of the state of the running (or halted) instance. This includes the state of the primary ephemeral disc, but the snapshot excludes the second ephemeral disc and any Persistent Volume storage that you may have attached, or RDSI collections that you may have mounted.
- A Volume snapshot gives you a copy of the state of a Persistent Volume.
- Live snapshots should work in theory. However, if the system is active while the snapshotting is in progress, the resulting snapshot may be inconsistent. Reference: http://docs.openstack.org/trunk/openstack-ops/content/snapshots.html. Possible mitigations include:
  - using a system-specific "file system freeze" utility to block anything that attempts to update the filesystem,
  - running 'sync' before starting the backup, or
  - pausing or shutting down the system.
- My understanding is that there is no way to snapshot a NeCTAR secondary ephemeral disc.
- Snapshots consume significant quota'd NeCTAR resources. There are quotas on both the amount of disc space you can use, and the number of snapshots (of each type) that you can keep.
- Snapshots are not incremental, and they are not compressed. In the case of a typical Instance snapshot, you are saving a (mostly) redundant copy of all of the files from the original image.
- To extract individual files from an Instance Snapshot "backup", you would most likely need to create and boot a new Instance (currently) with the same flavour as your original Instance.
- Image snapshots currently entail synchronously sending the data up to the primary image store at the Melbourne node. Given current network bandwidth constraints, this can take a significant time.
- It is not clear that you could get a running NeCTAR Instance to initiate a snapshot of itself. (We have not tried this.) If that doesn't work, then you would need to set up some separate infrastructure to initiate your backups; e.g. cron jobs on your Linux workstation.
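If the snapshot has to be initiated from a separate machine, the cron job could call something like the following. This is only a minimal sketch: it assumes that the unified "openstack" command-line client is installed on that machine and that your OpenStack credentials are already in the environment. The instance name "myserver" and the OPENSTACK override variable are hypothetical, and the flags should be checked against your installed client version.

```shell
#!/bin/sh
# Sketch: initiate a dated Instance snapshot from a separate machine.
# Assumes the "openstack" client is installed and credentials are exported.
# OPENSTACK is a hypothetical override, handy for dry runs (OPENSTACK=echo).
OPENSTACK="${OPENSTACK:-openstack}"

# snapshot_instance SERVER [DATE_STAMP]
snapshot_instance() {
    server="$1"                    # instance name or ID
    stamp="${2:-$(date +%F)}"      # defaults to today's date
    # The snapshot is named "<server>-<date>", e.g. myserver-2014-03-01.
    $OPENSTACK server image create --name "${server}-${stamp}" "$server"
}

# Example (e.g. called from a crontab entry on your workstation):
#   snapshot_instance myserver
```

Note that this does nothing about snapshot consistency or quota; old snapshots still need to be deleted by some other means.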
Backup using Tar
The simplest way to perform backups is to use the Linux "tar" command to create a compressed archive file of a directory tree, and then copy the archive file to a "safe place". Here are a couple of simple examples:
Create a backup of the "/wwwdata" directory tree, putting it into a local directory. (The name of the backup archive contains today's date.)
export BACKUP=`date +%F`
tar czf /backups/$BACKUP.tar.gz /wwwdata
Piping the backup to a file on a remote Linux system (using SSH):
export BACKUP=`date +%F`
tar czf - /wwwdata | ssh firstname.lastname@example.org "cat > /backups/$BACKUP.tar.gz"
Change "firstname.lastname@example.org" to your account and host name on the remote system. (This requires SSH credentials that allow a secure password-less connection to the remote host / account.)
Backing up to Swift. This assumes that you have already 1) installed the swift client, 2) created a script to set up the environment variables with credentials, and 3) created the Swift container (e.g. "mybackups"). The first two steps are described here, and the third step can be done using the NeCTAR Dashboard's "Object Store Containers" panel.
export BACKUP=`date +%F`
tar czf /backups/$BACKUP.tar.gz /wwwdata
swift upload mybackups /backups/$BACKUP.tar.gz
You should be able to see the objects created by the 'swift upload' via the NeCTAR Dashboard.
These simple scripts could be called from a crontab entry, though you also need to deal with the issues of:
- managing the backup copies that you are creating, and
- some kind of reporting to warn you when your backups have stopped running.
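Both issues can be handled with a little extra scripting. The following is a minimal sketch, assuming the dated ".tar.gz" archives live in a local directory and that GNU find is available. The function names, the 30 day retention period and the 2 day freshness threshold are arbitrary choices for illustration.

```shell
#!/bin/sh
# Sketch: housekeeping for the dated archives created by the tar examples.

# prune_backups DIR DAYS
# Delete archives last modified more than DAYS days ago
# (find rounds file ages down to whole days).
prune_backups() {
    find "$1" -maxdepth 1 -name '*.tar.gz' -mtime +"$2" -exec rm -f {} +
}

# backups_are_fresh DIR DAYS
# Succeed only if at least one archive was modified less than DAYS days ago.
backups_are_fresh() {
    [ -n "$(find "$1" -maxdepth 1 -name '*.tar.gz' -mtime -"$2" | head -n 1)" ]
}

# Example usage, e.g. from a daily crontab entry:
#   prune_backups /backups 30
#   backups_are_fresh /backups 2 || echo "WARNING: no recent backup in /backups" >&2
```

The freshness check is most useful when it runs somewhere other than the machine that makes the backups, so that it still reports when that machine (or its cron daemon) has stopped.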
The main problem with simple Tar is that it is non-incremental; i.e. it creates a copy of the entire tree rather than just the stuff that has changed. This tends to take longer, and use more disc space. There are workarounds, but they make the approach more complicated. For example, GNU tar has the capability to do incremental backups, but the standard (GNU) scripts for doing this are typically not installed on recent Linux distros.
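One workaround that does not need the extra scripts is to call GNU tar's "--listed-incremental" option directly. The following is a minimal sketch, assuming GNU tar; the function name and the file names in the usage comments are made up for illustration.

```shell
#!/bin/sh
# Sketch: incremental backups with GNU tar's --listed-incremental option.
# The snapshot (.snar) file records what has already been backed up.

# incremental_backup SRC_DIR ARCHIVE SNAPSHOT_FILE
# If SNAPSHOT_FILE does not exist yet, this makes a full (level 0) backup;
# otherwise only files added or changed since the previous run are archived.
incremental_backup() {
    tar --create --gzip --file="$2" --listed-incremental="$3" \
        --directory="$(dirname "$1")" "$(basename "$1")"
}

# Example: a full backup followed by a later incremental.
#   incremental_backup /wwwdata /backups/full.tar.gz /backups/wwwdata.snar
#   incremental_backup /wwwdata /backups/incr-$(date +%F).tar.gz /backups/wwwdata.snar
#
# To restore, extract the full archive and then each incremental in order,
# passing --listed-incremental=/dev/null to tar on each extraction.
```

The extra complication is that you now have to keep the snapshot file safe, and a restore needs the full archive plus every incremental made after it.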
Backup using Rsync
Rsync is a Linux / Unix utility that "synchronizes" a local directory tree with an equivalent tree on a (typically) remote system. This is done by figuring out what files and directories have been added, removed or changed, and using "push" or "pull" to make the two copies the same.
Rsync is typically run in a mode that uses file timestamps and/or checksums to figure out what has changed, and then pushes only the changes. The process is generally fast over a local network, but gets slower as the network latency increases.
The main drawbacks of this are:
- the remote copy of the tree needs to be online, and
- maintaining backup snapshots with multiple timepoints gets complicated.
There is also the issue that if something goes wrong in the middle of the "rsync" step, you will be left in a state where your "backup" is a hybrid of the "old" and "new" states.
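One common way to reduce both the multiple-timepoint and the "hybrid state" problems is to write each run into a fresh directory, hard-linking unchanged files against the previous run with rsync's "--link-dest" option, and only rename the directory into place once rsync has finished. The following is a minimal sketch, assuming the destination is a locally mounted filesystem that supports hard links and is given as an absolute path. The function name and the "incomplete" / "latest" names are hypothetical.

```shell
#!/bin/sh
# Sketch: dated rsync snapshots that share unchanged files via hard links.

# snapshot_backup SRC_DIR DEST_ROOT [NAME]
# DEST_ROOT should be an absolute path. NAME defaults to a timestamp.
snapshot_backup() {
    src="$1"; root="$2"; name="${3:-$(date +%F-%H%M%S)}"
    mkdir -p "$root" || return 1
    rm -rf "$root/incomplete"      # discard any earlier interrupted run
    if [ -d "$root/latest" ]; then
        # Unchanged files become hard links to the previous snapshot.
        rsync -a --delete --link-dest="$root/latest/" \
            "$src/" "$root/incomplete/" || return 1
    else
        # First run: there is nothing to link against, so copy everything.
        rsync -a --delete "$src/" "$root/incomplete/" || return 1
    fi
    # Only a completed copy is renamed into place and marked as "latest".
    mv "$root/incomplete" "$root/$name" || return 1
    ln -sfn "$name" "$root/latest"
}

# Example:
#   snapshot_backup /wwwdata /mnt/backups
```

Each snapshot directory looks like a complete copy of the tree, but only new or changed files consume extra disc space. Old snapshots can be removed with a plain "rm -rf".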
When you use "tar" or "rsync" from a crontab entry to implement your backups, you need to deal with a number of low-level management issues yourself; e.g. creating the backup schedule, managing the rotation of backup "volumes", figuring out where a file you want to restore might be. This section describes some more sophisticated systems that take care of these issues, and more.
Backing up using Duplicity
Duplicity is a relatively new backup utility for Linux systems that supports full and incremental backups of a directory tree and provides simple subcommands for the important tasks. Unlike Amanda, it requires almost no configuration, apart from setting environment variables to configure the "back end". The other attractive thing about duplicity is that it supports lots of ways of saving backups, including scp / sftp, ftp, ftps, rsync, webdav, Amazon S3, Google Cloud Storage, Google Docs, Dropbox and ... Swift / OpenStack Object Storage.
Most of the information about duplicity is in the manual entry.
The one complication is that duplicity Swift support was only added in version 0.6.22. For Ubuntu 13.10 and earlier, installing the required version of duplicity entails some fiddling around with repository configurations to add a PPA from launchpad.net. This should be resolved when "Trusty Tahr" (14.04) is released. (Fortunately, the Fedora 19 package repos have duplicity-0.6.22, and so do the CentOS 6.5 repos.)
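To give an idea of what the duplicity subcommands look like, here is a minimal sketch for backing up to a Swift container. It assumes duplicity 0.6.22 or later, and that the environment variables described in the duplicity manual entry for the Swift back end (SWIFT_USERNAME, SWIFT_PASSWORD, SWIFT_AUTHURL and, if needed, SWIFT_AUTHVERSION) are already exported, along with PASSPHRASE for duplicity's encryption. The function names, the DUPLICITY and TARGET variables, and the 30 and 90 day periods are hypothetical.

```shell
#!/bin/sh
# Sketch: duplicity backups to a Swift container.
# Assumes the SWIFT_* credentials and PASSPHRASE are already exported.
DUPLICITY="${DUPLICITY:-duplicity}"     # hypothetical override for dry runs
TARGET="${TARGET:-swift://mybackups}"   # assumed container name

# Incremental backup of directory $1; a new full backup is made instead
# once the most recent full backup is more than 30 days old.
backup_tree() {
    $DUPLICITY --full-if-older-than 30D "$1" "$TARGET"
}

# Restore the most recent version of the backed-up tree into directory $1.
restore_tree() {
    $DUPLICITY restore "$TARGET" "$1"
}

# Delete backup sets older than 90 days (duplicity keeps any older sets
# that newer sets still depend on).
expire_backups() {
    $DUPLICITY remove-older-than 90D --force "$TARGET"
}

# Example:
#   backup_tree /wwwdata
```

The same functions should work with the other back ends by changing TARGET to a different URL scheme; see the manual entry for the details.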
Backing up using Linux dump / restore
Linux includes dump and restore utilities that provide whole file-system dumps to a (notional) tape, and support full and incremental modes. Backups can be written to a physical tape, a simple file, or to a server that implements the Linux remote tape (rmt) protocol. The corresponding restore utility can restore an entire file system or selectively restore individual files. It should be possible to use dump / restore for NeCTAR virtuals, though I have not tried it.
The drawbacks of dump / restore are that it doesn't do dumps at a finer grain than a file system and that you have to manage the dump volumes (files or tapes) yourself.
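As an illustration of the kind of volume management you would need to do yourself, here is an untested sketch of a weekly cycle: a full (level 0) dump on Sundays and incremental dumps on the other days. It assumes an ext2/3/4 file system and a "/backups" directory for the dump files. The device name "/dev/vdb", the function names and the DUMP override variable are hypothetical.

```shell
#!/bin/sh
# Sketch: weekly full dump plus daily incrementals using dump(8).
DUMP="${DUMP:-dump}"               # hypothetical override for dry runs

# dump_level DAY_OF_WEEK
# Map the ISO day of week (1 = Monday ... 7 = Sunday) to a dump level:
# level 0 (full) on Sunday, levels 1 to 6 on Monday to Saturday.
dump_level() {
    if [ "$1" -eq 7 ]; then echo 0; else echo "$1"; fi
}

# nightly_dump FILESYSTEM [DAY_OF_WEEK]
nightly_dump() {
    level=$(dump_level "${2:-$(date +%u)}")
    # -u records the dump in /etc/dumpdates, so that a higher-level dump
    # only includes files changed since the latest lower-level dump.
    $DUMP "-${level}" -u -f "/backups/level${level}.dump" "$1"
}

# Example (run nightly from root's crontab):
#   nightly_dump /dev/vdb
```

Individual files can then be recovered interactively with "restore -i -f /backups/level0.dump", working through the later dump files as needed. Each week's files overwrite the previous week's, so you would also need to copy them somewhere safe.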
Backing up using BackupPC
BackupPC is a web-based system for managing backups of a group of PCs or Linux systems. It can use SMB, tar over ssh/rsh/nfs, or rsync to pull data from the machine being backed up, and it stores the backed up files incrementally.
The big drawback with BackupPC is that you need another system (server) on which you can run the BackupPC web service. It also requires that the backups are kept online on the backup server.
Backing up using Amanda
Amanda is a suite of utilities specifically designed for Unix / Linux backups, where the backups are written to tapes (or simulated tapes). It supports full and incremental backups and can manage the rotation of your backup volumes. Amanda has an S3 driver that allows it to back up directly to Amazon S3 or Swift.
The main drawbacks of Amanda are that the operational model and the configuration files are complicated, and that the Amanda S3 driver has rough edges (e.g. bugs) when you use it for Swift.