In January, my iBook died; a week ago, my friend Iqram's MacBook died after an OS X Security Update; two days ago, my friend Dave's iBook hard drive failed. Iqram and I managed to recover our data; Dave did not. At the moment, I don't have the greatest faith in Apple products, so I decided to back up all my data.

I used to use an external USB hard drive for backups, but my 160 GB external drive has been acting up lately and sometimes needs to be restarted two or three times before my computer recognizes it. So about two months ago, I started playing with Amazon S3 and JungleDisk.

Amazon S3 (Simple Storage Service) is basically an infinite hard drive that you can buy on a pay-per-use basis, and JungleDisk is a utility that lets you mount S3 as a hard drive on any OS. JungleDisk has a backup tool built in, but right now the tool does not provide mirroring; that is, if you back up a folder on your computer and then delete some files locally, those files will not be deleted remotely. So if you start moving stuff around, you'll end up with duplicate copies of your data. Redundant data annoys me, so I decided to finally learn how to use the rsync command-line utility to mirror folders. Here's how I set up a backup of my home directory using Amazon S3, JungleDisk, and rsync.

Get Amazon S3

Sign up for Amazon Simple Storage Service. Here's how the pricing works, according to Amazon's website:

  • Pay only for what you use. There is no minimum fee, and no start-up cost.
  • $0.15 per GB-Month of storage used.
  • $0.20 per GB of data transferred.
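
For a rough sense of scale, here is the first month's bill under the prices quoted above for a hypothetical 20 GB home directory (20 GB is an illustrative figure, not a measurement) that is uploaded once and then stored for the whole month:

$$
\underbrace{20 \times \$0.15}_{\text{storage}} + \underbrace{20 \times \$0.20}_{\text{initial upload}} = \$3.00 + \$4.00 = \$7.00
$$

After the first month, storage stays at 20 × $0.15 = $3.00 per month, plus $0.20 for each GB of changed files that rsync uploads.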

Download JungleDisk

JungleDisk is free and available for Linux, Windows, and OS X.

Set Up JungleDisk

JungleDisk is pretty easy to set up on OS X. Just download the JungleDisk .dmg, open it, and drag the .app file to your Applications folder. Then open the app and enter your Amazon access keys, which you can find by going to this page on Amazon. You will also need to choose a bucket name for JungleDisk to use on your S3 account. I used kortina.

When you configure JungleDisk, you can choose to have it auto-mount as a volume on your computer whenever you start the JungleDisk app. (This is the default setting.) If you need to mount JungleDisk manually, open Finder, press ⌘K, and enter http://localhost:2667/ as the server address. You can find more instructions on configuring JungleDisk on this page on their official site.

Run rsync to Mirror Your Home Directory to JungleDisk

rsync \
-avvz --size-only --delete \
--exclude .svn --exclude .Trash --exclude Library/Caches --exclude "*.log" \
/Users/kortina /Volumes/JungleDisk

How this rsync Script Works

The rsync command takes a bunch of options, followed by a source directory and a destination directory. In this case the source is /Users/kortina, my home directory, and the destination is the mounted JungleDisk drive at /Volumes/JungleDisk. What matters is the trailing slash on the source, not on the destination: because there is no trailing / after /Users/kortina, rsync creates a directory named kortina on the JungleDisk drive. To copy the contents of your home directory directly to JungleDisk without enclosing them in a folder with your username, add a trailing slash to the source: /Users/kortina/. (Adding one to the destination, /Volumes/JungleDisk/, makes no difference.)

Here's what the options I've used do:

  • -a: This runs rsync in archive mode, which is equivalent to running it with -rlptgoD. The important part here is -r, which recurses into directories; many of the other options bundled into -a are irrelevant because of the way S3 treats file metadata.
  • -v or -vv: Run in verbose or very verbose mode. Verbose mode prints the name of each file copied or deleted, and very verbose mode additionally prints the names of files that are skipped. I like to run with -vv because it makes progress easier to follow.
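  • -z: This option compresses file data while rsync sends it to another machine. Here both the source and /Volumes/JungleDisk are local paths, and JungleDisk does the actual upload to S3, so -z probably doesn't make this backup any faster; it's harmless to leave in, though.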
  • --size-only: Usually, rsync compares the size and last-modified time of each file to decide whether it is out of date and needs to be copied. Because of the way S3 handles file metadata, however, the last-modified date of each file you upload to S3 will always be the date it was uploaded. That makes every file look changed to rsync, so everything would be copied again on every run; you need the --size-only option to make this backup script work. The trade-off is that a file you edit without changing its size won't be copied.
  • --delete: (delete files that don't exist on the sender) This is another important part of my backup script. I didn't want to use JungleDisk's built-in backup utility because it didn't support mirroring, and --delete is the option that makes rsync produce a mirror instead of a simple copy.
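  • --exclude: (exclude files matching PATTERN) This option lets you skip file patterns like *.log or directory names like Movies. Quote wildcard patterns such as "*.log" so the shell passes them to rsync instead of expanding them itself.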

Once you've set up S3 and JungleDisk, if you want to keep the same options I use, just copy and paste the command below, substitute your username for kortina, open Terminal, and run it.

rsync -avvz --size-only --delete --exclude .svn --exclude .Trash --exclude Library/Caches --exclude "*.log" /Users/kortina /Volumes/JungleDisk