Backup retention script

Posted by Roy de Boer on 5 October 2009

Tag(s): Consultancy, Article, Open Source

Rsync, combined with hardlinked snapshots, is a very nice way to back up Linux servers. I’m not going to explain how to script this; there are already enough sites doing that, see for example this site. Because corruption and accidental deletions can occur at any time, even one second before the file is written to the backup, it is important to keep more than one backup. Furthermore, it can take a while before corruption is noticed: days, weeks or even months.

It is, of course, also important to delete old backups. The simplest option is to rotate the backup directories, e.g. storing the daily backups in backup.0, backup.1, etc. Each day the oldest backup is removed, all remaining backup directories shift one position up, and the new daily backup is written to backup.0. The disadvantage is that, in the real world, you only have backups of the past few weeks. Of course, there is also the naive approach: simply remove all backups each week. Yes, I have seen this. IT is terrible, sometimes.
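To make the rotation concrete, here is a minimal sketch of the shifting step. The helper name rotate_backups and its keep parameter are made up for this illustration, and the demo runs in a temporary directory instead of a real backup location.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: keep at most `keep` directories named
# backup.0 .. backup.(keep - 1) inside `root`.
def rotate_backups(root, keep)
  # Drop the oldest backup first so its slot becomes free.
  FileUtils.rm_rf(File.join(root, "backup.#{keep - 1}"))

  # Shift the remaining backups one position up, oldest first,
  # so no directory is moved onto an existing one.
  (keep - 2).downto(0) do |i|
    src = File.join(root, "backup.#{i}")
    dst = File.join(root, "backup.#{i + 1}")
    FileUtils.mv(src, dst) if File.directory?(src)
  end

  # An empty backup.0 is now ready to receive today's backup.
  FileUtils.mkdir_p(File.join(root, 'backup.0'))
end

# Demo in a throwaway directory: three existing backups, keep three.
Dir.mktmpdir do |root|
  3.times do |i|
    dir = File.join(root, "backup.#{i}")
    FileUtils.mkdir_p(dir)
    File.write(File.join(dir, 'marker'), "day-#{i}")
  end

  rotate_backups(root, 3)

  Dir.glob(File.join(root, 'backup.*')).sort.each do |dir|
    marker = File.join(dir, 'marker')
    content = File.exist?(marker) ? File.read(marker) : '(empty)'
    puts "#{File.basename(dir)}: #{content}"
  end
end
```

The sketch also shows the limitation mentioned above: whatever falls off the end of the rotation is gone, so history never reaches further back than the number of slots.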

A common approach for tape backups is to keep the last K day tapes, the last L week tapes and the last M month tapes. Note that, for example, the Monday tape becomes a week tape, and the first week tape of the month becomes a month tape.
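The promotion rule can be sketched as a small classifier. The helper name backup_tier is hypothetical, and the sketch assumes that weeks start on Monday and that the first Monday of a month is the one promoted to a month backup.

```ruby
require 'date'

# Hypothetical helper: decide which tier a backup made on `date` belongs to.
# Assumptions: the Monday backup is the week backup, and the first Monday
# of a month (day of month 1..7) is the month backup.
def backup_tier(date)
  return :day unless date.monday?
  date.day <= 7 ? :month : :week
end

[Date.new(2009, 10, 5), Date.new(2009, 10, 12), Date.new(2009, 10, 13)].each do |d|
  puts "#{d} (#{d.strftime('%A')}): #{backup_tier(d)}"
end
```

Every backup also counts as a day backup while it is recent enough, so the tiers say how long a backup is kept, not that it is kept only once.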

Let’s implement this retention scheme for an rsync/snapshot solution. Assume a directory containing one subdirectory per backup, each named after its creation date in the format YYYY-MM-DD.

Step one is a script that reads all backup dates and filters out the directories it should keep, so that only the directories to remove are left. Let’s call it retention.rb:
#!/usr/bin/ruby

require 'rubygems'
# ActiveSupport 3 and later only load the Time extensions on request
require 'active_support/all'
require 'date'

keep_list = []

day_retention = 14
week_retention = 8
month_retention = 12

# the last 14 days
day_retention.times { |i|
  keep_list.push Time.now.at_midnight.advance(:days => -i)
}

# the Mondays of the last 8 weeks
week_retention.times { |i|
  keep_list.push Time.now.at_beginning_of_week.advance(:weeks => -i)
}

# the first Monday of each of the last 12 months
month_retention.times { |i|
  keep_list.push Time.now.at_beginning_of_month.advance(:months => -i).advance(:days => 6).beginning_of_week
}

# glob does not accept regular expressions, it's closer to a shell glob
all_files = Dir.glob('[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]')

# whatever is left after removing the keep list may be deleted
keep_list.each { |d|
  all_files.delete(d.strftime('%Y-%m-%d'))
}

puts all_files.sort

Preview which directories would be removed:

$ cd /backups
$ retention.rb

Finally we can get rid of the old backups with:

$ cd /backups
$ retention.rb | xargs -d'\n' rm -rf

Please note that every directory matching '[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]' that is not on the keep list will be deleted, whether or not it is a real backup. If this is undesired behavior, you should test whether the entries in all_files are valid dates, or read the directory names from stdin and build the list yourself.
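One possible way to do the date check, as a sketch that uses only Ruby’s standard library. The helper name backup_date? is made up for this example and is not part of retention.rb.

```ruby
require 'date'

# Hypothetical helper: true only for names that are real calendar dates
# written as YYYY-MM-DD.
def backup_date?(name)
  return false unless name =~ /\A\d{4}-\d{2}-\d{2}\z/
  year, month, day = name.split('-').map(&:to_i)
  Date.valid_date?(year, month, day)
end

# All four names match the glob pattern, but only the first is a real date.
candidates = ['2009-10-05', '2009-13-01', '2009-02-30', '1234-56-78']
valid = candidates.select { |name| backup_date?(name) }
puts valid
```

In retention.rb, a filter like this could be applied to all_files right after the Dir.glob call, so that names that merely look like dates are never printed and therefore never reach rm.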

You might have to run something like
$ sudo aptitude install ruby rubygems
$ sudo gem install activesupport

to install activesupport, the Ruby library that lets you type cool things like 45.months.

P.S. feel free to post links to similar scripts in the comments.
