SubdirHasChanged

From RdiffBackupWiki

purpose

rdiff-backup is great for daily backups, but it wastes some disk space on runs where nothing has changed. You could switch to monthly backups instead, at the cost of going longer without an up-to-date backup. This little helper script tries to figure out in advance whether the (daily) backup needs to run by checking for changed files. The algorithm is very simple: it just hashes the output of a recursive directory listing. That is fine for most cases, where a changed file also has a changed size or modification date; a change that keeps everything shown in the listing is not detected. Just test it before relying on it ;-)
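
To make the idea concrete, here is a minimal sketch of the hashing step on a throwaway directory. It assumes GNU ls and sha1sum; dir_index_hash and the temporary directory are hypothetical names used only for this demonstration, not part of the script further down.

```shell
#!/bin/bash
# Sketch: hash a recursive directory listing and show that a size change alters the hash.
# dir_index_hash is a hypothetical helper name (assumption, for illustration only).
dir_index_hash() {
    local sum
    # Listing includes permissions, owner, size, full timestamp and name of every entry
    sum=$(ls --almost-all --recursive --full-time "$1" 2>&1 | sha1sum)
    echo "${sum%% *}"             # strip the trailing " -" that sha1sum prints
}

demo_dir=$(mktemp -d)
echo "hello" >"${demo_dir}/file.txt"

before=$(dir_index_hash "${demo_dir}")
unchanged=$(dir_index_hash "${demo_dir}")   # nothing touched in between
echo "more" >>"${demo_dir}/file.txt"        # file grows, so its size changes
after=$(dir_index_hash "${demo_dir}")

if [ "${before}" == "${unchanged}" ]; then
    echo "no change: hashes match"
fi
if [ "${before}" != "${after}" ]; then
    echo "change detected: hashes differ"
fi
rm -rf "${demo_dir}"
```

Note that reading a file does not change the hash, because the listing shows the modification time, not the access time.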

usage

I call this script from my regular backup script (also bash), which is triggered by cron. If $BACKUP_PATH hasn't changed since the last run, the backup script exits and avoids another useless backup.

#!/bin/bash
# regular backup-script
BACKUP_PATH="/etc"  # what to backup

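# check if backup is actually needed
# (the old hash is read by cat before the redirection overwrites the file)
/usr/local/bin/subdir_has_changed.sh "${BACKUP_PATH}" "$(cat /var/spool/dirhash.etc 2>/dev/null)" >/var/spool/dirhash.etc
if [ $? -eq 0 ]; then
    exit   # hash unchanged, or no previous hash available: skip the backup
fi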

# do backup
rdiff-backup...

As you can see, I store the hash in /var/spool/dirhash.etc. If that file does not exist (e.g. on the first run), it is created, but the script behaves as if nothing had changed (since no hash was available to compare against), so the first run skips the backup.
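
If you would rather have the first run trigger a backup, and only record the new hash once the backup has succeeded (the wrapper above writes the new hash before the backup starts, so a failed backup would not be retried until the directory changes again), a variant could look like the sketch below. This is not the original author's method: backup_if_changed and its arguments are hypothetical, and the hashing step is inlined instead of calling subdir_has_changed.sh so that the sketch is self-contained.

```shell
#!/bin/bash
# Sketch: run a backup command only if the directory index hash changed,
# and store the new hash only after that command succeeded.
# backup_if_changed is a hypothetical helper (assumption, not part of the wiki script).
# Usage: backup_if_changed <dir> <hash_file> <backup command...>
backup_if_changed() {
    local dir="$1" hash_file="$2"
    shift 2                       # remaining arguments form the backup command
    local old_hash="" new_hash

    if [ -r "${hash_file}" ]; then
        old_hash=$(cat "${hash_file}")
    fi

    # Same hashing step as subdir_has_changed.sh (GNU ls and sha1sum assumed)
    new_hash=$(ls --almost-all --recursive --full-time "${dir}" 2>&1 | sha1sum)
    new_hash=${new_hash%% *}

    if [ "${old_hash}" == "${new_hash}" ]; then
        return 0                  # unchanged: skip the backup
    fi

    # No stored hash (first run) or a different hash: run the backup
    "$@" || return 2              # backup failed: keep the old hash so the next run retries
    echo "${new_hash}" >"${hash_file}"
}
```

In the regular backup script, the call would take "${BACKUP_PATH}", /var/spool/dirhash.etc and then the rdiff-backup command line that is already in use. Keep the hash file outside the directory being checked, otherwise writing it changes the listing.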


subdir_has_changed.sh

#!/bin/bash   
# 2009-03-09 <pille+rdiff-backup-wiki+post@struction.de>
#
# display and compare directory index hash
# to detect changed directory contents
#
# DEPEND ls sha1sum

DIR="$1"
HASH="$2"

if [ "${DIR}" == "" ]; then
    echo "$0 <dir> [hash]"
    echo
    echo "dir           - directory to check for change"
    echo "hash          - sha1 hash to compare dir with (this may come from the last run)"
    echo
    echo "OUTPUT:       sha1 hash for directory index of dir"
    echo "RETURN CODE:  0, if hash is omitted"
    echo "              0, if hash matches the computed directory index hash for dir"
    echo "              1, if hash doesn't match the computed directory index hash for dir"
    echo "              2 on error"
    echo
    echo "NOTE:         since the directory listing is simply hashed, you should always use absolute pathnames for dir and must not mix absolute and relative pathnames!"
    echo "              if dir doesn't exist, the hash will still be computed using the error output!"
    echo "              if dir is a symlink and you are really interested in its content, you should add a trailing slash!"
    exit 2
fi

CURRENT_HASH=$(ls --almost-all --recursive --full-time "${DIR}" 2>&1 |sha1sum)
CURRENT_HASH=${CURRENT_HASH%% *}

#echo "$(date) ${DIR} ${HASH}" >>/tmp/debug.log
#echo $CURRENT_HASH >>/tmp/debug.log
#echo >>/tmp/debug.log

echo "${CURRENT_HASH}"

if [ "${HASH}" != "" ]; then
    if [ "${HASH}" != "${CURRENT_HASH}" ]; then
        exit 1
    fi
fi
exit 0
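
The NOTE in the usage text about not mixing absolute and relative pathnames can be checked with a small experiment. The sketch below assumes GNU ls and sha1sum; listing_hash and the temporary directory are hypothetical names that only repeat the hashing step of the script above.

```shell
#!/bin/bash
# Sketch: the same directory hashed via an absolute and via a relative pathname.
# listing_hash is a hypothetical helper (assumption, for illustration only).
listing_hash() {
    local sum
    sum=$(ls --almost-all --recursive --full-time "$1" 2>&1 | sha1sum)
    echo "${sum%% *}"
}

parent=$(mktemp -d)
mkdir "${parent}/data"
echo "x" >"${parent}/data/file"

abs_hash=$(listing_hash "${parent}/data")
rel_hash=$(cd "${parent}" && listing_hash "data")

# A recursive listing starts each directory with its name exactly as it was given
# on the command line, so the pathname itself becomes part of the hashed text.
if [ "${abs_hash}" != "${rel_hash}" ]; then
    echo "absolute and relative pathnames give different hashes"
fi
rm -rf "${parent}"
```

For the same reason, the pathname should be written identically on every run, including any trailing slash.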