DontCompressSmallFiles
From RdiffBackupWiki
I've seen rdiff-backup gzip files of size 0, producing compressed files far larger than 0 bytes. As far as I can tell, the resulting gzip files contain a header, the full(!) file path (uncompressed), and a trailer. Since the filename always contains the date and time as well, the resulting files are rather big compared to the original.
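To see where the extra bytes come from, here is a minimal Python sketch (not rdiff-backup's own code) that gzips an empty file in memory with the standard `gzip` module. The helper name `gzip_bytes` is hypothetical. Note that Python's `gzip` stores only the basename in the header's FNAME field, so this does not reproduce the full-path case described above, but it shows the same effect: the stored name alone outweighs the content.

```python
import gzip
import io


def gzip_bytes(data: bytes, name: str) -> bytes:
    """Gzip `data` in memory, recording `name` in the gzip header (hypothetical helper)."""
    buf = io.BytesIO()
    # mtime=0 keeps the output deterministic. The name is written
    # uncompressed into the header, before the compressed data.
    with gzip.GzipFile(filename=name, mode="wb", fileobj=buf, mtime=0) as f:
        f.write(data)
    return buf.getvalue()


name = "CacheFunction.php.2010-03-08T17:15:03+01:00.diff"
empty = gzip_bytes(b"", name)
print(len(empty))               # far more than 0 bytes
print(name.encode() in empty)   # True: the name is readable in the header
```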
There should be a file size limit below which no compression is applied. A sensible default could be the file path length plus some constant, or simply a constant. Maybe the disk block size? If the file is smaller than a disk block, there is nothing to gain (or lose) in disk usage from compression, but there is an execution time penalty.
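A rough Python sketch of the proposed threshold, covering the "path length + constant" and "disk block size" variants. `min_compress_size`, `should_compress`, and the value of `GZIP_FIXED_OVERHEAD` are all hypothetical; rdiff-backup does not expose such an option as far as this page says.

```python
import os
from typing import Optional

# Hypothetical constant: rough fixed gzip cost (10-byte header, 8-byte
# trailer, plus slack for the deflate stream). Not an rdiff-backup setting.
GZIP_FIXED_OVERHEAD = 32


def min_compress_size(path: str, block_size: Optional[int] = None) -> int:
    """Size below which compression is skipped.

    If block_size is given, use the disk block size; otherwise use
    "path length + some constant", since the name lands in the gzip header.
    """
    if block_size is not None:
        return block_size
    return len(os.fsencode(path)) + GZIP_FIXED_OVERHEAD


def should_compress(size: int, path: str, block_size: Optional[int] = None) -> bool:
    """Return True if a file of `size` bytes is worth gzipping under this policy."""
    return size >= min_compress_size(path, block_size)
```

With the path-length rule, the 18-byte `CacheFunction.php...diff` from the listing below would be stored as-is, while a multi-kilobyte diff would still be compressed.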
-- AndrasGBekes
What version are you using?
Since version 1.1.5:
Empty error_log, mirror_metadata, extended_attribute, and
access_control_lists files will no longer be gzipped (suggestion by
Hans F. Nordhaug).
I also have lots of .diff.gz files that are much larger than the original .diff files (using version 1.2.6):
-rwxrwxrwx 1 wwwrun www 18 28. Mär 2006 CacheFunction.php.2010-03-08T17:15:03+01:00.diff*
-rwxrwxrwx 1 wwwrun www 164 28. Mär 2006 CacheFunction.php.2010-03-08T17:15:03+01:00.diff.gz*
-rwxrwxrwx 1 1003 users 18 28. Mär 2006 CacheFunction.php.2010-03-14T17:15:03+01:00.diff*
-rwxrwxrwx 1 1003 users 164 28. Mär 2006 CacheFunction.php.2010-03-14T17:15:03+01:00.diff.gz*
-rwxrwxrwx 1 wwwrun www 9 25. Jul 2007 Cache.php.2010-03-08T17:15:03+01:00.diff*
-rwxrwxrwx 1 wwwrun www 147 25. Jul 2007 Cache.php.2010-03-08T17:15:03+01:00.diff.gz*
-rwxrwxrwx 1 1003 users 9 25. Jul 2007 Cache.php.2010-03-14T17:15:03+01:00.diff*
-rwxrwxrwx 1 1003 users 147 25. Jul 2007 Cache.php.2010-03-14T17:15:03+01:00.diff.gz*
-rwxrwxrwx 1 wwwrun www 9 10. Feb 2009 Pager.class.php.2010-03-08T17:15:03+01:00.diff*
-rwxrwxrwx 1 wwwrun www 153 10. Feb 2009 Pager.class.php.2010-03-08T17:15:03+01:00.diff.gz*
-rwxrwxrwx 1 1003 users 9 10. Feb 2009 Pager.class.php.2010-03-14T17:15:03+01:00.diff*
-rwxrwxrwx 1 1003 users 153 10. Feb 2009 Pager.class.php.2010-03-14T17:15:03+01:00.diff.gz*
The best approach would be to first try compressing the file and check whether the compressed result is actually smaller than the original; if it is not, store the original file uncompressed.
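The "compress first, keep whichever is smaller" idea can be sketched in Python as follows. `maybe_gzip` is a hypothetical helper, and the assumption that the caller would signal the stored form through the presence or absence of the `.gz` suffix is illustrative, not rdiff-backup's documented behavior.

```python
import gzip
from typing import Tuple


def maybe_gzip(data: bytes) -> Tuple[bytes, bool]:
    """Gzip `data`, but keep the original if compression does not shrink it.

    Returns (payload, compressed). A caller could use `compressed` to decide
    whether to write e.g. "foo.diff.gz" or plain "foo.diff".
    """
    packed = gzip.compress(data, mtime=0)  # mtime=0: deterministic output
    if len(packed) < len(data):
        return packed, True
    return data, False  # compression did not help; store as-is
```

The cost is one compression attempt per file; since the compressed bytes are produced anyway, the extra work is only the length comparison.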
In many cases the file could then be stored directly in the inode (on filesystems that support this) instead of occupying a whole 1 KiB or 4 KiB block, which should noticeably reduce the footprint of the backups.
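A simplified model of the block cost, assuming every non-empty file is rounded up to whole blocks and nothing is stored inline. `allocated_bytes` is a hypothetical helper; real allocation also depends on the filesystem (for example ext4's `inline_data` feature or tail packing).

```python
def allocated_bytes(size: int, block_size: int = 4096) -> int:
    """Disk space for a file of `size` bytes, rounded up to whole blocks.

    Simplified model: ignores inline/tail-packed storage and metadata
    blocks; an empty file is assumed to take no data blocks.
    """
    return -(-size // block_size) * block_size  # ceiling division
```

Under this model the 18-byte `.diff` and the 164-byte `.diff.gz` from the listing above both occupy one full block, so gzip saves nothing there; the gain only appears when a small uncompressed file fits inline in the inode.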
Cheers, --Kurmis 20:33, 20 April 2010 (EST)
