DontCompressSmallFiles

From RdiffBackupWiki

I've seen rdiff-backup gzip files of size 0, producing compressed files much larger than 0 bytes. As far as I can see, each resulting gzip file contains a header, the full(!) file path (uncompressed) and a trailer. Since the file name always contains the date and time as well, the resulting files are rather large compared to the original.
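The overhead is easy to reproduce with Python's standard `gzip` module. This is only an illustration, not rdiff-backup's own code, and `gzipped_size` is a hypothetical helper. Note that `gzip.GzipFile` records only the base name in the header, not the full path observed above; the name used is one of the increment names from the listing below.

```python
import gzip
import io


def gzipped_size(data: bytes, stored_name: str = "") -> int:
    """Return the size of a gzip stream holding `data`.

    `stored_name` goes into the header's file-name field (Python keeps only
    its base name); an empty string leaves the field out entirely.
    mtime=0 keeps the output deterministic.
    """
    buf = io.BytesIO()
    # fileobj is given, so `filename` is only written into the header, never opened.
    with gzip.GzipFile(filename=stored_name, mode="wb", fileobj=buf, mtime=0) as gz:
        gz.write(data)
    return len(buf.getvalue())


name = "Cache.php.2010-03-08T17:15:03+01:00.diff"
print(gzipped_size(b""))        # empty input, no name: 10 header + 2 deflate + 8 trailer → 20
print(gzipped_size(b"", name))  # same, plus 40-byte name and its NUL terminator → 61
```

Even with no name at all, an empty file grows to 20 bytes; every byte of the stored name comes on top of that.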

There should be a file size threshold below which no compression is applied. A sensible default could be the file path length plus some constant, or simply a constant. Maybe the disk block size? If the file is smaller than a disk block, there is nothing to gain (or lose) in disk usage from compression, but there is an execution-time penalty.
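A threshold along these lines might be sketched as below. `should_compress` and its `block_size` parameter are hypothetical, not rdiff-backup options. The 18-byte constant is the fixed gzip header (10 bytes) plus trailer (8 bytes) from RFC 1952; a stored name adds its length plus a terminating NUL byte. The name length here is counted in characters, which matches the header exactly only for plain ASCII names.

```python
GZIP_FIXED_OVERHEAD = 18  # 10-byte header + 8-byte CRC32/size trailer (RFC 1952)


def should_compress(size: int, stored_name: str = "", block_size: int = 0) -> bool:
    """Hypothetical policy: skip gzip for files too small to benefit.

    The threshold is this file's gzip overhead (fixed part plus the
    NUL-terminated name in the header), raised to `block_size` if one is
    given, since a file smaller than one block occupies a whole block
    whether or not it is compressed.
    """
    name_overhead = len(stored_name) + 1 if stored_name else 0
    threshold = max(GZIP_FIXED_OVERHEAD + name_overhead, block_size)
    return size > threshold


name = "Cache.php.2010-03-08T17:15:03+01:00.diff"
print(should_compress(0, name))                       # → False
print(should_compress(9, name))                       # → False
print(should_compress(100_000, name))                 # → True
print(should_compress(1_000, name, block_size=4096))  # → False
```

Leaving `block_size` at 0 gives the "path length plus a constant" variant; passing e.g. 4096 gives the block-size variant.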

-- AndrasGBekes



What version are you using?


Since version 1.1.5: Empty error_log, mirror_metadata, extended_attribute, and access_control_lists files will no longer be gzipped (suggestion by Hans F. Nordhaug).


I also have lots of .diff.gz files that are much larger than the original .diff file (using version 1.2.6):

-rwxrwxrwx  1 wwwrun www     18 28. Mär 2006  CacheFunction.php.2010-03-08T17:15:03+01:00.diff*
-rwxrwxrwx  1 wwwrun www    164 28. Mär 2006  CacheFunction.php.2010-03-08T17:15:03+01:00.diff.gz*

-rwxrwxrwx  1   1003 users   18 28. Mär 2006  CacheFunction.php.2010-03-14T17:15:03+01:00.diff*
-rwxrwxrwx  1   1003 users  164 28. Mär 2006  CacheFunction.php.2010-03-14T17:15:03+01:00.diff.gz*

-rwxrwxrwx  1 wwwrun www      9 25. Jul 2007  Cache.php.2010-03-08T17:15:03+01:00.diff*
-rwxrwxrwx  1 wwwrun www    147 25. Jul 2007  Cache.php.2010-03-08T17:15:03+01:00.diff.gz*

-rwxrwxrwx  1   1003 users    9 25. Jul 2007  Cache.php.2010-03-14T17:15:03+01:00.diff*
-rwxrwxrwx  1   1003 users  147 25. Jul 2007  Cache.php.2010-03-14T17:15:03+01:00.diff.gz*

-rwxrwxrwx  1 wwwrun www      9 10. Feb 2009  Pager.class.php.2010-03-08T17:15:03+01:00.diff*
-rwxrwxrwx  1 wwwrun www    153 10. Feb 2009  Pager.class.php.2010-03-08T17:15:03+01:00.diff.gz*

-rwxrwxrwx  1   1003 users    9 10. Feb 2009  Pager.class.php.2010-03-14T17:15:03+01:00.diff*
-rwxrwxrwx  1   1003 users  153 10. Feb 2009  Pager.class.php.2010-03-14T17:15:03+01:00.diff.gz*

The best approach would be to first try compressing, check whether the compressed file is really smaller than the original, and if not, store the original file uncompressed.
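The compress-then-compare idea could look roughly like this. `store_increment` is a hypothetical helper, not rdiff-backup's API: it returns the file name to use and the bytes to write, adding the `.gz` suffix only when compression actually saved space.

```python
import gzip
import io


def store_increment(name: str, data: bytes) -> tuple[str, bytes]:
    """Hypothetical helper: gzip `data`, but fall back to the raw bytes
    when the gzip stream is not strictly smaller than the original."""
    buf = io.BytesIO()
    # The base name is stored in the gzip header; mtime=0 for reproducibility.
    with gzip.GzipFile(filename=name, mode="wb", fileobj=buf, mtime=0) as gz:
        gz.write(data)
    compressed = buf.getvalue()
    if len(compressed) < len(data):
        return name + ".gz", compressed
    return name, data


print(store_increment("Cache.php.diff", b"123456789")[0])  # → Cache.php.diff
print(store_increment("big.diff", b"x" * 10_000)[0])       # → big.diff.gz
```

Because the suffix records which form was stored, a restore step can decide from the file name alone whether to decompress. The cost is one wasted compression pass for each file that ends up stored raw.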

In many cases such a small file could then be stored directly in the i-node (on file systems that support this) instead of consuming a whole 1k or 4k block, which should noticeably reduce the footprint of the backups.

Cheers, --Kurmis 20:33, 20 April 2010 (EST)
