FilenamesConversion
From RdiffBackupWiki
Right now rdiff-backup does not care about the encoding of filenames. It simply takes each file and transfers it to the backup. This can cause problems if the target system uses a different encoding than the source system (e.g. Windows -> Linux). After the backup, file names are stored in their original encoding, so they are unreadable on the target system. If you restore them to the original system, they look fine again.
It would be useful if rdiff-backup could handle this situation by converting filenames from one encoding to another, just as convmv does. It could be exposed as another option:
rdiff-backup --convert-file-names cp1250 UTF-8 /from/somewhere /to/somewhere
I am not sure about the implementation. Since a tool that does this (convmv) already exists, I guess it shouldn't be hard to use it. Another option would be to adapt its code. I am not skilled enough to do either, sorry.
--Gregy 09:29, 19 January 2009 (EST)
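As a rough illustration of what such a conversion amounts to at the byte level, here is a minimal Python sketch. It is not rdiff-backup or convmv code; the helper name `convert_name` and the sample filename are made up for the example, and the cp1250 -> UTF-8 pair is taken from the proposed command line above.

```python
def convert_name(raw: bytes, src: str = "cp1250", dst: str = "utf-8") -> bytes:
    """Re-encode one filename (hypothetical helper).

    Raises a UnicodeError if raw is not valid in src or cannot be
    represented in dst.
    """
    return raw.decode(src).encode(dst)


# A Czech filename as a cp1250 system would store it.
original = "žluťoučký.txt".encode("cp1250")
converted = convert_name(original)

# Byte 0x81 is unassigned in Python's cp1250 codec, so this name is
# rejected instead of being silently mangled.
try:
    convert_name(b"bad\x81name")
    rejected = False
except UnicodeDecodeError:
    rejected = True
```

The conversion only works when every name really is valid in the assumed source encoding. Anything else has to be reported or skipped.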
This could get really complicated, as there is actually no "encoding" associated with filenames in Unix. A filename is just a sequence of bytes, excluding the NUL byte (0x00) and slash (0x2f).
In real life, most filenames were entered by humans through some interface that imposed an encoding, and so they end up being valid strings in the encoding used by that person. A program like convmv can be used when you know that a certain subfolder (or even the whole tree) consists of byte strings valid in one encoding and you want to convert them to another encoding. It is mostly used for migrating single-byte encodings to utf8 and vice versa (but it cannot handle e.g. cp1250 to cp1252).
However, a good backup program must be able to handle the case where one filename is utf8, other filenames are in cp1252 and cp1250, etc., and even filenames containing e.g. newline characters or other "weird bytes" that are not convertible to any encoding, all without losing information.
-- P.Bijnens
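One possible way to meet the "no information lost" requirement is to keep names as bytes, or to decode them with an error handler that preserves undecodable bytes. The sketch below uses Python 3's `surrogateescape` handler (PEP 383). Using it here is an assumption made for illustration, not something rdiff-backup is known to do, and the function names and sample filenames are hypothetical.

```python
def to_text(raw: bytes) -> str:
    """Decode a filename as UTF-8, smuggling invalid bytes through as
    lone surrogate code points instead of failing."""
    return raw.decode("utf-8", errors="surrogateescape")


def to_bytes(text: str) -> bytes:
    """Inverse of to_text: turn the escaped code points back into the
    original bytes."""
    return text.encode("utf-8", errors="surrogateescape")


# A mixed tree: valid UTF-8, a cp1252 name, a newline, and raw bytes.
names = [
    "café.txt".encode("utf-8"),
    "café.txt".encode("cp1252"),
    b"line\nbreak",
    b"\xff\xfe raw bytes",
]

# Every name survives the round trip unchanged, whatever its encoding.
restored = [to_bytes(to_text(raw)) for raw in names]
```

The trade-off is that the escaped names are only placeholders. They preserve the bytes but do not display as the characters the user originally typed.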
The case with one system encoding being cp1250 and the other one cp1252 is IMHO quite exotic and cannot be handled correctly (the system will show wrong characters). There is simply no support in e.g. cp1250 for cp862 characters.
However, such support exists if the target system uses Unicode; rdiff-backup is just not using it.
--Gregy 08:09, 22 January 2009 (EST)
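The point about single-byte code pages versus Unicode can be checked with a few lines of Python. This is only an illustrative sketch. The helper `can_encode` and the sample Hebrew word are made up for the example.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


hebrew_name = "שלום"  # letters available in cp862

in_cp862 = can_encode(hebrew_name, "cp862")    # source code page
in_cp1250 = can_encode(hebrew_name, "cp1250")  # Central European code page
in_utf8 = can_encode(hebrew_name, "utf-8")     # Unicode target
```

Under these assumptions only the cp1250 check fails, which matches the argument above: a Unicode target can represent names from any source code page, while another single-byte code page usually cannot.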
Having tried rdiff-backup from Windows (cp-something) to Linux (utf8) with non-ASCII characters in filenames, I can understand the frustration. The backup works, but is not really a mirror. My case (and many others?) could be solved without full charset conversion, simply by supporting utf8 filenames on Windows. See http://boodebr.org/main/python/all-about-python-and-unicode#PLAT_WIN for details.
--Anders 08:13, 8 February 2009 (EST)
