Randomly checksum files to ensure the integrity of backups.

wolever/randchk

randchk will walk two directory structures in parallel, randomly checksumming
files to ensure their integrity.  It is designed to be used in situations where
there may not be time to checksum every file (either during or after a backup),
but some confidence in backup integrity is desired.

For example, I want my laptop backups to run as fast as possible, so I don't
want to use rsync's '--checksum' option. But I also want to make sure my
backups aren't corrupt. To do this, I run randchk after rsync:

    rsync / /Volumes/BackupDisk/
    randchk / /Volumes/BackupDisk/

This way, even if I don't have time to let randchk check all of my files, it
will still check some of them. Additionally, each time randchk runs, it will
check a different set of files, so two fifteen-minute runs of randchk will be
almost* as good as one thirty-minute run.

For the curious, here is the basic algorithm:

    files_to_check = list_directory(base_directory)
    while files_to_check:
        file = files_to_check.pop()
        if file is a directory:
            files_to_check = randomize(files_to_check + list_directory(file))
        else:
            check_file(file, file.replace(base_directory, backup_directory))
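
As an illustration only, the pseudocode above could be written as runnable
Python along the following lines. This is a minimal sketch, not randchk's
actual source: the names sha256_of and random_walk_check, the choice of
SHA-256, and the decision to skip symlinked directories are all assumptions
made for the example.

```python
import hashlib
import os
import random


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file at `path` (hypothetical helper)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files are never loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def random_walk_check(base_directory, backup_directory, rng=random):
    """Yield (relative_path, ok) for files under base_directory, in random order."""
    pending = [os.path.join(base_directory, name)
               for name in os.listdir(base_directory)]
    while pending:
        path = pending.pop()
        if os.path.isdir(path) and not os.path.islink(path):
            # A directory: add its entries, then re-randomize the whole list.
            pending.extend(os.path.join(path, name) for name in os.listdir(path))
            rng.shuffle(pending)
        elif os.path.isfile(path):
            # A file: compare it with the file at the same relative path
            # in the backup. A missing backup file counts as a failure.
            rel = os.path.relpath(path, base_directory)
            twin = os.path.join(backup_directory, rel)
            ok = os.path.isfile(twin) and sha256_of(path) == sha256_of(twin)
            yield rel, ok
```

Because the function is a generator, a caller can stop consuming results
whenever its time budget runs out; the files already yielded have been
checked, and the rest are simply skipped for this run.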

* Actually, not quite: the algorithm is biased towards files high up in
the hierarchy, because a file cannot be picked until every directory above it
has been listed. The only way to remove this bias, though, is to find all the
files on the system first, which would significantly increase the time between
"starting randchk" and "checking files", something I am trying to avoid.

In the future, though, I plan to implement a smarter walking algorithm, which
will hopefully fix this.
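
The bias described in the footnote can be seen in a small simulation. The
sketch below is illustrative only: it runs the same walk over an in-memory
tree rather than a real filesystem, and the names walk_order, mean_rank and
TREE, the tree layout, and the initial shuffle are all assumptions made for
the example.

```python
import random


def walk_order(tree, rng):
    """Return file names in the order the random walk would check them.

    `tree` maps each name to None (a file) or to a nested dict (a directory).
    """
    pending = list(tree.items())
    rng.shuffle(pending)  # assumption: start from a shuffled top-level listing
    order = []
    while pending:
        name, children = pending.pop()
        if children is None:
            order.append(name)
        else:
            # A directory: add its entries, then re-randomize the whole list.
            pending.extend(children.items())
            rng.shuffle(pending)
    return order


def mean_rank(tree, target, trials=2000, seed=0):
    """Average position (0 = checked first) of `target` over many walks."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += walk_order(tree, rng).index(target)
    return total / trials


# Hypothetical tree: one file at the top, ten files two directories down.
TREE = {"top.txt": None,
        "a": {"b": {f"deep{i}.txt": None for i in range(10)}}}

if __name__ == "__main__":
    print("top.txt   mean rank:", mean_rank(TREE, "top.txt"))
    print("deep0.txt mean rank:", mean_rank(TREE, "deep0.txt"))
```

On this tree, top.txt is checked noticeably earlier on average than any of
the deep files, so a run that is cut short is more likely to have covered it.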
