Beluki/MultiHash

About

MultiHash is a small Python 3 program that calculates file digests, like those generated by the coreutils tools md5sum, sha1sum and so on.

The main selling point is that it reads each input file only once, calculating all the requested algorithms in a single pass. For example, the following command:

$ MultiHash.py md5 sha1 -i *.iso -o md5sums sha1sums

Is equivalent to:

$ md5sum *.iso > md5sums
$ sha1sum *.iso > sha1sums
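The single-pass idea can be sketched in a few lines of Python. This is only an illustration built on the standard hashlib module, not MultiHash's actual source; the function name multi_digest and the chunk size are assumptions made for the example.

```python
import hashlib


def multi_digest(path, algorithms, chunk_size=1024 * 1024):
    """Return {algorithm: hexdigest} for one file, reading it only once.

    Hypothetical helper for illustration; not taken from MultiHash itself.
    """
    hashers = {name: hashlib.new(name) for name in algorithms}

    with open(path, "rb") as stream:
        # iter() with a sentinel keeps calling read() until it returns b"" (EOF).
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            # Every chunk read from disk is fed to every requested algorithm.
            for hasher in hashers.values():
                hasher.update(chunk)

    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

Calling multi_digest("some.iso", ["md5", "sha1"]) would read the file once and return both digests, whereas running md5sum and sha1sum separately reads it twice.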

Installation and usage

To install, just make sure you are using Python 3.3+. MultiHash is a single Python script with no dependencies that you can put in your PATH.

Using it is pretty simple. One algorithm, one file:

$ MultiHash.py md5 -i debian-7.1.0-i386-DVD-1.iso
6986e23fc4b8b7ffdb37a82da7446e8a *debian-7.1.0-i386-DVD-1.iso

Multiple algorithms, one file:

$ MultiHash.py md5 sha1 -i debian-7.1.0-i386-DVD-1.iso
6986e23fc4b8b7ffdb37a82da7446e8a *debian-7.1.0-i386-DVD-1.iso
cea26c7764188426da8c96bdf40eff138eb26fdc *debian-7.1.0-i386-DVD-1.iso

Multiple algorithms, multiple files:

$ MultiHash.py md5 sha1 -i *.iso -o md5sums sha1sums

$ cat md5sums
6986e23fc4b8b7ffdb37a82da7446e8a *debian-7.1.0-i386-DVD-1.iso
8a1bf570e05ac4f378c24a4bcd6c7085 *debian-7.1.0-i386-DVD-2.iso
6ee99fe1f80e1c197cd35c404448e6af *debian-7.1.0-i386-DVD-3.iso
f84fe104755ae19c76c5d7ef09eff06d *debian-7.1.0-i386-DVD-4.iso
c8a99e4474f259e42093d1219eba0cf3 *debian-7.1.0-i386-DVD-5.iso

$ cat sha1sums
cea26c7764188426da8c96bdf40eff138eb26fdc *debian-7.1.0-i386-DVD-1.iso
60d918b8f5fded013dc5f53ad0d6e9510a5cb2ee *debian-7.1.0-i386-DVD-2.iso
0cfe71a98e48140be53e3a5023ad0dd112ac45aa *debian-7.1.0-i386-DVD-3.iso
f6bef688c7e21c9d89bd601f7d382ac84531a8bf *debian-7.1.0-i386-DVD-4.iso
b3112b29d6430c77b8653d8b615d2699bff20fa3 *debian-7.1.0-i386-DVD-5.iso

Command-line options

MultiHash has some options that can be used to change the behavior:

  • -i file [file ...] specifies input files to checksum. If no files are specified, or if the filename is -, stdin will be used.

  • -o file [file ...] specifies output files where the results will be written. There must be the same number of output files as algorithms. If no output files are specified, stdout will be used.

  • --newline [dos, mac, unix, system] changes the newline format. I tend to use Unix newlines everywhere, even on Windows. The default is system, which uses the current platform newline format.

  • --threads n runs n threads in parallel. Threads consume input files one by one, calculating all the algorithms for each file. Regardless of which thread completes first, results will be printed in the same order as the input files were specified. The default is to use a single thread. Use --threads auto to start as many threads as there are CPUs available.
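The ordering guarantee of --threads can be illustrated with the standard concurrent.futures module. This is a sketch under assumptions, not how MultiHash necessarily implements it: the names digest_file and digest_files are hypothetical, and os.cpu_count() needs a newer Python than 3.3.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def digest_file(path, algorithms):
    """Hash one file with every algorithm in a single read (hypothetical helper)."""
    hashers = [hashlib.new(name) for name in algorithms]
    with open(path, "rb") as stream:
        for chunk in iter(lambda: stream.read(1 << 20), b""):
            for hasher in hashers:
                hasher.update(chunk)
    return path, [hasher.hexdigest() for hasher in hashers]


def digest_files(paths, algorithms, threads=1):
    """Hash many files in parallel, returning results in input order."""
    if threads == "auto":
        # Assumption: "auto" means one thread per CPU reported by the OS.
        threads = os.cpu_count() or 1

    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Executor.map yields results in the order of its inputs,
        # no matter which worker thread finishes first.
        return list(pool.map(lambda p: digest_file(p, algorithms), paths))
```

Each worker takes one file at a time and computes all algorithms for it, so a small file that finishes early still appears in its original position in the output.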

Portability

Information and error messages are written to stdout and stderr respectively, using the current platform newline format and encoding.

The actual output (the digests) is always written in UTF-8, whether writing to stdout or to files. When using the same --newline format, output should be byte-for-byte identical between platforms.

The output is compatible with that of the coreutils tools and can be checked with them (e.g. md5sum -c). MultiHash always reads input in binary mode, prepending an asterisk to the filename (like md5sum) on output. It makes no sense to read input as text, and md5sum defaulting to text mode has been a source of problems (e.g. on Cygwin and Windows) in the past.
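The output conventions described above can be sketched as follows. The line layout (digest, space, asterisk, filename) matches the examples in this README; the mapping from the --newline names to concrete byte sequences is an assumption made for the example, as are the names format_line and write_digests.

```python
import os

# Assumed mapping of the --newline choices to line endings.
NEWLINES = {"dos": "\r\n", "mac": "\r", "unix": "\n", "system": os.linesep}


def format_line(hexdigest, filename):
    """Build one md5sum-style line; the asterisk marks binary mode."""
    return "{} *{}".format(hexdigest, filename)


def write_digests(path, entries, newline="system"):
    """Write (hexdigest, filename) pairs as UTF-8 with a fixed line ending."""
    # newline="" turns off Python's newline translation, so the ending
    # chosen above is written to disk exactly as given on every platform.
    with open(path, "w", encoding="utf-8", newline="") as out:
        for hexdigest, filename in entries:
            out.write(format_line(hexdigest, filename) + NEWLINES[newline])
```

Fixing both the encoding and the line ending is what makes the file identical across platforms for a given newline choice.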

The exit status is 0 on success and 1 on errors. After an error, MultiHash skips the current file and proceeds with the next one instead of aborting. It can be interrupted with Control + C.
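The skip-and-continue behavior could look roughly like the sketch below. It is a simplified, single-algorithm illustration; the function name run and the wording of the error message are assumptions, not MultiHash's actual code or output.

```python
import hashlib
import sys


def run(paths, algorithm="md5"):
    """Hash each path in turn; return 0 if all succeeded, 1 otherwise."""
    status = 0
    for path in paths:
        try:
            hasher = hashlib.new(algorithm)
            with open(path, "rb") as stream:
                for chunk in iter(lambda: stream.read(1 << 20), b""):
                    hasher.update(chunk)
        except OSError as err:
            # Report the problem on stderr, remember the failure,
            # and move on to the next file instead of aborting.
            print("error: {}: {}".format(path, err), file=sys.stderr)
            status = 1
            continue
        print("{} *{}".format(hasher.hexdigest(), path))
    return status
```

A script would typically end with sys.exit(run(sys.argv[1:])), so that a single unreadable file still yields a nonzero exit status after all other files have been processed.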

MultiHash is tested on Windows 7 and 8 and on Debian (both x86 and x86-64) using Python 3.3+. Older versions are not supported.

Performance

The performance of MultiHash depends on many factors:

  • Whether the operation is IO-bound (slow hard disks, single algorithm) or CPU-bound (RAID or SSD, multiple or more complex algorithms).

  • Whether there is fadvise support.

  • Performance of the I/O scheduler when running multiple threads. In particular, Windows is known to dramatically degrade performance when multiple threads read multiple files at the same time.

  • Whether the input files are currently cached. Unlikely on big ISOs.
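As an illustration of the fadvise point, Python exposes the call as os.posix_fadvise on platforms that provide it. The sketch below shows one plausible way to hint sequential access before hashing; how MultiHash actually uses fadvise is not stated here, and the helper name hint_sequential is hypothetical.

```python
import os


def hint_sequential(fd):
    """Ask the kernel to expect sequential reads on fd.

    Returns True if the hint was applied, False if the platform
    lacks posix_fadvise or the call was rejected.
    """
    if not hasattr(os, "posix_fadvise"):
        # Not available on this platform (e.g. Windows).
        return False
    try:
        # offset=0 and length=0 cover the whole file.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
    except OSError:
        # Some file descriptors (pipes, for instance) reject the hint.
        return False
    return True
```

A caller would open the file in binary mode, pass stream.fileno() to the helper, and then read as usual; the hint is purely advisory and the digests are the same either way.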

In the worst-case scenario, the performance is comparable to that of the coreutils tools plus Python's startup time.

In the best cases, especially when calculating multiple algorithms like md5sums, sha1sums, sha256sums and sha512sums for a ton of ISOs (like in Debian releases), MultiHash can be many times faster. Speedups of 4x or more are common.

Your mileage may vary.

Status

This program is finished!

MultiHash is feature-complete and has no known bugs. Unless issues are reported I plan no further development on it other than maintenance.

License

Like all my hobby projects, this is Free Software. See the Documentation folder for more information. No warranty though.
