MD5_VERIFY

This is a Linux Program



Author: Dan Mares, dmares @ maresware . com (you will be asked for e-mail address confirmation) or dan.mares @ norcrossgroup . com
Portions Copyright © 2007-2021 by Mares and Company, LLC
Phone: 678-427-3275

PURPOSE

The MD5_VERIFY program is a Linux version of the Maresware sha_verify program. It is a stripped-down version of sha_verify and performs only one of that program's operations: the merge. This is equivalent to the sha_verify -m (merge) option, which in effect calculates the final md5 value of all the files in a sequenced set of filenames (file.000, file.001, file.002, etc.).

MD5_VERIFY is designed to calculate the MD5 hash, or the 160-bit SHA1 hash, of the "combined" group of files that share the same base name. The base name MUST have a traditional dot (.XXX) extension. Such a group of files is the set of output files usually created when a dd-type imaging program is used to create disk images.

Often, when using dd or a similar imaging program such as the Maresware ntimage, a group or set of output files is created with sequential index numbers as their extensions. For instance, if the output file name were image, the extensions would be .000, .001, .002, etc., until enough files had been created to encompass the entire physical disk that was imaged.

Why MD5_VERIFY?

Because the stage of the process between the dcfldd calculation of the md5 (in memory) and the actual write to the output file is not covered by any md5 calculation. If a write error somehow corrupted the data being written to the output file, you wouldn't know. I personally have had this occur. The only way to determine whether there was a write error in the output files is to perform an MD5 on the final set. The Linux program md5sum can do this, but MD5_VERIFY produces a more usable output.

When you perform a dd of a physical disk, even if you use dcfldd, you are calculating the md5 hash of the original disk on the fly. (An enhanced version of dcfldd is on the Mares and Company ftp site; check its help screen for the hashwindow and logfile options.) A simple process might be:

md5sum   /dev/hdX > someoutputfile
dcfldd   if=/dev/hdX hashwindow=2000M hashlogfile=logfile .... | split -b 2000m -a3 -d - output.

This will produce an md5 of the original disk (the md5sum line) and an md5 hash listing from the dcfldd program (hashlogfile) as it processes the data through memory. BUT how do you know that the output.xxx files all total to the correct hash? You could then take the output.xxx files and perform the following test:

cat output.* | md5sum

This would produce a single value abcdef111.... of the final md5. If it matches, you have a warm fuzzy feeling. What if it doesn't match? You have to go and perform the entire image again.

What if you had a way of knowing which file in the set was the one that contained the write error? If you did, then you could use dcfldd, dd, or ntimage to reimage just that part of the disk. At the least, you would know which one of the set had an error in it and could make appropriate adjustments.

With md5_verify you get an output that contains the md5 value of each file in the set and, more importantly, the total combined md5 of the entire data set. This combined value is the same as would be reported by the md5sum program. But you now also have the individual md5 values of each file, so the one that contains the error can be easily identified.

The output can be set to contain delimiters for additional processing or loading into a spreadsheet.
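To illustrate how individual hashes pinpoint a damaged file, here is a small Python sketch. It assumes you have a reference hash for each segment (for example from a dcfldd hash log whose hashwindow equals the split size, as in the command above); the segment contents and the helper name segment_hashes are hypothetical and exist only for this example.

```python
import hashlib

def segment_hashes(segments):
    """Return the MD5 hex digest of each segment, in order."""
    return [hashlib.md5(s).hexdigest() for s in segments]

# Hypothetical sample data: three segments as read from the source ...
source = [b"A" * 1000, b"B" * 1000, b"C" * 500]
# ... and as written to disk, with one byte of the second segment damaged.
written = [source[0], b"B" * 999 + b"X", source[2]]

expected = segment_hashes(source)
actual = segment_hashes(written)

# Indexes of the segments whose hashes disagree.
bad = [i for i, (e, a) in enumerate(zip(expected, actual)) if e != a]
bad_names = [f"output.{i:03d}" for i in bad]
print(bad_names)
```

Only the segment that differs needs to be reimaged; the others are already known to be good.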


OPERATION

Very simply, the program takes from the user the name of the set of files to perform the md5 on. It then identifies the proper sequence of the extensions (.000, .001, etc.), since the OS default sort is not always correct, and calculates the md5 of not only the individual files but also the combined files, as if they were one contiguous stream of data, which is the way the data came off the hard drive.

After the calculations are made, the filename and md5 of each file are printed to the screen or placed in an output file, and at the end the final md5 is also provided.
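The operation described above can be sketched in Python as follows. This is not the MD5_VERIFY source, only an illustration of the idea under stated assumptions: the function name verify_set, the 1 MB read size, and the sample files are all made up for the example.

```python
import hashlib
import tempfile
from pathlib import Path

def verify_set(directory, base):
    """Hash every base.NNN file in numeric order, plus the combined stream."""
    files = [p for p in Path(directory).glob(base + ".*")
             if p.suffix[1:].isdigit()]
    files.sort(key=lambda p: int(p.suffix[1:]))   # numeric order, not directory order
    total = hashlib.md5()
    rows = []
    for path in files:
        single = hashlib.md5()
        size = 0
        with open(path, "rb") as fh:
            # Read in chunks so large image files never have to fit in memory.
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                single.update(chunk)   # hash of this file alone
                total.update(chunk)    # hash of the whole set as one stream
                size += len(chunk)
        rows.append((path.name, size, single.hexdigest().upper()))
    return rows, total.hexdigest().upper()

# Small demonstration with three made-up segment files.
with tempfile.TemporaryDirectory() as tmp:
    for i, data in enumerate([b"first", b"second", b"third"]):
        Path(tmp, f"image.{i:03d}").write_bytes(data)
    rows, combined = verify_set(tmp, "image")

for name, size, digest in rows:
    print(name, size, "MD5:", digest)
print("total bytes", sum(r[1] for r in rows), "MD5:", combined)
```

Each file is read only once; the same chunk updates both the per-file digest and the running digest for the set.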

If the final md5 matches the output of dcfldd, or the original md5sum of the physical drive, you know you have a good data set. If the final value is incorrect, you can proceed to correct the problem.


Program Output:

The output record is normally a fixed length record which can easily be imported into other programs, a spreadsheet, or visually compared with the hashlogfile produced by dcfldd.

Header shown here for information only. The actual output file has no headers.
/path/filename.000  size literal: hash value

/path/filename.000    173  MD5:   2DA1B0C315D7D92B42DD3F13B82D5704
/path/filename.001    173  MD5:   2DA1B0C315D7D92B42DD3F13B82D5704
/path/filename.002    173  MD5:   2DA1B0C315D7D92B42DD3F13B82D5704
/path/filename.003    173  MD5:   2DA1B0C315D7D92B42DD3F13B82D5704
  total bytes         692  MD5:   ABCDEFABCDEF01234567890123456789

Or delimited with pipes:

/path/filename.000|    173|  MD5:|   2DA1B0C315D7D92B42DD3F13B82D5704
/path/filename.001|    173|  MD5:|   2DA1B0C315D7D92B42DD3F13B82D5704
/path/filename.002|    173|  MD5:|   2DA1B0C315D7D92B42DD3F13B82D5704
/path/filename.003|    173|  MD5:|   2DA1B0C315D7D92B42DD3F13B82D5704
  total bytes     |    692|  MD5:|   ABCDEFABCDEF01234567890123456789

In addition, the time it took to process each file is included.

The output of the program is intended to be placed in an output file for future reference, such as verification that files were not altered. This is important when certifying that file contents were not altered during forensic examination or duplication for analysis. These hash values can also be used when making copies of the images to, say, a work array/drive. Don't try to tell anyone that every time you copy a file, it copies correctly.
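As an example of the "additional processing" the delimited form allows, the Python sketch below parses the pipe-delimited sample shown above and checks that the file sizes add up to the total. It assumes the four fields shown in the sample; the function name parse is hypothetical, and any further fields in real output (such as the timing values) are simply ignored here.

```python
sample = """\
/path/filename.000|    173|  MD5:|   2DA1B0C315D7D92B42DD3F13B82D5704
/path/filename.001|    173|  MD5:|   2DA1B0C315D7D92B42DD3F13B82D5704
/path/filename.002|    173|  MD5:|   2DA1B0C315D7D92B42DD3F13B82D5704
/path/filename.003|    173|  MD5:|   2DA1B0C315D7D92B42DD3F13B82D5704
  total bytes     |    692|  MD5:|   ABCDEFABCDEF01234567890123456789
"""

def parse(text, delimiter="|"):
    """Turn delimited output lines into a list of dictionaries."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = [f.strip() for f in line.split(delimiter)]
        name, size, label, digest = fields[:4]   # extra fields are ignored
        records.append({"name": name, "size": int(size), "hash": digest.upper()})
    return records

records = parse(sample)
files = [r for r in records if r["name"] != "total bytes"]
total = records[-1]

print(sum(r["size"] for r in files) == total["size"])
```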
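A minimal Python sketch of checking a working copy against a recorded hash is given below. The file names, the sample data, and the helper md5_of are assumptions made for the example; in practice the recorded value would come from the saved MD5_VERIFY output.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def md5_of(path):
    """Return the uppercase MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()

with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp, "image.000")
    original.write_bytes(b"evidence data" * 1000)
    recorded = md5_of(original)            # value saved at imaging time

    work_copy = Path(tmp, "work_image.000")
    shutil.copyfile(original, work_copy)   # copy to the work drive
    copy_ok = md5_of(work_copy) == recorded

print(copy_ok)
```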



OPTIONS

Usage: MD5_VERIFY filename.\* -[options]

YOU MUST ESCAPE THE WILDCARD IN THE FILENAME

Only 1 filename is needed for default operation.
This is the simplest format.

All options should be preceded by a (-) minus sign. Some can be grouped together.

-p + path(s):  Use this to point the program to the directory in which the files are located. This is only necessary if you are not running from the directory containing the files, which is the preferred location. (-p /mnt/hdc1/images)

-f + filespec:  If -p is used, then you need to include the filename of the files to merge. YOU MUST ESCAPE THE WILDCARD IN THE FILENAME (it is a shell requirement). (-f   outputimagefilename.\* )

-oO + filename:  Output file name. Place the output in the named file. If an uppercase 'O' is used, the output is appended to the existing file.

-a:  Append output to the filename provided in the -o option. Serves the same purpose as using an uppercase O.

-l:  Print the hash values in lowercase. Default is to print the hash values in UPPERCASE hex notation.

-s:  Produce the 160-bit SHA output instead of the 128-bit MD5.

-d + "delimiter":  Replace "delimiter" with a delimiter (typically a pipe '|'), within double quotes, with which to delimit fields. If the delimiter is not printable, use its decimal ASCII value, but don't place it in quotes. (-d "|")



COMMAND LINES

c:>MD5_VERIFY filename.\* -o output
You MUST escape the wildcard in Linux; otherwise the shell expands it before the program sees it.

c:>MD5_VERIFY -f filename.\* -o output
Same as above.

c:>MD5_VERIFY -f filename.\* -o output -s
Add the sha1 value to the output line.

c:>MD5_VERIFY -f filename.\* -o output -l
Print hex values in lowercase.

c:>MD5_VERIFY -p /mnt/hdc1/images_dir -f filename.\* -o output
Point it to a specific location where the images reside.


RELATED PROGRAMS

For a Windows (W2K, XP) environment:

SHA_VERIFY

MD5

HASH, or the Linux version, hashl