Catalog

PURPOSE   OPERATION   OUTPUT   OPTIONS   COMMAND LINES   RELATED PROGRAMS


Author: Dan Mares, dmares @ maresware . com (you will be asked for e-mail address confirmation)
Portions Copyright © 1998-2020 by Dan Mares and Mares and Company, LLC
Phone: 678-427-3275



Purpose

The Catalog program is designed to run on UNIX and Linux systems. It is currently compiled only for Linux on i386, but porting it to other platforms should be quick and easy.

It uses the same directory-search algorithm as the Linux hashl and strsrch programs.

It is designed to traverse the entire file system and produce a list of every file on the system, thereby making a "catalog" of the system.

The ls and find programs can also accomplish these tasks. However, their output is not in a fixed format and is not conducive to importing into a database for further analysis. Catalog is a fast, simple program that produces output in a standard fixed format that databases and other programs can easily manipulate. It is easy to learn, and its output does not differ from one UNIX vendor to another.

The output of Catalog is a fixed-length record, which is easily processed by data manipulation programs (e.g., databases).

By default, Catalog lists all files. Use the -f (file type) option to list only files whose names match specific criteria (e.g., *.c).



Operation

The user provides Catalog with the appropriate options on the command line. Catalog can also be run from a script file, which means that for forensic purposes it can run unattended.

The program should be run with root privileges so that it can search directories that are restricted for normal users.

As a minimum search criterion, the user must supply a file type or a path. Options are available to modify how the program identifies files.

Catalog can search for specific file types (e.g., *.c, *.doc) or search down selected paths. More than one file type and more than one path can be used at once. Catalog combines the file types and paths given on the command line into a matrix that it uses to identify files.
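To make the path-by-file-type matrix concrete, here is a minimal Python sketch of the idea. It is not Catalog's source: the function name `find_files` is hypothetical, and comparing names with shell-style wildcard matching (`fnmatch.fnmatchcase`) is an assumption about how file types are applied. Every path is paired with every file type, and `recurse=False` stands in for the -r option.

```python
import fnmatch
import os


def find_files(paths, patterns, recurse=True):
    """Yield every file under each path whose base name matches any of the
    wildcard patterns, i.e. every (path, file type) pair in the matrix.
    Hypothetical helper; not Catalog's actual implementation."""
    for top in paths:
        for dirpath, dirnames, filenames in os.walk(top):
            dirnames.sort()  # visit subdirectories in a stable order
            for name in sorted(filenames):
                # Case-sensitive, shell-style matching, as on UNIX.
                if any(fnmatch.fnmatchcase(name, pat) for pat in patterns):
                    yield os.path.join(dirpath, name)
            if not recurse:
                break  # like -r: stay in the top-level directory
```

For example, `find_files(["/work", "/etc"], ["*.c", "*.h"])` would pair both paths with both file types, roughly what a command line with two paths and two file types asks Catalog to do.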

Once Catalog has enough information, it finds and lists all the files that fit the criteria. It prints the output to the screen or, if an output file was requested, writes it to that file. Catalog does NOT write to the hard disk unless the user specifically requests it (i.e., asks for an output file).

The output is a fixed-length record that can be imported into a database for reference and for cross-matching with output generated later. The record length depends on the type of output chosen, but within any one run of the program all records are the same size.

In a test on a basic Linux box with a 166 MHz 586 processor, Catalog took about 3 minutes to find and list over 25,000 files.

Because the program only lists the files and does not read their contents, it does not alter their last access dates, which may be important in some instances.



Output

The output of the program is intended to be placed in an output file for future reference.

Here is a sample of the default output written to a file. The column headings are shown only for legibility; they do not appear in the output file.

NAME     |SIZE |UID| MODE |Access time      |CREATE time      |WRITE time

u_base.c 11205  500 100644 06/15/1998 07:19a 06/15/1998 11:53c 06/15/1998 11:53w

The items in the output file are:

1: NAME: the complete filename, including path. (Path is not shown here to conserve space.)
2: SIZE: the file size in bytes.
3: UID: the user ID of the file's owner. This can be cross-matched with the password file (/etc/passwd) to find out who the owner is.
4: MODE: the file permissions, in octal. Notice that there is more information here than you are normally used to seeing, because the file mode field actually holds more information than the ls -l command usually shows.
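5: Three file dates/times: Access (a), Create (c), and Modify/Write (w). On Linux, the "create" time is actually the inode status change time; see the -t option.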

The default name/path field is roughly 80 characters wide, making the overall record about 150 characters long. The -w option lets you change the width of the path/name field for longer or shorter output records.
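A fixed-length record like the sample above could be produced as in the following Python sketch. Only the 80-character name field comes from this document; the widths for size (10), UID (5) and mode (6 octal digits) and the 24-hour HH:MM clock are assumptions chosen to match the sample line. `format_record` is a hypothetical name and accepts any object with the usual `os.stat()` attributes.

```python
import time

NAME_WIDTH = 80  # default width of the path/name field


def fmt_time(ts, tag):
    """MM/DD/YYYY HH:MM followed by a one-letter tag (a, c or w),
    mirroring the sample record. The 24-hour clock is an assumption."""
    return time.strftime("%m/%d/%Y %H:%M", time.localtime(ts)) + tag


def format_record(path, st, width=NAME_WIDTH):
    """Build one fixed-length record from a path and its stat result."""
    # Keep the rightmost `width` characters so the file name survives.
    name = path[len(path) - width:] if len(path) > width else path
    return (f"{name:<{width}} "          # name padded to a fixed width
            f"{st.st_size:>10} "         # size in bytes (assumed width)
            f"{st.st_uid:>5} "           # owner's user ID (assumed width)
            f"{st.st_mode:06o} "         # full mode word in octal
            f"{fmt_time(st.st_atime, 'a')} "
            f"{fmt_time(st.st_ctime, 'c')} "
            f"{fmt_time(st.st_mtime, 'w')}")
```

For a real file you would pass `os.lstat(path)` or `os.stat(path)` as `st`. Because every field is padded to a fixed width, every record from one run has the same length, which is what makes the output easy to import.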

The various file mode bits are listed here as octal masks. Notice that the three low-order digits hold the normal file permissions, while the higher digits hold the set-ID and sticky bits and the file type.

      st_mode   100644 (this mode indicates a regular file with permissions rw-r--r--)

      S_IFSOCK 0140000 socket
      S_IFLNK  0120000 symbolic link
      S_IFREG  0100000 regular file
      S_IFBLK  0060000 block device
      S_IFDIR  0040000 directory
      S_IFCHR  0020000 character device
      S_IFIFO  0010000 fifo

      S_ISUID  0004000 set UID bit
      S_ISGID  0002000 set GID bit
      S_ISVTX  0001000 sticky bit

      S_IRWXU  0000700 user (file owner) has read, write, execute
      S_IRUSR  0000400 user has read permission
      S_IWUSR  0000200 user has write permission
      S_IXUSR  0000100 user has execute permission

      S_IRWXG  0000070 group has read, write and execute permission
      S_IRGRP  0000040 group has read permission
      S_IWGRP  0000020 group has write permission
      S_IXGRP  0000010 group has execute permission

      S_IRWXO  0000007 others have read, write and execute permission
      S_IROTH  0000004 others have read permission
      S_IWOTH  0000002 others have write permission
      S_IXOTH  0000001 others have execute permission
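The masks above can be applied with Python's standard `stat` module, which defines the same constants. One subtlety the table implies: the file-type values share bits (S_IFSOCK 0140000 contains the S_IFREG bit 0100000), so the type has to be extracted with `S_IFMT` and compared for equality rather than tested with a single AND. `describe_mode` is a hypothetical helper for illustration.

```python
import stat

FILE_TYPES = [
    (stat.S_IFSOCK, "socket"),
    (stat.S_IFLNK, "symbolic link"),
    (stat.S_IFREG, "regular file"),
    (stat.S_IFBLK, "block device"),
    (stat.S_IFDIR, "directory"),
    (stat.S_IFCHR, "character device"),
    (stat.S_IFIFO, "fifo"),
]

SPECIAL_BITS = [(stat.S_ISUID, "setuid"), (stat.S_ISGID, "setgid"), (stat.S_ISVTX, "sticky")]


def describe_mode(mode):
    """Split an st_mode value into file type, special bits and rwx string."""
    ftype = stat.S_IFMT(mode)  # isolate the file-type bits (mask 0170000)
    kind = next((name for bits, name in FILE_TYPES if bits == ftype), "unknown")
    specials = [name for bit, name in SPECIAL_BITS if mode & bit]
    perms = stat.filemode(mode)[1:]  # drop the leading file-type character
    return kind, specials, perms
```

For the sample record, `describe_mode(0o100644)` gives a regular file with no special bits and permissions `rw-r--r--`.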


Options

Usage:
catalog    path/filetype    -[options]

At least one initial file type or path is recommended, but it is not required.

For additional paths or file types, use the -p and/or -f options. If only a file name is given, the current directory is used as the path and is searched recursively.

If more than one path needs to be searched, the -p option is the only way to specify them.

-d + delimiter    Insert a delimiter between fields. The delimiter should be a single character. If it is not a printable ASCII character, you can enter its numeric (decimal) ASCII value instead. To use the pipe symbol ( | ), you may need to enclose it in quotes or enter its decimal value, 124 ( -d 124, or -d "|" ).
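The -d rule can be sketched as follows. The exact parsing is an assumption: here a string of decimal digits is taken as a character code (so 124 gives | and 9 gives a tab), and anything else must be exactly one character and is used literally. `parse_delimiter` and `join_fields` are hypothetical names.

```python
def parse_delimiter(arg):
    """Hypothetical -d parsing: an all-digit argument is a decimal
    character code; otherwise it must be a single literal character."""
    if arg and arg.isascii() and arg.isdigit():
        code = int(arg)
        if code > 255:
            raise ValueError(f"character code out of range: {code}")
        return chr(code)
    if len(arg) != 1:
        raise ValueError("delimiter must be a single character")
    return arg


def join_fields(fields, delim):
    """Join record fields with the chosen delimiter."""
    return delim.join(str(f) for f in fields)
```

Under this assumption, `-d 124` and `-d "|"` give the same result, and the numeric form avoids quoting the pipe for the shell.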

-f + filetype    Additional file types to search for, separated by spaces. This option allows more than one file type to be searched for in the same run (e.g., \*.c \*.bat). If only a path is provided on the command line, the file type defaults to \*.

Note: Because most shells expand wildcards automatically, if you want to search for wildcard filenames you must do one of the following when entering the wildcards on the command line:

Either escape the wildcard with a backslash ( \*.c ) or quote the entire file type ( "*.c" ). Otherwise the shell will try to expand the wildcard itself, and you may not find everything you are looking for. If no -f option is given, all files (*) are identified.

-g + #
-l + #
That's an ell (l), not a one (1). Linux only. Use these options to locate only files (g)reater than or (l)ess than a specific size in bytes. Replace the # with a value. Currently the maximum value allowed is a 4 GB file size.
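A sketch of the -g/-l test, assuming both bounds are strict (the text says "greater than" and "less than", unlike -G/-L) and that "4 gig" means 4 × 1024³ bytes. `size_ok` is a hypothetical helper.

```python
MAX_SIZE = 4 * 1024 ** 3  # assumed reading of the "4 gig" limit


def size_ok(size, greater=None, less=None):
    """Return True if size (bytes) passes -g (strictly greater) and/or
    -l (strictly less); None means that option was not given."""
    for limit in (greater, less):
        if limit is not None and not 0 <= limit <= MAX_SIZE:
            raise ValueError(f"size limit out of range: {limit}")
    if greater is not None and size <= greater:
        return False
    if less is not None and size >= less:
        return False
    return True
```

Giving both options selects a size window, e.g. `size_ok(size, greater=1000, less=10000)`.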

-G + #
-L + #
    That's an uppercase ell (L), not a one (1). Linux only. Use these options to locate files greater than or equal to, or less than or equal to, # days old. Replace the # with a value. Age is currently calculated in 24-hour periods counted back from the current system time. So if the current system time is 1300 hours and a file was made yesterday at 1200 hours, it is 2 days old rather than 1: yesterday at 1300 hours was exactly 24 hours before the current time, and yesterday at 1200 hours was 25 hours before, so the extra hour counts as a second day.

By default, the calculation uses the time listed by the ls -l command (the modification time). To base it on a different time, use the -t option.
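The day-counting example above (25 hours old counts as 2 days) is consistent with rounding any partial 24-hour period up. The sketch below assumes exactly that rule and assumes -G/-L compare inclusively as described; `age_in_days` and `age_ok` are hypothetical names. The file time passed in would be `st_mtime` by default, or whichever time -t selects.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def age_in_days(file_time, now):
    """Age in days, counting any partial 24-hour period as a full day
    (assumed rule that reproduces the 25 hours -> 2 days example)."""
    return math.ceil((now - file_time) / SECONDS_PER_DAY)


def age_ok(file_time, now, at_least=None, at_most=None):
    """-G style (>= at_least days) and -L style (<= at_most days) test."""
    age = age_in_days(file_time, now)
    if at_least is not None and age < at_least:
        return False
    if at_most is not None and age > at_most:
        return False
    return True
```

With `now` taken from `time.time()`, a file written exactly 24 hours ago is 1 day old under this rule, and one written 25 hours ago is 2 days old.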

-N     Print ONLY the full path and filename in the output record. This is ideal for obtaining just a list of files; no other information is printed.

-n     Print only the filename, not the entire path, in the output record. The other information (size, UID, mode, dates and times) is still printed. This differs from -N, which prints only the path and filename and no other information.

-o + output    Filename for output. Output can also be redirected using > output; if redirection (>) is used, this option is unnecessary.

-p + path(s)    Additional paths to search. Multiple paths can be given, separated by spaces (e.g., -p /work /bin /etc). This option allows more than one path to be searched at a time.

-r    Do NOT recurse through the path provided (the default is to recurse). Use this option to catalog a single directory.

-v    Silent run (NO VERBOSE). Do not print the normal column headings above the numbers. This gives cleaner screen output for redirection to a file. The same effect can be obtained by setting an environment variable called SILENT to ON (export SILENT=ON). The SILENT environment variable is also used by Crckit.

-t[acm]     Print time as [a]ccess time, [c]hange time, or [m]odification time. Linux's naming of the ‘c’hange and ‘m’odification times can be confusing. The ‘c’hange time is the status change time (shown by ls -lc), not a creation time. The ‘m’odification time is the last write time, and is the default time shown by ls -l. The last access time is shown by ls -lu.

-z     Display times in Zulu (GMT) format. This is useful for keeping file times consistent across time zones. (Be certain that the CMOS clock, TZ variable, and time zone settings are correct.)
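Selecting a time with -t and showing it in GMT with -z map naturally onto the `st_atime`, `st_ctime` and `st_mtime` fields returned by `os.stat()` and onto `time.gmtime()` versus `time.localtime()`. A minimal sketch: the MM/DD/YYYY HH:MM layout follows the sample record, and `pick_time` and `format_time` are hypothetical names.

```python
import time

# -t letter -> os.stat() attribute; 'm' (modification) is the default.
TIME_FIELDS = {"a": "st_atime", "c": "st_ctime", "m": "st_mtime"}


def pick_time(st, which="m"):
    """Return the timestamp selected by a -t style letter."""
    return getattr(st, TIME_FIELDS[which])


def format_time(ts, zulu=False):
    """MM/DD/YYYY HH:MM in local time, or in UTC/GMT when zulu is True
    (the -z behaviour). Local output depends on the TZ setting."""
    tm = time.gmtime(ts) if zulu else time.localtime(ts)
    return time.strftime("%m/%d/%Y %H:%M", tm)
```

Because GMT output ignores the local time zone, records produced on machines in different zones can be compared directly, which is the consistency -z is meant to provide.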

-w + #     Limit the filename length to # characters. If the full path is being used, the default is 80 characters of path and filename. If the -n option is used, the default is 15 characters of filename (without the path). If the filename including its path is longer than the field, the path is truncated at the front. (Note that the uppercase -W and lowercase -w options produce slightly different outputs; experiment with both to find what suits your purposes best.)
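Truncating at the front keeps the end of the path, which holds the file name, while the field stays a fixed width. A sketch assuming that -n keeps only the base name and that the same front truncation applies to it; `fit_name` is a hypothetical name, and the defaults of 80 and 15 come from the description above.

```python
import os


def fit_name(path, width=None, name_only=False):
    """Fit a path (or, with name_only, just its base name) into a
    fixed-width field, dropping characters from the front if too long."""
    text = os.path.basename(path) if name_only else path
    if width is None:
        width = 15 if name_only else 80  # defaults described for -n / full path
    if len(text) > width:
        text = text[len(text) - width:]  # keep the rightmost characters
    return text.ljust(width)  # pad so every record has the same width
```

For example, fitting `/work/src/u_base.c` into 10 characters keeps `c/u_base.c`, so the file name remains readable even though the leading directories are lost.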



Command Lines

$ catalog / -o outputfilename
Catalog all files on the entire file system, writing the results to outputfilename.

$ catalog /work
Catalog the files in /work and its subdirectories.

$ catalog /work -r
Catalog /work only, without recursing into subdirectories.

$ catalog /work/\*.c
Catalog all *.c files under /work (add -r for no recursion).

$ catalog /work -n
Catalog /work, printing only the filename (up to 15 characters) without the path.

$ catalog /work -w 30
Catalog /work, printing up to 30 characters of path and filename.

Because this documentation was adapted from that of a program running under DOS, you may occasionally find a reference that looks like a DOS path or drive name. When using Catalog, be certain to use the Linux forward slash (/) to separate path components.


Related Programs

Diskcat

Hash

Mdir
