category-group: math
layer: 02
header file: z_hash.h

synopsis.
The "hash class" documented here is actually a family of classes that provide hashes on strings. Files are not involved as hashes operate on blocks of text. The string objects can be put into binary mode, for manipulating binary data such as object modules or executable files. A simple, common interface provides a uniform protocol for computing all hashes.

CRCs (Cyclic Redundancy Checks) are included under the hash umbrella as they are a specialization of hashing. The CRC_o class provides a simplified way to compute CRC values.
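
As a point of reference, the sketch below computes the widely used CRC-32 (reflected polynomial 0xEDB88320, initial value and final XOR of 0xFFFFFFFF) in standalone C++. It does not use z_hash.h, the function name crc32_ieee is hypothetical, and this page does not say whether either CRC_o variant produces the same values. Treat it only as an illustration of what a 32-bit CRC over a string looks like.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical, standalone CRC-32 (bit-at-a-time); not the CRC_o implementation.
std::uint32_t crc32_ieee(const std::string &data)
{
    std::uint32_t crc = 0xFFFFFFFFu;                // initial register value
    for (std::size_t i = 0; i < data.size(); ++i)
    {
        crc ^= static_cast<unsigned char>(data[i]); // fold the next byte in
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
    }
    return crc ^ 0xFFFFFFFFu;                       // final inversion
}
```

With this variant, the standard check string "123456789" yields 0xCBF43926.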

Cryptographic hashes are included: MD5 and a group of SHA-xxx variants (SHA-256, SHA-384, and SHA-512) are currently implemented.

The result of a hash (whether a 32-bit CRC, or a multi-byte MD5 or SHA digest) can be cast to a string object, like so:

  byte_t ans[128];
  size_t nb = 128;
  CRC_o c;
  c.go ("Frankly, my dear, I don't give a damn.");
  c.result(ans, nb);
  string_o sval = c;
In this example, the results are also stuffed into the byte array 'ans' (an array of unsigned characters, ie 8-bit boxes) via the member function result(). The bytes are ordered least significant first; that is, the low-order octet is in ans[0]. This order is contrary to the way many values are presented. For instance, an MD5 value is conventionally stored in a 16-byte buffer whose first element is the highest-valued octet. Consider this SHA-1 value:
888f9117d1c94e5c08d3954ccd542e53d1e1f0ec
If this hash value is stored as a string (an array of char[]), then the first char value - the 0th element - is '8'. This is the high-order value of this number. If this value is the output of result(), and put into a large byte_t array 'ans' (as in the previous example), then ans[0] will have the hex value 0xEC in it (decimal 236).
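
The relationship between the two orderings can be sketched in standalone C++ (z_hash.h is not needed). The helper names hex_digit and hex_to_lsb_bytes are hypothetical and not part of the Z Directory; the code only illustrates the ordering described above, where the last two hex digits of the printed value land in slot 0.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the value of one hex digit, or -1 if 'c' is not a hex digit.
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper (not part of z_hash.h): converts a hex string, written
// most-significant digit first, into bytes ordered least-significant first.
std::vector<unsigned char> hex_to_lsb_bytes(const std::string &hex)
{
    assert(hex.size() % 2 == 0);            // two hex digits per byte
    std::vector<unsigned char> out;
    for (std::size_t i = hex.size(); i >= 2; i -= 2)
    {
        int hi = hex_digit(hex[i - 2]);
        int lo = hex_digit(hex[i - 1]);
        assert(hi >= 0 && lo >= 0);         // reject non-hex input
        out.push_back(static_cast<unsigned char>(hi * 16 + lo));
    }
    return out;
}
```

Applied to the SHA-1 value above, element 0 of the returned vector is 0xEC and the last element is 0x88.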

You can apply the exact same code to any subclass of hash_o. This includes MD5, SHA-512, or any of the hash subtypes:

  byte_t ans[128];
  size_t nb = 128;
  MD5hash_o x;
  x.go ("Frankly, my dear, I don't give a damn");
  x.result(ans, nb);
  string_o sval = x;
In the case of MD5, the result() member function will fill 16 bytes of the ans[] array, whereas the 32-bit hashes (including all CRC implementations) will use 4 bytes (32/8 = 4 bytes). The result() member function provides a way to access the result of the hash. You need to pre-allocate an array of byte_t to accommodate the hash result. result() is given the array to fill in, along with the size of the array in the second parameter ('nb', in the example above). When result() returns, the second parameter ('nb') is set to the actual number of bytes used by the hash digest. In case you need to know how many bytes to allocate, each subclass has a member function bufsize() that returns the digest size.

For the classes that bundle multiple implementations or versions of a hash, you can select the type by providing the type name when the object is instantiated, or you can reconfigure the object later via set_name(). The type name is not case-sensitive. For 2-part names, such as CRC with the UCB version, you can use any of several single-character separators [ - , ; : . / ]; they are all converted to a forward-slash ('/'), which is the preferred character.
For example:

  CRC_o c0("CRC/ucb");
  CRC_o c1("crc-UCB");
  CRC_o c2("Crc:Ucb");
  SHA_o sh0("SHA.256");
  SHA_o sh1("sha/384");
This member function set_name() is not available to all subclasses, only those where it makes sense - where the class has multiple implementations:
  byte_t ans[800];
  string_o s("Yo mama so stupid she tripped over a cordless phone");
  size_t nb = 800;
  SHA_o sha("SHA-256");
  sha.go (s);                   // calculate SHA-256 over the whole string
  sha.result(ans, nb);          // 'nb' goes in as the size of ans[]
  string_o sval = sha;
  sha.set_name("SHA-512");      // reconfigure SHA to -512 mode
  sha.go (s);                   // calculate SHA-512 now
You cannot use set_name() to convert a murmerhash_o object into anything else, as there is only 1 murmer hash implementation in the Z Directory. You can, however, "convert" a hash object to its existing type, which is a no-op:
    int ec;
    murmerhash_o mur;
    ec = mur.set_name("murmer");        // success: does nothing
    if (ec < 0) exit(1);

For the CRC class, there are 2 implementations - the default, and one from UCB (UC Berkeley). For SHA, the default is SHA-256. Here is a list of hash classes and their variations:

class         type name(s)      notes
hash_o        "none"            This is the master base class. It is an ADT, so it cannot be instantiated.
murmerhash_o  "murmer"          A simple hash algorithm returning a 32-bit value.
CRC_o         "CRC"             This defaults to a basic [no-name] CRC implementation.
CRC_o         "CRC/UCB"         This is public code from UC Berkeley. A salt value of 0xFFFFFFFF is used.
MD5hash_o     "md5"             The algorithm by Ron Rivest was implemented by Colin Plumb (1993).
SHA_o         "SHA", "SHA-256"  The default SHA version is SHA-256. The SHA-256 digest is 64 hex characters long.
SHA_o         "SHA-384"         The SHA_o object can be set to SHA-384 by simply setting the object name to "sha-384". The digest is 96 hex characters long.
SHA_o         "SHA-512"         You can set up SHA-512 by simply providing this type name ('SHA_o x("sha-512");'). SHA-512 is considered one of the most secure crypto hashes. Its digest is 128 hex characters long.

Table of Hash Classes

The member functions of hash_o are almost all the same for all the sub-classes. Functions specific to a sub-class (there are a few) are presented at the end of this list. All the subclasses (SHA_o, CRC_o, ..) contain all the standard "orthodox" operations: default and copy constructors, assignment operator, destructor, init() and reset(), etc.

member functions (primary)

hash_o()
SIGNATURE: hash_o ()
SYNOPSIS:
creates a new hash object. Each class receives its appropriate default name. See the "Table of Hash Classes" (on this page) for a list of names of hash classes.
 

hash_o(hash_o)
SIGNATURE: hash_o (const hash_o &rhs)
SYNOPSIS: creates a new hash object, which is a copy (clone) of "rhs". This function invokes the copy assignment operator.
 

operator = (hash_o)
SIGNATURE: operator = (const hash_o &rhs)
SYNOPSIS: copies the values of "rhs" into the current hash object. this function invokes copy() to do the copying.
 

destructor
SIGNATURE: ~hash_o ()
SYNOPSIS: "destroys" the current object. In all hash classes, this does very little. Consider it a no-op.
 

name()
SIGNATURE: const string_o &name() const
SYNOPSIS:
returns the full type name of the class. This name consists of 2 parts: the "main type" (top-level) and "sub-type" (variation on the main type). It may be just the top-level name, or it can be a concatenation of the top-level name and variation. If there is a variation, it is separated from the top-level name by a slash ('/'). See the discussion in the general description, on this page, for more information.
TRAITS: this function is inline
 

went()
SIGNATURE: boolean went() const
SYNOPSIS:
returns the value of the internal boolean flag 'is_figured'. If this flag is set, it indicates that the object [successfully] did a hash calculation. The flag is set by member function set_name(), which can be accessed only by sub-classes (it's a protected function).
This function should be used in each sub-class's go() member function, to indicate that a calculation was completed.
RETURNS:
TRUE: go() was invoked and there is a hash value available
FALSE: otherwise
TRAITS: this function is inline
 

get_typeinfo()
SIGNATURE: int get_typeinfo (string_o &top, string_o &sub, int *pi)
SYNOPSIS:
This very utility-oriented function is intended mainly for internal use. It gets the object's type name, which is the top-level type, and an optional sub-type name, if there is one. If there is no sub-type, the output string variable 'sub' is an empty string.
This calls the other get_typeinfo() member function with the 'my_name' as its first parameter.
PARAMETERS

  • top: [output] the top-level class type name
  • sub: [output] any sub-type (ie variation - subspecies) name. If there is no subspecies, this string is empty ("").
  • pi: error indicator output variable. A value of 0 indicates that
 

get_typeinfo()
SIGNATURE: int get_typeinfo (const string_o &s, string_o &top, string_o &sub, int *pi)
SYNOPSIS:
this function parses 's' into 2 strings. The format of 's' is expected to be [WORD][CHAR_SEP][WORD], where WORD can consist of letters, digits, and underscores ('_'); CHAR_SEP is a single character that separates the WORD components, and must be one of: [ - / . , ; : ]. The case of WORD is irrelevant, but the final string stored into the object is converted to upper-case. See the "Table of Hash Classes" (on this page) for a list of valid type names.
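
The parsing rule described above can be sketched in standalone C++. This is not the library's code: split_type_name and its helpers are hypothetical names, std::string stands in for string_o, and the return convention (0 for success, -1 for a malformed name) is an assumption made for the example.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// A WORD character is a letter, a digit, or an underscore.
static bool is_word_char(char c)
{
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

static char to_upper(char c)
{
    return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
}

// Hypothetical stand-in for the parsing step: splits "WORD" or "WORD<sep>WORD"
// into 'top' and 'sub', upper-casing both. Returns 0 on success, -1 otherwise.
int split_type_name(const std::string &s, std::string &top, std::string &sub)
{
    const std::string seps = "-/.,;:";              // accepted single separators
    std::size_t i = 0;
    top.clear();
    sub.clear();
    while (i < s.size() && is_word_char(s[i]))
        top.push_back(to_upper(s[i++]));
    if (top.empty()) return -1;                     // no top-level name
    if (i == s.size()) return 0;                    // top-level name only
    if (seps.find(s[i]) == std::string::npos) return -1;
    ++i;                                            // skip the one separator
    while (i < s.size() && is_word_char(s[i]))
        sub.push_back(to_upper(s[i++]));
    return (!sub.empty() && i == s.size()) ? 0 : -1;
}
```

Under these assumptions, "crc-UCB" splits into "CRC" and "UCB", "md5" gives "MD5" with an empty sub-type, and "crc--ucb" is rejected because only a single separator character is allowed.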
 

[assignment to string_o]()
SIGNATURE: operator string_o () const
SYNOPSIS:
this function provides the hash result as a string, in hex. Note that the highest-valued digits come first in the string, exactly as one would write a number. That is, given the value "150,487", the digit '1' is the first element of the character array of this number as a string, but it is the highest-valued digit; whereas the digit '7' is at the end of the string, but is the lowest-valued digit. This is in contrast to the byte ordering used by member function result(), where the lowest-valued byte is in the first slot of the output array.
TRAITS: this function is virtual
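
The ordering difference between the string form and the result() byte array can be sketched as follows. The helper lsb_bytes_to_hex is hypothetical and standalone (it is not how the library performs the conversion), and the use of lower-case hex digits is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of z_hash.h): renders a digest held
// least-significant byte first as a hex string, most-significant digit first.
std::string lsb_bytes_to_hex(const unsigned char *buf, std::size_t nb)
{
    static const char digits[] = "0123456789abcdef";
    assert(buf != nullptr || nb == 0);
    std::string out;
    for (std::size_t i = nb; i > 0; --i)            // walk from the last slot down
    {
        unsigned char b = buf[i - 1];
        out.push_back(digits[b >> 4]);              // high nibble first
        out.push_back(digits[b & 0x0F]);
    }
    return out;
}
```

For a 4-byte array holding { 0x87, 0x04, 0x15, 0x00 }, this produces "00150487": slot 0 of the array supplies the last two characters of the string.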
 

result()
SIGNATURE: int result (byte_t buf[], size_t &nb, int *pi) const
SYNOPSIS:
converts the calculated hash value to an array of bytes (unsigned characters) and stores it into the output array 'buf'.
PARAMETERS

  • buf: [output] an array where the hash digest is stored. This must be allocated by the application. If it is null, calling this function will fail. If it is too short, there may be a buffer overrun leading to a fatal program crash.
  • nb: [input & output] on input, the size of the buf[] array; on return, the number of bytes used by the digest. It is up to the application to ensure that the input value is accurate. If it overstates the space available (ie, if the value of 'nb' is greater than the size of 'buf[]'), there may be a buffer overrun, which will probably lead to a program crash.
  • pi: [output] error indicator output variable. values: 0: success
    zErr_OperationNotStarted: no hash value has been calculated (go() not called earlier)
    zErr_Param_NullPointer: 'buf' is NULL
    zErr_Resource_TooSmall: the size of 'buf', as indicated by the value of 'nb', is too small to accommodate the hash value
TRAITS: this is a pure virtual function. All sub-classes must implement this function.
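
The buffer protocol described above ('nb' carries the buffer size in and the digest size out) can be sketched for a 32-bit hash value. Everything here is an assumption made for illustration: store_result32 is a hypothetical free function, and the demo_* codes are placeholders, since the numeric values of the zErr_* codes are not given on this page.

```cpp
#include <cstddef>
#include <cstdint>

// Placeholder error codes; the real zErr_* values are not documented here.
enum { demo_ok = 0, demo_null_pointer = -1, demo_too_small = -2 };

// Hypothetical sketch of the result() contract for a 32-bit hash value.
int store_result32(std::uint32_t value, unsigned char buf[], std::size_t &nb)
{
    const std::size_t need = 4;                     // 32 bits / 8
    if (buf == nullptr) return demo_null_pointer;
    if (nb < need) return demo_too_small;           // caller's buffer is too short
    for (std::size_t i = 0; i < need; ++i)          // low-order octet goes in buf[0]
        buf[i] = static_cast<unsigned char>((value >> (8 * i)) & 0xFF);
    nb = need;                                      // report the bytes actually used
    return demo_ok;
}
```

Calling it with an 8-byte buffer and nb = 8 leaves nb at 4, with the low-order octet of the value in buf[0].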
 

bufsize()
SIGNATURE: size_t bufsize (int *pi = NULL) const
SYNOPSIS:
returns the size of the hash digest, in terms of byte_t units. This function is intended to be used in conjunction with the member function result(), if the application needs to know in advance how large a buffer to pre-allocate:

  MD5hash_o x;
  size_t nb = x.bufsize();
  byte_t *dynbyt = new byte_t[nb];
  x.go ("[Some text to compute a hash on]");
  x.result(dynbyt, nb);         // 'nb' value here is vital
  delete [] dynbyt;             // release the buffer when finished
A more typical way is to use a local array variable with a large, static number of elements:
  SHA_o x;
  byte_t digest[65536];
  size_t nb = 65536;
  x.go ("[Some text to compute a hash on]");
  x.result(digest, nb);         // 'nb' is the size of digest

TRAITS: this is a pure virtual function
 

go()
SIGNATURE: int go (const string_o &s, size_t nb = 0, int *pi = NULL)
SYNOPSIS:
this calculates the hash value of the given string. If you want to calculate a hash value on a file, you need to stuff it into a string object first. The hash value can be a CRC or crypto such as MD5 or SHA-2. This function is the most important of the hash class cluster.
PARAMETERS

  • s: [input] the string to compute the hash on
  • nb: [input] the number of bytes to process. If it's 0, the string length ('s.size()') will be used.
  • pi: [output] error indicator output variable. values: 0: hash successfully calculated
TRAITS: this is a pure virtual function
 

calculate()
SIGNATURE: int calculate (const string_o &s, size_t nb = 0, int *pi = NULL)
SYNOPSIS:
this is an alias for go(). It simply calls that function and behaves identically; see go() for more information.
TRAITS: this function is inline
 

reset()
SIGNATURE: int reset ()
SYNOPSIS: resets the hash object. Any existing data is wiped out. this does not change the object's type or sub-type.
 

copy()
SIGNATURE: int copy (const hash_o *rhs, int *pi = NULL)
SYNOPSIS: copies the contents of 'rhs' into the current object. The object types must match.
PARAMETERS

  • rhs: [input] the object to copy.
  • pi: [output] error indicator output variable. values: 0: successfully copied
TRAITS: this function is virtual
 

set_name()
SIGNATURE: int set_name (const string_o &s, int *pi)
SYNOPSIS: converts the object type to that specified by 's'.
PARAMETERS

  • s: [input] the type name to convert the object to
  • pi: [output] error indicator output variable. values:
    0: [sub-]type successfully changed
    zErr_SameData: the type indicated by 's' is the same as the current type. 1 should be returned in this case
    zErr_TypeIncorrect: the main (top-level) type, as found in 's', is different from the current type. The object cannot morph into another type (C++ rule).
RETURNS:
0: successful change
1: no change
-1: error
TRAITS: this function is protected. It can be accessed only by sub-classes
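
The return-value rules above can be sketched as a small decision function. This is a hypothetical, standalone illustration (check_rename and top_of are not library functions). It assumes both names are already in normalized form, such as "SHA/256", and it does not model aliases such as "SHA" standing for the default "SHA-256".

```cpp
#include <string>

// Returns the top-level part of a normalized name ("SHA/256" -> "SHA").
static std::string top_of(const std::string &name)
{
    return name.substr(0, name.find('/'));          // whole name if there is no '/'
}

// Hypothetical sketch: 0 = type may change, 1 = no change, -1 = error.
int check_rename(const std::string &current, const std::string &requested)
{
    if (requested == current) return 1;                     // same data: no-op
    if (top_of(requested) != top_of(current)) return -1;    // different top-level type
    return 0;                                               // only the sub-type differs
}
```

Under these assumptions, going from "SHA/256" to "SHA/512" returns 0, re-applying "MURMER" to a murmer hash returns 1, and asking a SHA object to become "MD5" returns -1.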
 

examples.
The following sliver of code shows the 2 ways to get the message digest value (casting to a string_o, and using result() member function).

    #include <iostream>
    #include "z_hash.h"

    int main()
    {
        int j, ie0, ie1;
        size_t i, nb = 1024;                // 'nb' goes in as the size of buf[]
        byte_t buf[1024];
        char chex[8];
        string_o s = "to be, or not to be";
        MD5hash_o x;
        ie0 = x.go (s, 0, &ie1);            // 0: hash the whole string
        if (ie1 != 0) return 1;             // no hash value was calculated
        x.result (buf, nb, &ie1);           // method 1: get byte array
        s = x;                              // method 2: get string

        std::cout << "MD5 - string op. [hex] = " << s << "\n";
        for (i=0; i < nb; i++)              // 'nb' is now the digest size
        {
            j = buf[i];
            z_int_to_str(j, chex, 16, 8);
            std::cout << "buf[" << i << "] = " << j << " (0x" << chex << ")\n";
        }
        return 0;
    }