class ref name: textstring
category-group: strings
layer: 2
header file: z_txtstring.h
libraries: libz00.lib libz01.lib libz02.lib

synopsis.
The textstring object is an extension and subclass of the string_o class. It provides many operations for multi-line strings and operations that apply to text within a single line, including the capability to pull out delimited or fixed-width fields, words or quoted passages. The extracted text can be retained in the original object or removed from it. It also provides convenient number-to-text and text-to-number conversion.

description.
This class object primarily extracts words, lines, and a specified number of characters from its text, in addition to the base class functions. It can also change the case of its contents. A word is defined along the guidelines for a word-token in the C programming language: any group of characters where the first character is a letter and all subsequent characters are either a letter (any case), a digit, or '_' (underscore). The definition of a bigword comes from that of the "vi" editor, as when doing a 'yank-word' operation ("yW"): a word, and any non-whitespace characters that follow it, and any whitespace after that. Thus, given this text:
Hark! Hurry yonder, before ..Expletive(/@!*) - Deleted gets here!!
If the current pointer position is at the first character, "Hark!" would be the result of a yank_bigword() call. If the cursor is at position 14, a bigword yank yields "onder, ". If only whitespace follows, the whitespace is included. Thus at position 62 ("Deleted"), the result of yank_bigword() is "Deleted ". At position 42 (starting at "Expletive.."), the result is "Expletive(/@!*) ".
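The bigword rule above can be sketched with a plain std::string. This is only an illustration of the rule as stated, not the library's implementation: the helper name bigword_at() is hypothetical, and it takes a 0-based index (an assumption; the positions quoted above are not 0-based).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for yank_bigword(): returns the "big word" that
// starts at 0-based index 'pos' of 'text'. Not the library's implementation.
std::string bigword_at(const std::string &text, std::size_t pos)
{
    // Nothing to yank if we are past the end or sitting on whitespace.
    if (pos >= text.size() || std::isspace(static_cast<unsigned char>(text[pos])))
        return "";

    std::size_t end = pos;
    // 1) the word plus any non-whitespace characters that follow it
    while (end < text.size() && !std::isspace(static_cast<unsigned char>(text[end])))
        ++end;
    // 2) any whitespace after that
    while (end < text.size() && std::isspace(static_cast<unsigned char>(text[end])))
        ++end;
    return text.substr(pos, end - pos);
}
```

With the sample text, bigword_at(text, 13) starts on the 14th character and yields "onder, " (including the trailing blank).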

The "eat_[xxx]()" member functions simply remove the next item from the text that the object contains. The "yank_[xxx]()" series of functions retrieve the corresponding item(s). These functions have an [optional] boolean argument. If this boolean variable is TRUE, the yanked text is also removed from the text block. In the case of yank_nchars(), the number of characters to yank must be specified, so the boolean argument becomes the 2nd parameter. In the case of yank_line(), there is a second boolean input parameter. If it is TRUE, the returned textstring object is trimmed; that is, any trailing whitespace (blanks, newlines) is removed.

With yank_field(), you can pull successive text fields from a line. The "fields" are delimited by a specific character. By default, this is a comma (','). You can set the field delimiter with set_field_delim(). This function needs special care when a field is quoted. Consider a line with comma-separated fields like so:
"Any way, today",45.00,99,more text,foo,bar
In order to treat "Any way, today" as a single field, the object must be pre-configured to do so by calling handle_quotes(). Otherwise, yank_field() will return '"Any way' as the text for the first field. Furthermore, there is a special provision for processing Excel-style formatted fields. This makes it easy to handle comma-separated-value files. Consider a line like this:
"Brick machine, ""champion"" (Martin)",98,December 23,1865
With the following code:

textstring_o s("\"Brick machine, \"\"champion\"\" (Martin)\",98,D");
textstring_o field;
s.handle_quotes();
field = s.yank_field();
'field' will contain:
Brick machine,
instead of
Brick machine, "champion" (Martin)
which is the desired contents for extracting excel-based text cells. In order to get the desired results, the textstring object must also be configured with setstyle_excel() (the object can be unconfigured by providing an optional FALSE argument, ie setstyle_excel(FALSE)). In this case, both setstyle_excel() and handle_quotes() must be explicitly invoked:
textstring_o s("\"Brick machine, \"\"champion\"\" (Martin)\",98,D");
textstring_o field;
s.handle_quotes();
s.setstyle_excel();
field = s.yank_field();
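To make the quoting rules concrete, here is a minimal sketch of first-field extraction written against std::string. The function first_field() and its flags are hypothetical stand-ins for yank_field(), handle_quotes() and setstyle_excel(). It handles double quotes only, and it illustrates the behaviour described above rather than the library's code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for yank_field(): returns the first field of 'line'.
//   quotes - emulate handle_quotes(): a delimiter inside quotes is ignored
//   excel  - emulate setstyle_excel(): "" inside a quoted field is one literal "
std::string first_field(const std::string &line, char delim, bool quotes, bool excel)
{
    std::string out;
    std::size_t i = 0;

    if (quotes && !line.empty() && line[0] == '"')
    {
        for (i = 1; i < line.size(); ++i)
        {
            if (line[i] != '"')
            {
                out += line[i];                 // ordinary character inside the quotes
            }
            else if (excel && i + 1 < line.size() && line[i + 1] == '"')
            {
                out += '"';                     // doubled quote -> one literal quote
                ++i;                            // skip the second quote of the pair
            }
            else
            {
                break;                          // closing quote ends the field
            }
        }
        return out;
    }

    // Unquoted (or quotes not handled): copy up to the delimiter or end of line.
    while (i < line.size() && line[i] != delim && line[i] != '\n')
        out += line[i++];
    return out;
}
```

Called with both flags TRUE on the "Brick machine" line, the sketch returns the cell text with a single pair of quotes around champion; with both flags FALSE on the "Any way, today" line it stops at the first comma.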

numbers and the textstring object. This string class is intended to convert from integer numbers to string or vice-versa as easily as possible. You can check that the contents represents a number, or assign an integral numeric value directly to an instance:
flag_o fla = 0;
textstring_o s1, s2;
s1 = 73490;                             // "73490" gets stored
if (!s1.is_numeric()) return;           // check: contents is a number?
s1 += "; a good salary";                // "73490; a good salary"
s2 = s1.yank_number(fla, "[COMMA]");    // s2 == "73,490"

member functions (primary)

textstring_o()
SIGNATURE: textstring_o ()
SYNOPSIS: creates a new textstring object, completely devoid of contents. The internal data pointer will be null.
 

textstring_o(textstring_o)
SIGNATURE: textstring_o (const textstring_o &rhs)
SYNOPSIS:
creates a new textstring object, based on the contents of textstring object "rhs". The contents of string "rhs" will be copied (deep copy) - that is, the data will be cloned.
 

operator = (textstring_o)
SIGNATURE: const textstring_o &operator = (const textstring_o &rhs)
SYNOPSIS:
the contents of an existing string object are replaced with those of "rhs". The current object may receive a new data block: if a new allocation is required, the pointer value "m_s" will be different. If "rhs" is smaller than the current object, the data block will [probably] be the same.
 

destructor
SIGNATURE: ~textstring_o ()
SYNOPSIS:
virtual destructor: this destroys the class object instance. If the string object allocated the internal buffer via new, delete will be used to de-allocate it.
 

textstring_o(string_o)
SIGNATURE: textstring_o (const string_o &s)
SYNOPSIS: constructor; contents of this object will be the same as that of "s" (deep copy)
 

operator = (string_o)
SIGNATURE: textstring_o &operator = (const string_o &s)
SYNOPSIS: copy operator; contents of this object will be the same as that of "s" (deep copy)
 

operator = (const char *)
SIGNATURE: textstring_o &operator = (const char *buf)
SYNOPSIS: copy operator; contents of this object will be the same as that of "buf" (deep copy)
 

operator = (count_t)
SIGNATURE: textstring_o &operator = (count_t x)
SYNOPSIS:
assignment operator: this function converts a number to a string. the resultant string is a simple representation of the number. If 'set_outnum_format()' was not called prior to this invocation, it will be called with a format string of "%d".
DESCRIPTION:
if you assign an integer value, like so:

    textstring_o s;
    s = 175;
the stored string will be "175", not " 175" or "0175". Use set_outnum_format() to format the number when it's outgoing.
 

operator ==()
SIGNATURE: int operator == (const textstring_o &rhs) const
SYNOPSIS: comparison operator: compares textstring to "rhs". this simply maps onto string_o::operator ==
 

operator !=()
SIGNATURE: int operator != (const textstring_o &) const
SYNOPSIS: comparison operator: compares textstring to "rhs". this simply maps onto string_o::operator !=
 

operator >=()
SIGNATURE: int operator >= (const textstring_o &) const
SYNOPSIS: comparison operator: compares textstring to "rhs". this simply maps onto string_o::operator >=
 

operator <=()
SIGNATURE: int operator <= (const textstring_o &) const
SYNOPSIS: comparison operator: compares textstring to "rhs". this simply maps onto string_o::operator <=
 

operator >()
SIGNATURE: int operator > (const textstring_o &) const
SYNOPSIS: comparison operator: compares textstring to "rhs". this simply maps onto string_o::operator >
 

operator <()
SIGNATURE: int operator < (const textstring_o &) const
SYNOPSIS: comparison operator: compares textstring to "rhs". this simply maps onto string_o::operator <
 

is_numeric()
SIGNATURE: boolean is_numeric (int *pi = NULL) const
SYNOPSIS: check if the current contents of the object's text buffer contains only a number.
PARAMETERS

  • pi: an optional [output] integer error indicator parameter. values:
    0: successful function call
    zErr_Data_Unexpected: encountered a character that is not a '.' or digit
    zErr_DirtyData: unusual character encountered when expecting an integer
  • DESCRIPTION:
    the contents may not contain anything except a valid representation of a number, not even leading or trailing whitespace. Examples of valid numeric strings include "-7590", "-3.14", "0.0075", "76.", "-0.345", "-.7856", "-1", and "1".
    This function is useful in verifying that some string which should be only a number is that, such as a numeric INI parameter. Region-specific formatting is not allowed, such as comma-separation (eg "125,000").
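    The acceptance rule described here can be approximated as follows. The helper looks_numeric() is a hypothetical name and a sketch only. It is inferred from the listed examples, so details such as whether a leading '+' is accepted are assumptions (this sketch rejects it).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for is_numeric(): true only if 's' is, in its entirety,
// an optional '-', digits, and at most one '.', with at least one digit.
bool looks_numeric(const std::string &s)
{
    std::size_t i = 0;
    if (i < s.size() && s[i] == '-')
        ++i;                                    // optional leading minus sign

    bool seen_digit = false;
    bool seen_point = false;
    for (; i < s.size(); ++i)
    {
        if (std::isdigit(static_cast<unsigned char>(s[i])))
            seen_digit = true;
        else if (s[i] == '.' && !seen_point)
            seen_point = true;                  // the first decimal point is fine
        else
            return false;                       // whitespace, ',', a 2nd '.', letters...
    }
    return seen_digit;                          // rejects "", "-", "." and "-."
}
```

    All of the valid examples listed above pass this check, while " 1" (leading blank) and "125,000" (comma-separated) do not.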
     

    is_integer()
    SIGNATURE: boolean is_integer (int *pi = NULL) const
    SYNOPSIS: check if the current contents of the object's text buffer contains only an integer number.
    PARAMETERS

  • pi: an optional [output] integer error indicator parameter. values:
    0: successful function call
    zErr_IsEmpty: empty string provided (error!)
    zErr_Data_Incomplete: got a '-' starting it all off, but no digits
  • DESCRIPTION:
    the contents may not contain anything except a valid representation of an integer, not even leading or trailing whitespace. This function is a sibling of is_numeric() (which see). Examples of valid integer strings include "-7590", "-314", "76", "-1", and "1".

     

    set_field_delim()
    SIGNATURE: int set_field_delim (const char c)
    SYNOPSIS:
    specify field delimiter character for yank_field() and eat_field(). The delimiter is restricted to a single character. By default, it is a comma (',').
    TRAITS: this function is inline.
     

    handle_quotes()
    SIGNATURE: int handle_quotes (const boolean ison = TRUE)
    SYNOPSIS:
    turn on checking for quoted strings when processing fields. This function works in conjunction with yank_field() and controls how to process a field in a quote.
    Example.
    The following code loops, pulling out comma-separated fields 3 times:

    #include <iostream>
    #include "z_txtstring.h"
    int main()
    {
        textstring_o s("\"How now, brown cow?\",45.00,simpleword,");  // the line shown below
        s.handle_quotes();          // affects behaviour
        string_o chunk;
        int i;
        for (i=0; i < 3; i++)
        {
            chunk = s.yank_field(TRUE);
            std::cout << chunk << std::endl;
        }
    }
    
    If 's', the line being processed by yank_field(), looks like this:
    "How now, brown cow?",45.00,simpleword,
    
    the output will be:
    How now, brown cow?
    45.00
    simpleword
    
    Note that without calling 'handle_quotes()' the comma inside the quoted text is treated as a field delimiter, so the same three iterations print something very different:
    "How now
    brown cow?"
    45.00
    

    TRAITS: this function is inline.
     

    unhandle_quotes()
    SIGNATURE: int unhandle_quotes ()
    SYNOPSIS:
    turn off checking for quotes (single- or double-) when calling yank_field(). This function is the converse of handle_quotes().
    TRAITS: this function is inline.
     

    setstyle_excel()
    SIGNATURE: int setstyle_excel (const boolean ison = TRUE)
    SYNOPSIS:
    turn on [or off] parsing of quoted strings (see yank_quote()) based on excel's protocols. This function should be invoked prior to calling yank_field().
    See the main discussion on this page for more details and examples.
    TRAITS: this function is inline.
     

    set_columns()
    SIGNATURE: int textstring_o::set_columns (count_t nc, ...)
    SYNOPSIS:
    configures the object for fetching columnar fields (done by calling member function yank_column). This function is a "varargs" (variable argument list) type: the first parameter ('nc') must be followed by 'nc' number of integer arguments (each of type size_t).
    DESCRIPTION:
    this defines columnar fields so that you can fetch fixed-width fields from the object's current line. The object can have multiple lines (that is, it can contain blocks of text separated by the '\n' character). The values following 'nc' represent the right wall-boundary of each field (note - NOT the width of the field). Each left boundary starts 1 character beyond the previous field's right boundary (except the first field, which has 0 as the left boundary). For example:

    textstring_o s("AR3.14Argentina\nGB0.14Great Britain\n");
    textstring_o line, chunk;
    line.set_columns (3, 1, 5, 999);
    
    line = s.yank_line(TRUE);
    chunk = line.yank_column(0);    // "AR"
    chunk = line.yank_column(1);    // "3.14"
    chunk = line.yank_column(2);    // "Argentina"
    line = s.yank_line(TRUE);
    chunk = line.yank_column(0);    // "GB"
    chunk = line.yank_column(1);    // "0.14"
    chunk = line.yank_column(2);    // "Great Britain"
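    The right-wall arithmetic can be checked with a small stand-alone sketch. The helper column_of() is hypothetical: it takes the right walls as a std::vector instead of a varargs list, assumes they are in increasing order, and does no error reporting.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for set_columns() + yank_column(): 'right_walls' holds
// the inclusive right boundary (0-based index) of each column, as in
// set_columns(3, 1, 5, 999). Returns column 'n' of 'line'.
std::string column_of(const std::string &line,
                      const std::vector<std::size_t> &right_walls,
                      std::size_t n)
{
    if (n >= right_walls.size())
        return "";                              // no such column
    // The left wall is one past the previous column's right wall (0 for column 0).
    const std::size_t left = (n == 0) ? 0 : right_walls[n - 1] + 1;
    if (left >= line.size())
        return "";                              // line too short to reach the column
    // Width implied by the two walls (both inclusive).
    const std::size_t len = right_walls[n] - left + 1;
    return line.substr(left, len);              // substr() clamps 'len' to the line end
}
```

    With walls {1, 5, 999}, column 1 of "AR3.14Argentina" spans indexes 2 through 5, giving "3.14"; the oversized last wall (999) simply takes the rest of the line.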

     

    set_outnum_format()
    SIGNATURE: int set_outnum_format (const string_o &fmt = "", int *pi = NULL)
    SYNOPSIS:
    this member function controls how a number that was assigned to the textstring object earlier is printed, like so:

      textstring_o s;
      s.set_outnum_format("%04d");
      s = 75;                       // it will print as "0075"
      s.set_outnum_format("%6.3f"); // set format for real #
      s.set_floatvalue(314.15);     // formatted to "%6.3f"
      s.set_outnum_format("");      // reset - no formatting
      s = 75;                       // will print as "75"
    

    This allows simple "printf()-style" formatting of a number. Prior to Z Directory version ZP9.b22, this was used only for integer types, so the 'd', normally at the end, could be omitted. This is no longer true. A limited set of options is currently available:
    • a 'C' for inserting commas every 3 digits in the main part of the number
    • a '-' will left-justify the number, if the field width is large enough
    • a '0' (zero) after the '%' will put zeros in front of the number. This cannot be used in conjunction with 'C' (inserting commas).
    • if decimal (integer), a number indicating the width of the field is required, followed by a 'd'
    • if real (floating-point), a number indicating the width of the field is required; this can be followed by a decimal point ('.') and a decimals value. The field is terminated by an 'f' or 'g'.
    The default is an empty string (""), which turns off formatting.
    PARAMETERS
    • fmt: the format string to use. This is a very limited subset of the protocol established by K & R C language printf() family of functions. The currently available components include:
      [COMMA]: this is a literal string "[COMMA]". It cannot be mixed with other options. It results in basic comma separation in the output (eg, 17950365 -> "17,950,365"). It is equivalent to "%Cd".
      0: zero-pad. this cannot be combined with 'C'.
      '-': left-justify the number in its field
      C: include commas to separate into 3-digit batches. this character, and the dash, must be either the 1st or 2nd char after the '%' ("%C-" or "%-C", if both are used). Also, zero-padding is not allowed with commas (eg, no "%C08d")
      [n]d: {where '[n]' is a number} field width, if the type is an integer; or
      [n].[m]f: {'[n]' and '[m]' are numbers} field width, followed by an optional fractional width, if the type is a real number. 'f' can be substituted with 'g' - they are the same.
    • pi: an optional [output] integer error indicator parameter. values:
      0: successful function call
      zErr_Param_BadFormat: error in format. possibly the string did not start with a leading '%'; or there were no digits following '%' (or "%0")
      zErr_TooMuchData: floating-point: more than 20 decimal places were specified
      zErr_InsufficientData: got something like "%5.2"
      zErr_Data_Inconsistent: got something like "%5.2d"
    RETURNS:
    0: successful
    -1: error (bad format string)
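    As an illustration of what the "[COMMA]" (or "%Cd") format is described as producing, the 3-digit grouping can be sketched with the standard library alone. The function with_commas() is a hypothetical helper, not part of the class.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: format 'value' with a comma between each group of
// three digits, e.g. 17950365 -> "17,950,365".
std::string with_commas(long long value)
{
    std::string digits = std::to_string(value);
    // Keep a leading '-' out of the grouping.
    const std::size_t first = (digits[0] == '-') ? 1 : 0;
    // Walk from the right, inserting a comma in front of every complete
    // group of 3 digits that still has digits to its left.
    for (std::size_t pos = digits.size(); pos > first + 3; )
    {
        pos -= 3;
        digits.insert(pos, ",");
    }
    return digits;
}
```

    For the value used in the parameter description, with_commas(17950365) gives "17,950,365".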
     

    eat_line()
    SIGNATURE: int eat_line ()
    SYNOPSIS:
    this member function removes the next line of text from the object. It is exactly equivalent to "yank_line(TRUE, TRUE)". The function is provided as a convenience, to complete the "eat_[xxx]()" set of functions.
    RETURNS: 0 (always)
     

    reset()
    SIGNATURE: int reset()
    SYNOPSIS:
    this function resets the object to its at-construction state. all formatting settings are set to default values and the contents is emptied out. also, set_outnum_format("%d") is called.
    RETURNS: 0 (always)
     

    eat_bigword()
    SIGNATURE: int eat_bigword ()
    SYNOPSIS:
    this function removes any "big word" at the start of the object's internal buffer. If the object is empty, or the first character is whitespace, the object is unchanged. A "big word" is a 'word' and any additional non-whitespace immediately following it (see the description on this page).
     

    eat_word()
    SIGNATURE: int eat_word ()
    SYNOPSIS:
    similar to eat_bigword(). Removes any "word" at the start of the internal buffer. If the object is empty, or the first character does not meet the word criteria, the object is unchanged.
     

    eat_whitespace()
    SIGNATURE: int eat_whitespace ()
    SYNOPSIS: expunges any whitespace at the start of the internal text block.
    RETURNS: 0 (always)
     

    eat_eoline()
    SIGNATURE: int eat_eoline (boolean = FALSE)
    SYNOPSIS: if the first characters of the internal text block are newlines, this function will expunge them.
     

    eat()
    SIGNATURE: int eat (const char *p_mark)
    SYNOPSIS:
    given a pointer that points to a position within the internal data block, all characters up to and including that position will be expunged.
     

    eat_nchars()
    SIGNATURE: int eat_nchars (size_t n)
    SYNOPSIS: expunges the first 'n' characters from the internal data block
     

    yank_line()
    SIGNATURE: textstring_o yank_line (boolean eat = FALSE, boolean trim = FALSE)
    SYNOPSIS:
    returns the next line of text from the internal data. If "eat" is TRUE, the line will be expunged. If "trim" is TRUE, the line returned will be trimmed of any trailing whitespace.
     

    yank_quote()
    SIGNATURE: textstring_o yank_quote (boolean eat = FALSE)
    SYNOPSIS:
    returns the next block quote in the internal text block. The first character must be a single- or double- quote, and there must be a corresponding terminating quote character. If "eat" is TRUE, the resultant text will be expunged. See "z_yank_quote()" for more behavioural details (in layer 00 strings). If the internal data fails to produce a valid quote-block, the object is unaffected and an empty string ("") is returned.
     

    yank_bigword()
    SIGNATURE: textstring_o yank_bigword (boolean eat = FALSE)
    SYNOPSIS:
    retrieves the next "big word" from the internal data. If "eat" is TRUE, the resultant text is expunged. See the description section on this page for more information about "big words".
     

    yank_word()
    SIGNATURE: textstring_o yank_word (boolean eat = FALSE)
    SYNOPSIS:
    retrieves the next "word" from the internal data block. If "eat" is TRUE, the resultant text is expunged. See the description section on this page for more information about "words".
     

    yank_nchars()
    SIGNATURE: textstring_o yank_nchars (size_t n, boolean eat = FALSE)
    SYNOPSIS:
    retrieves the next n characters from the internal data block. If "eat" is TRUE, the resultant text is expunged. If the size of the text is less than n characters, only those characters will be returned.
     

    yank_upto()
    SIGNATURE: textstring_o yank_upto (const char c, boolean yank_anyway = TRUE, boolean eat = FALSE)
    SYNOPSIS:
    retrieves the characters in the object up to, but not including, 'c' if it exists in the object's text buffer, or all the text in the object if it does not (see 'yank_anyway'). If "eat" is TRUE, the resultant text is expunged.
    PARAMETERS

    • c: the character to search for to do the yank. If 'c' is found, it is NOT included in the result string.
    • yank_anyway: This parameter controls what to do if 'c' is not found in the input. If TRUE, the entire contents of the internal text buffer is returned and, if eat is also TRUE, the current object will be emptied.
    • eat: This parameter controls whether to leave the contents of the result in the current object or not. If TRUE, the textstring object will have the text removed from itself; if FALSE, the contents in the internal buffer remain, unmodified.
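    The interplay of the three parameters can be sketched on a plain std::string buffer. The function upto() is a hypothetical stand-in. Two points are assumptions, since they are not stated above: when 'c' is absent and yank_anyway is FALSE the sketch returns an empty string, and when eating it leaves 'c' itself in the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for yank_upto(), operating on a std::string buffer.
std::string upto(std::string &buf, char c, bool yank_anyway = true, bool eat = false)
{
    const std::size_t hit = buf.find(c);
    if (hit == std::string::npos && !yank_anyway)
        return "";                              // assumption: nothing is yanked
    // Text before 'c' (which is not included), or the whole buffer if 'c' is absent.
    const std::string out = buf.substr(0, hit);
    if (eat)
        buf.erase(0, out.size());               // assumption: 'c' itself stays behind
    return out;
}
```

    For a buffer holding "key=value", upto(buf, '=') returns "key" and leaves the buffer alone; passing eat as TRUE returns the same text and leaves "=value" behind.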
     

    yank_field()
    SIGNATURE: string_o yank_field (boolean do_munch = FALSE)
    SYNOPSIS:
    this processes the current line, returning all text up to the first field delimiter found, or to end-of-line (if not found).
    DESCRIPTION:
    this is a handy function, used to parse lines with fields separated by a single, specific character. It is used in parsing e-mail headers (where "Content-Type" has a list of semi-colon separated fields, such as charset, format, or reply-type). Use set_field_delim() to set your delimiter character. The default is a comma (',').
    If you want to ignore delimiter characters in quoted fields, use handle_quotes() (which see).
    RETURNS:
    string containing text up to but not including the field delimiter, or up to (but not including) the end of the line if a line terminator was found, or the entire text (if otherwise).
     

    eat_field()
    SIGNATURE: string_o eat_field ()
    SYNOPSIS:
    this is the same as yank_field(TRUE), but the data is not returned. Deletes text from the first character to the first field delimiter (if found), or to e-o-line (if found), or to the last character (if neither found).
    RETURNS: 0 (always)
     

    yank_column()
    SIGNATURE: string_o yank_column (count_t n, int *pi = NULL)
    SYNOPSIS:
    this gets the nth column, starting with column number 0. The object must have its column settings preconfigured with the member function call set_columns().
    PARAMETERS

    • n: the index number of the column to fetch. The first column has 'n' value of 0.
    • pi: [output] error indicator flag variable. values:
      0: success;
      zErr_InsufficientData: the next line is too short
      zErr_Param_BadVal: 'n' is negative
      zErr_OutofBounds: 'n' exceeds the number of columns
      zErr_AtEnd: current string is shorter than start of the column. This is an error condition
      zErr_Data_Truncated: current string shorter than the "right wall". this is a notification only, not an error condition. If the width of the current line results in the current column being fetched to be shorter than its specified right boundary, whatever contents found is returned in the output variable, and this value is set.
    DESCRIPTION:
    this lets you fetch columnar fields of fixed width in the current line. The values of the column left and right wall (ie the indexes of the edges) must be specified prior to calling this function by calling set_columns(). The current line must be long enough to contain at least the first character of the field - otherwise it is considered to be an error. If the current line is shorter than the full columnar field width of the current field being processed, a truncated string (with any endline trimmed off) will be returned, but a "Data Truncated" warning value will be set in the output error variable. It is highly recommended to use the 2nd parameter for error checking.
     

    yank_column()
    SIGNATURE: string_o yank_column (size_t s, size_t e)
     

    pad()
    SIGNATURE: int pad (size_t n, char ch = ' ')
    SYNOPSIS: add trailing chars, up to the width specified by "n". If "ch" is omitted, the character will be a space (ASCII 0x20).
     

    wrap_quotes()
    SIGNATURE: int wrap_quotes (boolean is_single = FALSE, int *pi = NULL)
    SYNOPSIS:
    wrap the current data in quotes, if it isn't already. The member function operation can do both single-quote and double-quote. If the first character and last character already match the quote character, nothing is done and "pi" is set to 1.
    PARAMETERS

    • is_single: [input] flag; if TRUE, single-quotes will be used instead of double-quotes. The default is FALSE (double- quotes).
    • pi: [output] error indicator flag variable. values:
      0: success;
      1: string is already quote-wrapped;
    TRAITS:
    this function is currently INCOMPLETE in that if there are embedded quote characters, they are not processed. The expected action is to insert a backslash character ('\') if there is none for the character. That is, if the text contains a backslash-quote, the 2 characters are left as-is. Otherwise, a backslash would be added into the text.
    Also, this pretty much duplicates the functionality of the parent-base class object (string_o) member function "enclose()".
     

    wraptext_tolines()
    SIGNATURE: int wraptext_tolines (int wid = 80, string eoline = "\n", int *pi = NULL)
    SYNOPSIS:
    breaks up a line of text into multiple lines of maximum width 'wid'. The object's internal contents is modified after this function. In other words, the object's text is formatted, according to the parameters supplied to this member function.
    PARAMETERS

    • wid: line width (or, maximum length of "text chunk"). must be a positive number.
    • eoline: a set of characters (must be at least 1) that acts as a line terminator (or separator string ). By default, this value is "\n".
    • pi: [output] error indicator flag variable. values:
      0: success;
      zErr_Param_BadVal: width ('wid') <= 0;
      zErr_Param_NotSet: eoline is zero-length;
    DESCRIPTION:
    This function takes the text contained in this object and, if its length exceeds the maximum width (as specified by parameter 'wid'), inserts line breaks so that each line does not exceed 'wid' characters. The end-of-line string defaults to a NEWLINE character (LF; '\n'), but it can be set to any set of characters (the e-o-line string cannot be empty). Thus, this function can be used for applications other than simple multi-line text wrapping. For example, if an HTML-style line break is desired every 25 characters:
        static const char *my_text = "This is a line that exceeds twenty-five characters in length!";
        int break_my_lines()
        {
            textstring_o ts(my_text);
            ts.wraptext_tolines(25, string_o("<BR/>\n"));
            std::cout << ts;
            return 0;
        }
    
    this will result in the following output:
    This is a line that<BR/>
    exceeds twenty-five<BR/>
    characters in length!<BR/>
    
    Or, if for some reason you simply want to add separators every n characters (or less):
        static const char *my_text = "I am Sam, Sam I am, Do you like green eggs and ham?";
        int break_my_lines()
        {
            textstring_o ts(my_text);
            ts.wraptext_tolines(12, string_o("|//NEXT//|"));
            std::cout << ts;
            return 0;
        }
    
    this will result in the following output - as one long line (broken up here for clarity):
    I am Sam,|//NEXT//|Sam I am, Do|//NEXT//|you like|//NEXT//|
    green eggs|//NEXT//|and ham?|//NEXT//|
    
    Some notes about this function's behaviours:
    • only whitespace serves as the separator.
    • if you want a new-line (CR, LF, or CRLF), you need to explicitly add the character into the 'eoline' parameter.
    • up to 'wid' characters (inclusive) are permitted on a given "line". The line-terminator character set (eg the contents of 'eoline') is not factored into the maximum line length.
    • if you have a word (ie "This", "that", hyphenated-word, "((aWORD!_with:punc-Tu-A=tion))") that exceeds the maximum line length, the resultant line will be longer than 'wid' (but it will be shortened to the minimum possible length)
    • the trailing-terminating whitespace is not included in the resultant output lines (or counted as part of the line length).
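    The behaviours listed above amount to a greedy word wrap, which can be sketched without the library. The function wrap_text() is a hypothetical stand-in. It collapses runs of whitespace between words into a single blank (an assumption) and returns the text unchanged for the documented error cases instead of setting an error code.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical stand-in for wraptext_tolines(): greedy wrap on whitespace.
std::string wrap_text(const std::string &text, int wid, const std::string &eoline = "\n")
{
    if (wid <= 0 || eoline.empty())
        return text;                            // error cases: left unchanged here

    std::istringstream words(text);             // operator>> splits on whitespace
    std::string word, line, out;
    while (words >> word)
    {
        // Would "line + blank + word" exceed the limit? Then emit the line first.
        if (!line.empty() && line.size() + 1 + word.size() > static_cast<std::size_t>(wid))
        {
            out += line + eoline;               // 'eoline' is not counted in the width
            line.clear();
        }
        if (!line.empty())
            line += ' ';
        line += word;                           // an over-long word gets its own line
    }
    if (!line.empty())
        out += line + eoline;                   // the last line is terminated too
    return out;
}
```

    Fed the two sample texts with widths 25 and 12, this sketch reproduces the line breaks shown in the examples above.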

    TRAITS: AS OF THIS WRITING, THIS MEMBER FUNCTION DOES NOT EXIST
     

    wrapfields_tolines()
    SIGNATURE: int wrapfields_tolines (int wid = 80, char sep = ',', string eoline = "\n", int *pi = NULL)
    SYNOPSIS: this function limits fields separated (terminated) by 'sep', which is a single character, into lines of 'wid' max length.
    PARAMETERS

    • wid: line width (or, maximum length of "text chunk"). must be a positive number.
    • sep: a character (only 1) that acts as a field separator. This is typically used to separate a data "field" (aka column) in a line of text, such as a comma (','), semi-colon (';'), pipe symbol ('|'), or TAB character. The default value is a comma.
    • eoline: a set of characters (must be at least 1) that acts as a separator string or line terminator. By default, this value is "\n".
    • pi: [output] error indicator flag variable. values:
      0: success;
      zErr_Param_BadVal: width ('wid') <= 0;
      zErr_Param_NotSet: sep is null or eoline is zero-length;
    DESCRIPTION:
    this function acts just like wraptext_tolines(), but breaks up fields separated by a delimiter:
        static const char *my_text = "rome,berlin,kabul,london,new york,los angeles,istanbul,tokyo";
        int break_my_lines()
        {
            textstring_o ts(my_text);
            ts.wrapfields_tolines(21, ',', string_o("<P/>\n"));
            std::cout << ts;
            return 0;
        }
    
    this will result in the following output:
    rome,berlin,kabul,<P/>
    london,new york,<P/>
    los angeles,istanbul,<P/>
    tokyo<P/>
    
    All of the behaviours of this function are the same as that of member function wraptext_tolines() (which see), except:
    • the field separator being whatever is specified in parameter 'sep' is used instead of whitespace,
    • the field separator is counted in the maximum line length, and is included at the end of the line.
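    Since the member function is not yet available, the intended field-wise wrapping can be sketched as below. The function wrap_fields() is a hypothetical stand-in built from the two rules just listed: each field keeps its trailing separator, and the separator counts toward the line width.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for wrapfields_tolines(): greedy wrap on 'sep'.
std::string wrap_fields(const std::string &text, int wid, char sep = ',',
                        const std::string &eoline = "\n")
{
    if (wid <= 0 || eoline.empty())
        return text;                            // error cases: left unchanged here

    std::string line, out;
    std::size_t start = 0;
    while (start < text.size())
    {
        // A field runs up to and including its separator (the last may have none).
        const std::size_t hit = text.find(sep, start);
        const std::size_t stop = (hit == std::string::npos) ? text.size() : hit + 1;
        const std::string field = text.substr(start, stop - start);
        start = stop;

        // Emit the current line if adding this field would exceed the width.
        if (!line.empty() && line.size() + field.size() > static_cast<std::size_t>(wid))
        {
            out += line + eoline;
            line.clear();
        }
        line += field;
    }
    if (!line.empty())
        out += line + eoline;                   // the last line is terminated too
    return out;
}
```

    Note that "los angeles,istanbul," is exactly 21 characters with its separators, so with a width of 21 the two fields share a line.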

    TRAITS: AS OF THIS WRITING, THIS MEMBER FUNCTION DOES NOT EXIST
     

    convert_case()
    SIGNATURE: textstring_o convert_case (enum my_case = zstrcase_sentence, boolean perma = TRUE, int *pe = NULL)
    SYNOPSIS: converts all text in the object's internal data to the specified case.
    PARAMETERS

    • my_case: enumeration to specify how to affect the data - text:
      textstring_o::zstrcase_sentence: Make the first character of each sentence upper-case
      textstring_o::zstrcase_title: change each word so that it begins with a capital letter
      textstring_o::zstrcase_upper: all alphabetic characters are converted to upper-case
      textstring_o::zstrcase_lower: all alphabetic characters are converted to lower-case
    • perma: If TRUE, change the internal data block (this is the default value). If FALSE, a textstring object that copies the internal data block will be returned, but the object's actual data will not be modified. In both cases, a string is returned, containing the converted text.
    • pe: [output] error indicator flag variable. Currently this is always set to 0 (no error).
    RETURNS: a text-string object, containing the converted text
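    As a rough illustration of the zstrcase_title option, a title-case pass might look like the sketch below. The function title_case() is hypothetical, and two points are assumptions: a "word" here is any run of non-whitespace characters, and letters after the first are left untouched.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical sketch of title-casing: upper-case the first character of
// every whitespace-separated word, leaving the other characters as they are.
std::string title_case(const std::string &text)
{
    std::string out = text;
    bool at_word_start = true;
    for (char &ch : out)
    {
        const unsigned char u = static_cast<unsigned char>(ch);
        if (std::isspace(u))
        {
            at_word_start = true;               // the next non-blank starts a word
        }
        else
        {
            if (at_word_start)
                ch = static_cast<char>(std::toupper(u));
            at_word_start = false;
        }
    }
    return out;                                 // the input string is not modified
}
```

    Returning a converted copy corresponds to calling convert_case() with 'perma' set to FALSE.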
     

    as_number()
    SIGNATURE: count_t as_number (int *pi = NULL) const
    SYNOPSIS:
    returns the current contents, as a count_t (a long integer). the first characters in the object's internal buffer must be numeric (1st char can be '-') in order for this routine to succeed.
    PARAMETERS

  • pi: [output] error indicator flag variable. This is 0 if the operation went successfully, or zErr_NotFound if the object is empty or not sitting on numbers at the start of its buffer.
  • RETURNS: the number representing the text. If an error occurred, 0 is returned and 'pi' is set to non-zero.
     

    as_real()
    SIGNATURE: double as_real (int *pi = NULL) const
    SYNOPSIS:
    returns the current contents, as a floating-point (double) value. The contents may be preceded by whitespace, but if so, 'pi' will be set to 'zErr_DirtyData'.
    PARAMETERS

  • pi: [output] error indicator flag variable. values:
    0: conversion completed successfully
    zErr_Param_NullPointer: object is not initialized to any value
    zErr_NoData: the contents of the object is an empty string ("")
    zErr_DirtyData: the object's data contents had leading whitespace (a warning only)
    zErr_Data_BadFormat: NaN. the first (non-whitespace) char indicated that the contents is not a real number
    zErr_Data_Unexpected: the contents of the object contains non-numeric characters ("45.72(.." is not allowed)
  • RETURNS: the number found in the text, as a double. If an error occured, 0.0 is returned and 'pi' is set to non-zero.
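
    The status values above can be sketched with std::strtod(). This is a hedged illustration, not the library's implementation: the enum SketchRealErr and the function real_from_text() are hypothetical, the null-pointer case is left out because a std::string always holds a value, and the sketch assumes that the "dirty data" warning still returns the converted number. Note also that strtod() accepts forms such as "inf" or hexadecimal floats, which the library may reject.

```cpp
#include <cctype>
#include <cstdlib>
#include <string>

// Hypothetical codes mirroring the documented zErr_* values (numbers are placeholders).
enum SketchRealErr { kOk = 0, kNoData, kDirtyData, kBadFormat, kUnexpected };

// Convert 'text' to a double; report the outcome through *pi if it is given.
double real_from_text(const std::string &text, int *pi = nullptr)
{
    int status = kOk;
    double value = 0.0;
    if (text.empty()) {
        status = kNoData;
    } else {
        const char *start = text.c_str();
        const char *p = start;
        while (std::isspace(static_cast<unsigned char>(*p)))
            ++p;                                  // skip leading whitespace
        char *end = nullptr;
        double parsed = std::strtod(p, &end);
        if (end == p) {
            status = kBadFormat;                  // nothing numeric at the start
        } else if (*end != '\0') {
            status = kUnexpected;                 // trailing junk, e.g. "45.72(.."
        } else {
            value = parsed;
            if (p != start)
                status = kDirtyData;              // warning only: value is kept
        }
    }
    if (pi) *pi = status;
    return value;
}
```

    With this sketch, "  3.5" converts to 3.5 with the warning status set, while "45.72(.." yields 0.0 and the "unexpected" status.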
     

    cmp_nchars()
    SIGNATURE: int cmp_nchars (const string_o &s, size_t n = 0, int *pi = NULL) const
    SYNOPSIS:
    Compares the first 'n' characters of the current object to another string object. This function is provided as a convenience, so that the caller does not have to set up pointers for a z_strncmp() operation.
    PARAMETERS

    • s: the string operand to compare to.
    • n: the number of characters to compare. By default, this is 0, which will compare [up to] the number of characters in 's'. If the number of characters in the current object is less than the number of characters in 's', the smaller number will be used. If n is nonzero, the lesser of the sizes of the objects or the value of 'n' will be used. That is, the smallest value of the 3 will be used.
    • pi: [optional output] error indicator flag variable. This is 0 if the comparison was accomplished without problems. It is highly recommended to include this parameter. values:
      0: operation completed successfully
      zErr_Param_NotSet: 's' is zero-length
      zErr_Param_TooBig: 'n' exceeds size of 's'
    DESCRIPTION:
    example:
      string_o s("what");
      textstring_o ts("what is this?");
      int i = ts.cmp_nchars(s);             // returns 0
      i = ts.cmp_nchars(s, 3);              // also returns 0
      s = "who";                            // now s > ts
      i = ts.cmp_nchars(s);                 // returns 1
      i = ts.cmp_nchars(s, 2);              // returns 0
      i = ts.cmp_nchars(s, 3);              // returns 1
      i = ts.cmp_nchars(s);                 // returns 1 (n is 3)
    

    If the 2nd parameter ('n') is omitted, 'n' is set to 0, which is interpreted to mean that the length of the string 's' is to be used. If that length exceeds the length of the current string, the lesser of the two string lengths is used as the basis for comparison. In other words, if 'n' is omitted (and hence 'pi' is omitted as well), n is set to min(size(), s.size()).
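
    The rule for choosing the number of characters to compare can be sketched with std::string. The helper names effective_count() and compare_prefix() are hypothetical, and the sketch only models the "smallest of the three values" rule; it does not reproduce the zErr_* checks, and its return value follows std::string::compare() (negative, zero or positive), which may differ from the values the library returns for a mismatch.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Number of characters actually compared: the smallest of the two lengths
// and 'n', where n == 0 means "no explicit limit".
std::size_t effective_count(std::size_t self_len, std::size_t other_len, std::size_t n)
{
    std::size_t limit = std::min(self_len, other_len);
    return (n == 0) ? limit : std::min(limit, n);
}

// Compare the leading characters of 'self' and 'other' under that rule.
// Returns 0 when the compared prefixes are equal, non-zero otherwise.
int compare_prefix(const std::string &self, const std::string &other, std::size_t n = 0)
{
    std::size_t count = effective_count(self.size(), other.size(), n);
    return self.compare(0, count, other, 0, count);
}
```

    For the strings in the example, compare_prefix("what is this?", "what") compares 4 characters and returns 0, while compare_prefix("what is this?", "who", 2) compares only "wh" against "wh" and also returns 0.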
     

    operator +=()
    SIGNATURE: int operator += (const char *buf)
    SYNOPSIS: concatenate "buf" to the object's internal data. Exactly equivalent to the same operation in the parent-base class ("string_o").
     

    operator +=()
    SIGNATURE: int operator += (const string_o &rhs)
    SYNOPSIS: concatenate "rhs" to the object's internal data. Exactly equivalent to the same operation in the parent-base class ("string_o").
     

    operator +=()
    SIGNATURE: int operator += (const textstring_o &rhs)
    SYNOPSIS: concatenate "rhs" to the object's internal data. Exactly equivalent to the same operation in the parent-base class ("string_o").
     

    operator +()
    SIGNATURE: textstring_o operator + (const char *)
    SYNOPSIS: concatenate a textstring object with "char *" type.
     

    operator +()
    SIGNATURE: textstring_o operator + (const string_o &)
    SYNOPSIS: concatenate a textstring object with "string_o" type.
     

    operator +()
    SIGNATURE: textstring_o operator + (const textstring_o &)
    SYNOPSIS: concatenate a textstring object with "textstring_o" type.
     

    operator +()
    SIGNATURE: textstring_o operator + (const char *, const textstring_o &)
    SYNOPSIS: concatenate "char *" type with a textstring object. This function provides symmetry, so that a "char *" operand may also appear on the left-hand side of "+".
    TRAITS: this is a friend function
     

     

    z_yanknumber()
    SIGNATURE: textstring_o z_yanknumber (const textstring_o &rhs, const string_o &fmt = "[COMMA]", int *pi = NULL)
    SYNOPSIS:
    This is a global-level function. It fetches a numeric value from the parameter object 'rhs'. The contents at the head of the buffer (i.e., the characters starting at position 0) must represent a number ("78.0", "-95", "(44.8)", etc).
    PARAMETERS

    • rhs: the string containing the data. It must start with a number in order for this subroutine to succeed. A number, in this case, can start with '-' or be wrapped in parentheses (e.g. "(76)") to indicate a negative value.
    • fmt: a string containing a keyword indicating how the output string is to be formatted. The default is "[COMMA]", meaning insert commas. Hence, seven thousand would be printed out as "7,000" with this format mode.
      Alternatively, you can use a subset of the printf()-style format strings. The syntax must be like so:
      '%' [C] [-] ['0'] [n] 'd'
      The items in square brackets are optional. So, the string must start with a '%' and end with a 'd'. "[n]" means a number (a positive integer). These items must appear in the order listed. Example: "%C07d" means the field is at least 7 wide and will be front-padded with zeros if there is room, and the number will have commas inserted (if the value is 1,000 or greater). Note that '-' and '0' are not both allowed (otherwise, given a value 15 and a format string such as "%-05d", the output would logically have to be "15000", which is clearly incorrect).
    • pi: an optional [output] integer error indicator parameter. values:
      0: successful parse
      zErr_NoData: input source string 'rhs' is empty
      zErr_Param_NullPointer: 'rhs' buffer is null!
      zErr_Require_Failure: the string object ('rhs') does not start with a number
      zErr_Param_NotSet: format specifier is an empty string
      zErr_Data_NotTerminated: expected a width (numeric value) or a terminating 'd'
      zErr_Param_BadFormat: 'fmt' botched (could be due to many things)
      zErr_Data_BadSyntax: 'rhs' - expected digits; got something else
      zErr_ConfigMangled: '-' & '0' both specified (not allowed)
      zErr_Param_TooSmall: the string starts with '-' or '(', but is too short
    DESCRIPTION:
    This function has a lot of formatting options. The formatting parameter controls what kind of format the input text is expected to have. By parameterizing this, it will be possible to support different locales in the future. A value of "[COMMA]" is a code word indicating that the number is to be comma-formatted.
    The format string uses the exact same protocols as for member function 'set_outnum_format()' (which see).
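
    The format syntax '%' [C] [-] ['0'] [n] 'd' can be illustrated with a small parser and formatter. This is a sketch under stated assumptions, not the library's code: NumFormat, parse_format() and format_number() are hypothetical names, errors are reduced to a simple true/false result instead of the zErr_* codes, '-' is assumed to mean left-justification as in printf(), and zero padding is assumed to be placed in front of the comma-formatted digits.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical parsed form of a format string.
struct NumFormat {
    bool commas = false;    // 'C' or the keyword "[COMMA]"
    bool left = false;      // '-': assumed to mean left-justify, as in printf()
    bool zero_pad = false;  // '0': pad on the left with zeros
    std::size_t width = 0;  // [n]: minimum field width
};

// Parse "[COMMA]" or '%' [C] [-] ['0'] [n] 'd'. Returns false on a bad format.
bool parse_format(const std::string &fmt, NumFormat &out)
{
    NumFormat f;
    if (fmt == "[COMMA]") { f.commas = true; out = f; return true; }
    std::size_t i = 0;
    if (fmt.empty() || fmt[i++] != '%') return false;
    if (i < fmt.size() && fmt[i] == 'C') { f.commas = true; ++i; }
    if (i < fmt.size() && fmt[i] == '-') { f.left = true; ++i; }
    if (i < fmt.size() && fmt[i] == '0') { f.zero_pad = true; ++i; }
    if (f.left && f.zero_pad) return false;          // '-' and '0' together
    while (i < fmt.size() && std::isdigit(static_cast<unsigned char>(fmt[i])))
        f.width = f.width * 10 + static_cast<std::size_t>(fmt[i++] - '0');
    if (i + 1 != fmt.size() || fmt[i] != 'd') return false;  // must end in 'd'
    out = f;
    return true;
}

// Format 'value' according to 'f'.
std::string format_number(long value, const NumFormat &f)
{
    bool negative = value < 0;
    unsigned long mag = negative ? 0UL - static_cast<unsigned long>(value)
                                 : static_cast<unsigned long>(value);
    std::string digits = std::to_string(mag);
    if (f.commas) {
        // Insert a comma before every group of three digits, right to left.
        for (std::size_t pos = digits.size(); pos > 3; ) {
            pos -= 3;
            digits.insert(pos, ",");
        }
    }
    std::string sign = negative ? "-" : "";
    std::size_t len = sign.size() + digits.size();
    if (len >= f.width) return sign + digits;
    std::size_t pad = f.width - len;
    if (f.left) return sign + digits + std::string(pad, ' ');
    if (f.zero_pad) return sign + std::string(pad, '0') + digits;
    return std::string(pad, ' ') + sign + digits;
}
```

    In this sketch the format "[COMMA]" turns 7000 into "7,000", "%C07d" turns 1234 into "001,234", and "%-05d" is rejected by parse_format(). Extracting the number from 'rhs' (including the parenthesized negative form) is outside the scope of the sketch.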
     

    limitations.
    Currently there is no way to extract a "sentence"; this capability is lacking.

    bugs.
    The implementation of many operations in this class is often "heavy", incurring heavy CPU cost and sometimes large internal memory allocations. Thus this class is not recommended for continuous real-time processing (e.g. don't use this class in critical military fighter jet control systems).

    Since this class is focused on operations with words, perhaps it should have been called "wordstring_o". Apologies.

    history.

    Tue 04/29/1997: "yank_quote()" - new: do not include quotes
    Mon 11/03/1997: FIX #1 in yank_quote(); found bug in # chars to extract
    Mon 11/24/1997: added "yank_line()"
    Tue 11/25/1997: operator = () - made non-inline, added "_nwidth"
    ??? 08/16/1998: yank_word() - added optimizations; now handling quotes
    Mon 10/26/1998: eat_nchars() - added check for "n == 0" (bug fix)
    Thu 08/11/2011: wrap_quotes() created {--GeG}
    Fri 09/02/2011: FIX #2 yank_quote(TRUE), buff len was 1 too many chars
    Mon 01/28/2013: cmp_nchars() created