category-group: strings
layer: 2
header file(s): z_allstrings.h
libraries: libz00.lib libz01.lib libz02.lib
synopsis.
In addition to the major string objects textstring_o and regex_o, layer 2 has a few powerful string functions that should ease text processing. A significant part of this is uuencoding and uudecoding, as well as text to "hex-text" conversion. Also, a new addition is z_quoteprint_to_cleartext(), which converts Quoted-Printable Encoding (commonly used in e-mail message bodies) to regular text.
[C] functions (aka subroutines):
z_wraptext()
SIGNATURE: int z_wraptext (string_o &s, size_t n, const string_o &eol = "\n", int *pi = NULL)
SYNOPSIS: does word-wrap on a text string, so that no lines are longer than 'n' characters (if possible).
DESCRIPTION:
This breaks up text into lines that are no longer than 'n' characters (assuming no "word" in the text exceeds 'n' characters in length). White-space is used to separate the text. A "word" is any block of text not containing whitespace, eg, "THIS!"; "just-a-multi-word"; "..<(hyphenated-glob)>"; "!@@..!/". If a word is longer than 'n', the maximum line length is disregarded for that word, and the word is added to the output text. The text is then broken into a new line at the first white-space following the long word.
EXAMPLE:
int ie;
string_o s = "The coup against president Carlos Socorras took place before upcoming elections.";
ie = z_wraptext (s, 35);
std::cout << s << std::endl;
This results in this output:
The coup against president Carlos
Socorras took place before upcoming
elections.
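The wrapping rule described above can be sketched with plain std::string. This is only an illustration of greedy word-wrap, not the library's implementation: the name wrap_text is hypothetical, and it returns a new string instead of modifying its argument in place.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for the behaviour described for z_wraptext().
// Greedy word wrap: pack whitespace-separated words into lines of at most
// `width` characters. A word longer than `width` ends up on a line of its own.
std::string wrap_text(const std::string &text, std::size_t width,
                      const std::string &eol = "\n") {
    std::istringstream words(text);
    std::string word, line, out;
    while (words >> word) {
        // Would appending " word" overflow the current line? Then flush it.
        if (!line.empty() && line.size() + 1 + word.size() > width) {
            out += line + eol;
            line.clear();
        }
        if (!line.empty()) line += ' ';
        line += word;
    }
    return out + line;  // the last line carries no trailing eol
}
```

Applied to the example sentence with a width of 35, this sketch breaks after "Carlos" and after "upcoming", giving three lines.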
z_wrapfield()
SIGNATURE: int z_wrapfield (string_o &s, size_t n, const char c = ',', int *pi = NULL)
SYNOPSIS:
Wraps the lines to a maximum of 'n' characters, breaking up excessively long lines at the character 'c' (which acts as a field delimiter). If no field delimiter is found, the line is made as short as possible but the 'n' character boundary is ignored.
PARAMETERS
- s: the string to be processed
- n: maximum line length (including 'c')
- c: the character used to separate text, eg, a field delimiter
- pi: [output] error indicator variable. values:
0: successful parse
zErr_Param_BadVal: width less than 2 (0 or 1)
EXAMPLE:
string_o s = "Chile,Peru,Argentina,Brazil,Paraguay,Bolivia";
z_wrapfield (s, 20);
std::cout << s << std::endl;
This will print:
Chile,Peru,
Argentina,Brazil,
Paraguay,Bolivia
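A rough sketch of delimiter-based wrapping, assuming each field keeps its trailing delimiter and lines are filled greedily. The name wrap_fields is hypothetical and the error reporting through 'pi' is not modelled.

```cpp
#include <string>

// Hypothetical stand-in for the behaviour described for z_wrapfield().
// Break `text` after occurrences of `delim` so that lines stay within
// `width` characters where possible. A single field longer than `width`
// is emitted on its own line, ignoring the limit.
std::string wrap_fields(const std::string &text, std::size_t width, char delim = ',') {
    std::string out, line;
    std::size_t start = 0;
    while (start < text.size()) {
        std::size_t pos = text.find(delim, start);
        // A field runs up to and including its delimiter (or to end of text).
        std::size_t end = (pos == std::string::npos) ? text.size() : pos + 1;
        std::string field = text.substr(start, end - start);
        if (!line.empty() && line.size() + field.size() > width) {
            out += line + '\n';
            line.clear();
        }
        line += field;
        start = end;
    }
    return out + line;
}
```

With the country list from the example and a width of 20, the sketch produces the same three lines as shown for z_wrapfield().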
z_wraptext_in_quotes()
SIGNATURE: int z_wraptext_in_quotes (const string_o &sin, string_o &sot, char q = '"', const boolean doesc = TRUE)
SYNOPSIS: This takes the text in 'sin', and encloses it in the character 'q'. The default wrapping character is a double-quote ('"').
PARAMETERS
- sin: a string containing the text to process
- sot: the output string
- q: the character to wrap the text in
- doesc: if TRUE, double-quotes in the string will be escaped with a back-slash. If FALSE, they will not be affected.
DESCRIPTION:
This function has rather hard-coded behaviour focused around the double-quote character. Even though the "wrapper character" can be set to something besides a double-quote, any double-quote characters inside the block of text ('sin') are escaped by a back-slash ('\'). One would be hard-pressed to find a use for this function using a character other than a double-quote.
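The behaviour described here (wrap in 'q', but escape only double-quotes) can be sketched as follows. wrap_in_quotes is a hypothetical name, and returning the result by value rather than through an output parameter is a simplification.

```cpp
#include <string>

// Hypothetical stand-in for the behaviour described for z_wraptext_in_quotes().
// Encloses `in` in the character `q`. When `escape` is true, every embedded
// double-quote gets a back-slash in front of it, whatever `q` is.
std::string wrap_in_quotes(const std::string &in, char q = '"', bool escape = true) {
    std::string out(1, q);
    for (char c : in) {
        if (escape && c == '"') out += '\\';  // only double-quotes are escaped
        out += c;
    }
    out += q;
    return out;
}
```

Note that with q set to a single-quote, embedded single-quotes pass through unescaped, which is why the function is of little use with other wrapper characters.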
z_strip_outer_quotes()
SIGNATURE: int z_strip_outer_quotes (const string_o &sin, string_o &sout, boolean is_sq)
SYNOPSIS: This function takes off any [matching!] quotes in a block of text.
PARAMETERS
- sin: a string containing the quoted text to process
- sout: the text block from 'sin', stripped of quotes (if any).
- is_sq: if TRUE, and if 'sin' is enveloped by single-quotes ('''), those characters will be stripped. If FALSE, and if the text is wrapped in single-quotes, the text in 'sin' will not be processed and 1 will be returned.
DESCRIPTION: This is the converse of z_wraptext_in_quotes(). It focuses on double-quotes by default; it also handles single-quotes.
RETURNS:
0: success, did the quote strip
1: nothing to strip, or malformed input
-1: error occurred (panic)
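A minimal sketch of the stripping rule and its 0/1 return codes. The name strip_outer_quotes is hypothetical; copying the input to the output unchanged when nothing is stripped is an assumption, and the -1 panic case is not modelled.

```cpp
#include <string>

// Hypothetical stand-in for the behaviour described for z_strip_outer_quotes().
// Returns 0 if a matching pair of outer quotes was removed, 1 otherwise.
// Assumption: when nothing is stripped, `out` receives a plain copy of `in`.
int strip_outer_quotes(const std::string &in, std::string &out, bool allow_single) {
    out = in;
    if (in.size() < 2) return 1;  // too short to hold a quote pair
    char first = in.front(), last = in.back();
    bool is_double = (first == '"' && last == '"');
    bool is_single = allow_single && first == '\'' && last == '\'';
    if (!is_double && !is_single) return 1;
    out = in.substr(1, in.size() - 2);
    return 0;
}
```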
z_str_to_excel()
SIGNATURE: int z_str_to_excel (const string_o &sin, string_o &sot, int *pi = NULL)
SYNOPSIS:
Converts text into a format suitable for input into an Excel spreadsheet cell. This function is typically used for "manually" constructing a .csv file. Given a string (without the enveloping quotes) such as "Fee, Fi, "FOOBAR!"", the output string would be ""Fee, Fi, ""FOOBAR!"""": the string is wrapped in double-quotes (because there are embedded commas in the text), and every double-quote is escaped with a double-quote. That is, each double-quote is replaced with 2 double-quotes; single-quotes are not affected.
PARAMETERS
- sin: a string containing regular text. It will be processed, getting formatted according to excel formatting rules
- sot: the output block of text based on 'sin', in excel format
- pi: [output] error indicator variable. values:
0: successful
RETURNS: 0
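The quoting rule can be sketched as below. The name csv_field is hypothetical. The decision of when to add the outer quotes (a comma, double-quote, CR or LF in the text) follows common CSV practice and is an assumption here, since the text above only mentions embedded commas.

```cpp
#include <string>

// Hypothetical stand-in for the behaviour described for z_str_to_excel().
// Doubles every embedded double-quote and wraps the field in double-quotes
// when it contains a character that would otherwise confuse a CSV reader.
std::string csv_field(const std::string &in) {
    bool needs_quotes = in.find_first_of(",\"\r\n") != std::string::npos;
    if (!needs_quotes) return in;
    std::string out = "\"";
    for (char c : in) {
        if (c == '"') out += '"';  // a quote becomes two quotes
        out += c;
    }
    out += '"';
    return out;
}
```

For the input Fee, Fi, "FOOBAR!" this yields "Fee, Fi, ""FOOBAR!""" (outer quotes included), matching the example above.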
z_insert_leading_tabs()
SIGNATURE: int z_insert_leading_tabs (const string_o &s1, string_o &s2, count_t ntabs = 1)
SYNOPSIS: A nice utility function to insert 1+ tabs in front of each line of the input.
PARAMETERS
- s1: input string
- s2: output string
- ntabs: number of tabs to insert in front of each line. default value is 1.
z_insert_leading_chars()
SIGNATURE: int z_insert_leading_chars (const string_o &s1, string_o &s2, char c, count_t n = 1)
SYNOPSIS: A utility function to insert 'n' occurrences of the character 'c' in front of each line of the input.
PARAMETERS
- s1: input string
- s2: output string
- c: the character to prepend the lines with
- n: number of times to insert 'c' in front of each line. default value is 1.
z_uuencode()
SIGNATURE: string_o z_uuencode (const string_o &s, const char *sym = NULL, const char nl = '\n')
SYNOPSIS:
Performs uuencoding on string 's' and returns the uuencoded string. The output is a multi-line string of readable characters, 61 chars per line.
PARAMETERS
- s: a string to be uuencoded ("clear-text").
TRAITS: The meaning of parameter "sym" has been lost.
z_uudecode()
SIGNATURE: string_o z_uudecode (const string_o &ss, const char *symbols, const char nl = '\n')
SYNOPSIS:
This subroutine restores a uuencoded string (see z_uuencode()) to its original contents. Note that the contents could be binary (ie, non-cleartext).
DESCRIPTION: This subroutine is the converse of z_uuencode().
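For readers unfamiliar with the encoding, here is a sketch of the classic uuencode body format: each line holds a length character followed by up to 60 encoded characters (45 input bytes), which gives the 61-character lines mentioned for z_uuencode(). The names uu_encode_body and uu_decode_body are hypothetical, the "begin"/"end" framing lines are left out, and it is an assumption that the library follows this classic layout. The 'sym' parameter is not modelled.

```cpp
#include <string>

// Map a 6-bit value to a printable character; 0 becomes '`' rather than ' '.
static char uu_char(unsigned v) { return v ? static_cast<char>(32 + v) : '`'; }
// Inverse mapping; both ' ' and '`' decode to 0.
static unsigned uu_val(char c) { return (static_cast<unsigned char>(c) - 32) & 0x3F; }

std::string uu_encode_body(const std::string &data) {
    std::string out;
    for (std::size_t pos = 0; pos < data.size(); pos += 45) {
        std::string chunk = data.substr(pos, 45);
        out += uu_char(static_cast<unsigned>(chunk.size()));  // length prefix
        chunk.resize((chunk.size() + 2) / 3 * 3, '\0');        // pad to a multiple of 3
        for (std::size_t i = 0; i < chunk.size(); i += 3) {
            unsigned char a = chunk[i], b = chunk[i + 1], c = chunk[i + 2];
            out += uu_char(a >> 2);                        // 3 bytes -> 4 six-bit groups
            out += uu_char(((a & 0x03) << 4) | (b >> 4));
            out += uu_char(((b & 0x0F) << 2) | (c >> 6));
            out += uu_char(c & 0x3F);
        }
        out += '\n';
    }
    return out;
}

std::string uu_decode_body(const std::string &text) {
    std::string out;
    std::size_t pos = 0;
    while (pos < text.size()) {
        std::size_t eol = text.find('\n', pos);
        if (eol == std::string::npos) eol = text.size();
        std::string line = text.substr(pos, eol - pos);
        pos = eol + 1;
        if (line.empty()) continue;
        std::size_t n = uu_val(line[0]);  // decoded byte count of this line
        std::string bytes;
        for (std::size_t i = 1; i + 3 < line.size(); i += 4) {
            unsigned a = uu_val(line[i]), b = uu_val(line[i + 1]);
            unsigned c = uu_val(line[i + 2]), d = uu_val(line[i + 3]);
            bytes += static_cast<char>((a << 2) | (b >> 4));
            bytes += static_cast<char>(((b & 0x0F) << 4) | (c >> 2));
            bytes += static_cast<char>(((c & 0x03) << 6) | d);
        }
        out += bytes.substr(0, n);  // drop the padding bytes
    }
    return out;
}
```

Because the decoder works on raw bytes, the round trip also holds for binary (non-cleartext) input.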
z_quoteprint_to_cleartext()
SIGNATURE: int z_quoteprint_to_cleartext(const textstring_o &in, textstring_o &ot, int *pi = NULL)
SYNOPSIS: this subroutine converts quoted-printable encoded (QP) text to normal text.
DESCRIPTION:
converts (decodes) the text in "in" to regular [clear] text. If the string is coming from an e-mail, it should have all boundary information stripped from it.
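A minimal sketch of quoted-printable decoding as generally defined for e-mail: "=XX" stands for the byte with hex value XX, and a "=" at the end of a line is a soft line break that is removed. The name qp_decode is hypothetical, and keeping malformed escapes as literal text is an assumption about error handling.

```cpp
#include <string>

// Value of one hex digit, or -1 if `c` is not a hex digit.
static int hex_digit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Hypothetical stand-in for the decoding described for z_quoteprint_to_cleartext().
std::string qp_decode(const std::string &in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] != '=') { out += in[i]; continue; }
        // Soft line break: "=\r\n" or "=\n" joins two physical lines.
        if (i + 2 < in.size() && in[i + 1] == '\r' && in[i + 2] == '\n') { i += 2; continue; }
        if (i + 1 < in.size() && in[i + 1] == '\n') { i += 1; continue; }
        if (i + 2 < in.size()) {
            int hi = hex_digit(in[i + 1]), lo = hex_digit(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);  // "=XX" -> one byte
                i += 2;
                continue;
            }
        }
        out += '=';  // malformed escape: keep the '=' literally
    }
    return out;
}
```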
z_char_to_hexpair()
SIGNATURE: void z_char_to_hexpair (u_char ch, char &s_hi, char &s_lo)
SYNOPSIS: converts a single byte to its 2-byte hex representation
DESCRIPTION:
This subroutine is used by z_str_to_hex(), and is really targeted for exclusive use by that function, but can be used to hex-encode a single character.
The function takes a numerical value (like 20), stored in 'ch', and converts it to a pair of hexadecimal digit characters that represent the value. If 'ch' is 20, the result is "14" ('1' in 's_hi', '4' in 's_lo'). Since the max value of a char (that is, unsigned char) is 0xFF, 2 characters are needed to hold the string representation: one for "F", and the other for "F".
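The conversion amounts to splitting the byte into its high and low 4 bits and looking each up in a digit table. A sketch, with the hypothetical name char_to_hexpair and upper-case digits assumed:

```cpp
#include <string>

// Hypothetical stand-in for the behaviour described for z_char_to_hexpair().
// Writes the two hex digits of `ch` into `hi` and `lo` (upper-case assumed).
void char_to_hexpair(unsigned char ch, char &hi, char &lo) {
    static const char digits[] = "0123456789ABCDEF";
    hi = digits[ch >> 4];    // high nibble
    lo = digits[ch & 0x0F];  // low nibble
}
```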
z_hexpair_to_char()
SIGNATURE: u_char z_hexpair_to_char (const char &s_hi, const char &s_lo, u_char &cho, int *pi)
SYNOPSIS:
This function is the converse of z_char_to_hexpair(). It takes the characters in 's_hi' and 's_lo' to form its de-hexed character.
USAGE:
Given a value "0x0A", put '0' -> s_hi (1st parameter) and 'A' -> s_lo (2nd parameter). The function stores the value 10 [decimal] in 'cho' (the 3rd parameter) and also returns it as the function's return value, so you have a choice as to where to get the answer from. In this case, the output value can be interpreted as a linefeed ["LF"] character.
PARAMETERS
- s_hi: the high-column value of a hexadecimal number, in its character representation. That is, to pass in a value of 13, put 'D' into this variable.
- s_lo: the low-column value of a hexadecimal number, in its character representation. The meaning is exactly analogous as for 's_hi'.
- cho: the resultant character, a combination of 's_hi' and 's_lo'
- pi: [output] error indicator variable. values:
0: successful
1: the input is not a hex number in character string format (ie, one or both input variables are not of the values ['0'..'F']).
z_str_to_hex()
SIGNATURE: int z_str_to_hex (const string_o &s_in, string_o &s_out)
SYNOPSIS:
this converts a string to a series of hex numbers, represented as characters. It uses z_char_to_hexpair() applied to each character in 's_in'.
This function can be considered an alternative to (a substitute for) z_uuencode().
z_hex_to_str()
SIGNATURE: int z_hex_to_str (const string_o &shit, string_o &s_out)
SYNOPSIS: This subroutine is the converse of z_str_to_hex(), decoding a hex-encoded string to its original contents.
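A sketch of the hex round trip on whole strings. The names str_to_hex and hex_to_str are hypothetical. Upper-case output, acceptance of lower-case input, and the return value 1 for malformed input are assumptions, since the return codes of the library functions are not listed here.

```cpp
#include <string>

// Value of one hex digit, or -1 if `c` is not a hex digit.
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Encode every byte of `in` as two hex digits.
std::string str_to_hex(const std::string &in) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(in.size() * 2);
    for (unsigned char ch : in) {
        out += digits[ch >> 4];
        out += digits[ch & 0x0F];
    }
    return out;
}

// Decode pairs of hex digits back into bytes.
// Returns 0 on success, 1 if the input has odd length or a non-hex character.
int hex_to_str(const std::string &in, std::string &out) {
    out.clear();
    if (in.size() % 2 != 0) return 1;
    for (std::size_t i = 0; i < in.size(); i += 2) {
        int hi = hex_value(in[i]), lo = hex_value(in[i + 1]);
        if (hi < 0 || lo < 0) return 1;
        out += static_cast<char>(hi * 16 + lo);
    }
    return 0;
}
```

Hex encoding doubles the size of the data, whereas uuencoding grows it by roughly a third; that is the trade-off when using one as a substitute for the other.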
z_int_to_string()
SIGNATURE: string_o z_int_to_string (int i, char *fmt = NULL)
SYNOPSIS:
A convenience function that formats (prints) the value in 'i' as a string. An optional format string, following the standards of the printf() / sprintf() group of functions, can be provided to format the output.
RETURNS: string object containing the text representation of 'i'
zis_isyeson()
SIGNATURE: boolean zis_isyeson (const string_o &s, boolean short_ok)
SYNOPSIS:
this returns TRUE if the input string 's' is one of:
"ON", "On", "on", "YES", "yes", "Yes". This is if 'short_ok' is FALSE. If 'short_ok' is TRUE, add "Y" and "y" to this list.
zis_isnooff()
SIGNATURE: boolean zis_isnooff (const string_o &s, boolean short_ok)
SYNOPSIS:
this returns TRUE if the input string 's' is one of:
"NO", "No", "no", "OFF", "Off", or "off". This is if 'short_ok' is FALSE. If 'short_ok' is TRUE, add "N" and "n" to this list.
zis_yesno()
SIGNATURE: boolean zis_yesno (const string_o &s, boolean short_ok)
SYNOPSIS:
If 'short_ok' is FALSE, this returns TRUE if the input string 's' is one of: "YES", "yes", "Yes", "NO", "No", "no". If 'short_ok' is TRUE, include "Y" and "y" in this list.
zis_yes()
SIGNATURE: boolean zis_yes (const string_o &s, boolean short_ok)
SYNOPSIS:
If 'short_ok' is FALSE, this returns TRUE if the input string 's' is one of: "YES", "yes", "Yes". If 'short_ok' is TRUE, "Y" and "y" will also return TRUE.
zis_no()
SIGNATURE: boolean zis_no (const string_o &s, boolean short_ok)
SYNOPSIS:
If 'short_ok' is FALSE, this returns TRUE if the input string 's' is one of: "NO", "No", or "no". If 'short_ok' is TRUE, include "N" and "n" in this list (ie, "NO", "No", "no", "N", "n").
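The checks in this family are exact matches against a short list of spellings, so mixed-case input such as "yEs" does not qualify. A sketch of the yes and no variants, with hypothetical names is_yes and is_no:

```cpp
#include <initializer_list>
#include <string>

// True if `s` equals one of the listed spellings exactly.
static bool one_of(const std::string &s, std::initializer_list<const char *> options) {
    for (const char *opt : options)
        if (s == opt) return true;
    return false;
}

// Hypothetical stand-ins for the behaviour described for zis_yes() and zis_no().
bool is_yes(const std::string &s, bool short_ok) {
    return one_of(s, {"YES", "Yes", "yes"}) || (short_ok && one_of(s, {"Y", "y"}));
}

bool is_no(const std::string &s, bool short_ok) {
    return one_of(s, {"NO", "No", "no"}) || (short_ok && one_of(s, {"N", "n"}));
}
```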
z_get_nth_word()
SIGNATURE: string_o z_get_nth_word (const string_o &str, int no, const string_o &delim)
SYNOPSIS: get the "nth" word from a given string
DESCRIPTION: this little-used subroutine gets the nth word from the given string 'str', delimited by the word separator 'delim'.
TRAITS: obscure function
z_split_str()
SIGNATURE: void z_split_str (const string_o &str, const string_o &delim, vlist_o &words)
SYNOPSIS: this little-used subroutine "splits" a string and adds the resultant chunks to the output list 'words'.
TRAITS: obscure function
z_list_to_string()
SIGNATURE: string_o z_list_to_string (const vlist_o &list, const string_o &delim)
SYNOPSIS:
This takes a list of strings inside container object 'list', and concatenates them. The strings are separated by the contents of 'delim'.
It is unclear why this function came into existence. It is rarely (if ever) used nowadays.
TRAITS: obscure function; dubious value
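Splitting and joining are inverses of each other, which a sketch with std::vector standing in for vlist_o makes visible. The names split and join are hypothetical, and keeping empty chunks between adjacent delimiters is an assumption about the library's behaviour.

```cpp
#include <string>
#include <vector>

// Split `s` at every occurrence of `delim`; empty chunks are kept.
std::vector<std::string> split(const std::string &s, const std::string &delim) {
    std::vector<std::string> parts;
    if (delim.empty()) { parts.push_back(s); return parts; }
    std::size_t start = 0, pos;
    while ((pos = s.find(delim, start)) != std::string::npos) {
        parts.push_back(s.substr(start, pos - start));
        start = pos + delim.size();
    }
    parts.push_back(s.substr(start));  // remainder after the last delimiter
    return parts;
}

// Concatenate `parts`, placing `delim` between neighbouring elements.
std::string join(const std::vector<std::string> &parts, const std::string &delim) {
    std::string out;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i > 0) out += delim;
        out += parts[i];
    }
    return out;
}
```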
z_random_string()
SIGNATURE: string_o z_random_string (int min_len, int max_len, const char *symbols, int *pexi)
SYNOPSIS:
This function generates a string of minimum length 'min_len' and maximum length 'max_len'. The contents of the string are based on generating random numbers.
It is unclear why this function came into existence. It is rarely (if ever) used nowadays.
TRAITS: obscure function
z_str_doublechar_cvt()
SIGNATURE: int z_str_doublechar_cvt (string_o &str, const char ch, int *xpie)
SYNOPSIS:
This function is a generalization of z_str_doublebs_cvt(). It searches for all occurrences of 'ch' in 'str', and converts each one to 2 of 'ch' in a row.
TRAITS: obscure function
z_str_doublebs_cvt()
SIGNATURE: int z_str_doublebs_cvt (string_o &str, int *xpie)
SYNOPSIS:
This function "doubles" all backslashes ('\') in the string 'str'. This function is rarely (if ever) used lately. The concept behind this function is probably to "escape" backslashes in a string.
TRAITS: obscure function
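Both doubling functions reduce to one loop, sketched here with the hypothetical name double_char. It returns a new string rather than converting in place, and the error output is not modelled.

```cpp
#include <string>

// Hypothetical stand-in for the behaviour described for z_str_doublechar_cvt().
// Every occurrence of `ch` is emitted twice; all other characters are copied.
std::string double_char(const std::string &in, char ch) {
    std::string out;
    for (char c : in) {
        out += c;
        if (c == ch) out += c;
    }
    return out;
}
```

Calling double_char(path, '\\') gives the back-slash variant, turning C:\temp into C:\\temp.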
z_itostr()
SIGNATURE: string_o z_itostr (int i)
SYNOPSIS: this subroutine is exactly equivalent to z_itoa(). Why it exists is unknown; it will probably be deleted soon.
TRAITS: an obscure function. DO NOT USE
z_strtoi()
SIGNATURE: int z_strtoi (const string_o &str)
SYNOPSIS: converts a string to its integer equivalent.
TRAITS: an obscure function. DO NOT USE
zis_allletters()
SIGNATURE: boolean zis_allletters (const string_o &str, int *pi)
SYNOPSIS: returns TRUE if all chars in 'str' are letters, else FALSE
DESCRIPTION: this is the counterpart of 'z_is_allnumbers()', provided for symmetry.
z_is_allnumbers()
SIGNATURE: boolean z_is_allnumbers (const string_o &str)
SYNOPSIS: returns TRUE if all chars in 'str' are digits, else FALSE
RETURNS:
1: 'str' contains all digits;
0: otherwise.
zisword_abbreviation()
SIGNATURE: boolean zisword_abbreviation (const string_o &ws, const string_o &lc, int *pi)
SYNOPSIS:
This subroutine tries to guess if the given word (in 'ws') is a common abbreviation. It uses a static list of words. The input to this routine can be in any case (eg, "Mrs" or "mrs"), and can end in a period ('.') or not (eg "univ.", "Univ.", "Univ").
PARAMETERS
- ws: a string containing a word
- lc: [2-character] language code. currently this is not used - only English language text works.
- pi: [output] error indicator flag-variable. values:
0: all successful, word checked
zErr_Proto_NotHandled: unknown/unhandled language code
zErr_Param_NotSet: input word is blank
zErr_Param_BadFormat: not a word
DESCRIPTION:
This function must be taken with a grain of salt. It is cpu-intensive, works only for English, and is not complete. Many words can be made into abbreviations and/or recognized as such by anyone: "plz. remem. this fact in the fut." There is absolutely no guarantee of accuracy; this function should be used as a cursory check to see if a word is likely to be an abbreviation. Some liberties are taken here: "prog" can be an abbreviation for program, progressive, programmer, or prognosis; in all cases, it's an abbreviation. Here are some [loose] rules as to what's included and excluded:
INCLUDED:
- only common abbreviations are included in the list handled by this function
- slang abbreviations are included only if it can be considered legitimate for a specific [genuine] word: "plz" is often used as short for "please", it's included here even though it's slang and not a true abbreviation.
- *some* semi-proper name acronyms that are in common use and usually not abbreviated with periods interspersed in them, such as GOP, GDP, and GMT.
EXCLUDED:
- titles (and other things) that are normally written with periods embedded: "M.D.", "R.N.", "B.A.", "B.C."
- if the "word" (that is, the group of characters) is an abbreviation and also a word, it is not included. Thus, "cat" and "bus" are excluded:
  - "in the hat is a cat."
  - "the gen. cat. of food is 'canned'."
  - unless context is examined, "cat" cannot be determined
- abbreviations for proper names, like "USA" or "DC" (for "District of Columbia", which can also be "D.C.").
- acronyms. thus, "NAACP", "NASA", and "NATO" are not checked. acronyms grow constantly and are related to proper names
- multi-part (aka "multi-word") abbreviations, like "fl oz"
zis_common_English_word()
SIGNATURE: boolean zis_common_English_word (const string_o &s, count_t max_n = 100, int *pi = NULL)
SYNOPSIS:
returns TRUE if the word in 's' is amongst the most common English words in use. The word must be in lower-case. Up to 500 words can be checked.
PARAMETERS
- s: a string containing a word (all in lower-case), such as "a", "at", "and", "the", and so on.
- max_n: the maximum number of words to check. The checked list is in order of commonality (popularity). The default value of this parameter is 100. Thus, without any further arguments beyond 's', this subroutine will return TRUE if the word is in the 100 most common words in use.
- pi: [output] error indicator variable. values:
0: successful check
zErr_Param_NotSet: 's' is an empty string
zErr_Param_OuttaBounds: 'max_n' > 500. In this case, the search will not be aborted. Rather, the value will be set to 500 and the search will proceed.