class ref name: business_textstream
category-group: people
layer: 11
header file: z_biz_txtstream.h

synopsis.
The business_textstream_o class has a specialized support role. It supports the business object (business_o) and is designed to parse text in a specific format that contains data about businesses. The text can come either from a file or from standard input, although the latter is almost completely untested. By default, this data is loaded into a database. However, you can subclass this class in order to do other things when given the text for a single business.

description.
The "business textstream class" parses text, typically stored in a file, that looks like this:

------------------------------------------------
------------------------------------------------
 
CLIVE SIMMONDS AND ASSOCIATES
18 Ebony Drive
Durban, Natal 4051
South Africa
 
fref:	biz-africa.txt
potent:	040
who:	Guy Owner
		arpa: the.boss@global.co.za
phone:	+27 (83) 449-7554
arpa:	customer_service@clivesbusiness.co.za
url:	http://www.global.co.za/~simmonds
what:	this is text describing what the business does. It can be
	any text - an advertising blurb provided by the business
	or your own critique.
------------------------------------------------
------------------------------------------------
This is a single entry. The file can contain many such entries, each separated by a pair of these separator lines:
------------------------------------------------
------------------------------------------------
 
First Business
 
------------------------------------------------
------------------------------------------------
 
Second Business
[optional address]
 
------------------------------------------------
------------------------------------------------
 
Third Business
[optional address]
 
[optional keyword list]
------------------------------------------------
------------------------------------------------

The format of all of this text is specific and strict. Each separator must consist of exactly 2 lines, each exactly the length shown here (48 bytes of dashes, not including the EOL character(s)). Each pair of separator lines must be followed by 1 blank line, then the name of the business on its own line. If the name of the business is not known (say you have just a web URL with no organization name listed, but want to create a business entry for it), use the placeholder name "???", or "???[n]", where 'n' is a sequential, ascending, unique number (eg "???[1]", followed by "???[2]", etc). The lines following the business name are the business's address, in the country's standard postal address format. For example:

INTERNATIONAL BUSINESS MACHINES
T. J. Watson Research Center
17 Skyline Drive
Hawthorne, NY 10532
The block of lines for the address (which follows immediately after the line containing the business name) is optional. This block (the business name, with or without an address) must be followed by a single blank line. After this blank line come zero or more "keyword lines". These keyword lines follow the syntax rules of the keyword parser object. Note that all keywords are optional. In the "Clive Simmonds" example given above, the keywords are fref, potent, who, phone, arpa, url, and what.
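The layout rules above can be illustrated with a small, self-contained C++ sketch. It is not the library's parser (which hands keyword lines to the keyword parser object); 'ParsedEntry' and 'parse_entry()' are hypothetical names, and the sketch assumes the separator lines have already been stripped from the entry text. It splits one entry into the business name, the optional address block, and the raw keyword lines, and flags the "???" / "???[n]" placeholder names.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type for one business entry (not part of the library).
struct ParsedEntry {
    std::string name;                        // first non-blank line
    std::vector<std::string> address;        // lines up to the first blank line
    std::vector<std::string> keyword_lines;  // raw lines after that blank line
    bool name_unknown = false;               // true for "???" or "???[n]"
};

// A line counts as blank if it is empty or holds only spaces, tabs, or CR.
static bool is_blank(const std::string& line) {
    return line.find_first_not_of(" \t\r") == std::string::npos;
}

// Hypothetical parser for the text between one separator pair and the next.
// Assumes the separator lines themselves have already been removed.
ParsedEntry parse_entry(const std::string& text) {
    ParsedEntry e;
    std::istringstream in(text);
    std::string line;
    // Skip the blank line that follows the separator pair.
    while (std::getline(in, line) && is_blank(line)) {}
    if (is_blank(line)) return e;  // nothing but blank lines: empty entry
    e.name = line;
    e.name_unknown = line.rfind("???", 0) == 0;  // name starts with "???"
    // Optional address block: every line up to the first blank line.
    while (std::getline(in, line) && !is_blank(line)) e.address.push_back(line);
    // Keyword lines: everything after that, with blank lines dropped.
    // Continuation lines (starting with a tab) are kept as-is.
    while (std::getline(in, line))
        if (!is_blank(line)) e.keyword_lines.push_back(line);
    return e;
}
```

Parsing the keyword lines themselves (including tab-indented continuation lines such as the "what:" text) is left to whatever keyword parser you use.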

The primary goal of this object is to load data about businesses into a database. This is done in the virtual member function process_entry(), which is called once for each business and is given the chunk of text representing that single business. It is also passed an "action code" that dictates what it should do. The default value is 1, which means: add the current business data (passed in to the function) to the database. The action code can be set to any value via member function set_action(). In the base-class implementation, business_textstream_o::process_entry(), the action code is a bitmap and can have the following values:

value  description
0      does nothing
1      adds the data (current business) to a database
2      formats the data into databag-formatted output
3      saves the data to a database and formats it into databag-formatted output (ie, bits 0 and 1 combined)
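Because the action code is a bitmap, each bit can be tested independently. A minimal C++ sketch, assuming only the two bits listed in the table; the constant, struct, and function names are hypothetical and not taken from z_biz_txtstream.h:

```cpp
#include <cassert>

// Hypothetical names for the two bits in the action-code table.
constexpr int kActAddToDB   = 1;  // bit 0: add the business to the database
constexpr int kActToDatabag = 2;  // bit 1: format the business as a databag

struct ActionPlan {
    bool add_to_db;
    bool write_databag;
};

// Decodes an action code into the two independent operations it requests.
// 0 requests nothing; 3 requests both.
ActionPlan decode_action(int act) {
    return ActionPlan{(act & kActAddToDB) != 0, (act & kActToDatabag) != 0};
}
```

Testing bits rather than comparing against 1, 2, and 3 means a subclass can add higher bits for its own actions without disturbing the two base actions.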
process_entry() is called by process_entries(), which operates on an entire file (or on stdin, until end of input). It was originally designed to be accessed that way, but you can now call it directly with a block of text representing a single business or organizational entity. You can override this function in order to do other things besides populating database tables; see the usage section (below) for details.

There is one function for configuring the object's parameters, including database access: config_param(). The keywords it accepts (aliases separated by "|"), their values, and their meanings are:
keyword                    value                  description
dbmethod | dbconnect       ODBC | ADO             the database access method to use.
server | dbserver          [hostname], localhost  the database server name.
database | dbname          [name]                 the name of the database to access.
user | account             [name]                 the database user account name.
password | pass            [text]                 the database user account password.
input_file | infile        [file-path] | STDIN    the name (with path) of the input file, OR the keyword "STDIN" (in any case, or "[STDIN]") if you want to get the input from the standard input channel (this may be OS-dependent).
output_file | outfile      [file-path] | STDOUT   the name (with path) of the output file, OR the keyword "STDOUT" (in any case, eg "stdout", "Stdout"; or "[STDOUT]") if you want to send the output to the standard output channel (this may be OS-dependent).
wet | run_wet              YES, NO                if YES, the program is allowed to do things, thus affecting data. If NO, the program runs and goes through its paces but does not actually do anything (for debugging).
usedb                      YES, NO                if YES, the database is intended to be used.
dups_ok | allowdups        YES, NO                if YES, duplicate business names are permitted.
calc_lines                 YES, NO                ? (purpose not documented)
trace_mode | trace         YES, ON, NO, OFF       if YES or ON, tracing messages are displayed (this is currently not implemented, so this parameter does nothing).
show_biz | showbiz         YES, NO                if YES, the business currently being processed is dumped to stdout (the cmd console window).
show_dups | showdups       YES, NO                if YES, any duplicate business names are noted and printed to stdout as they are encountered.
abort | abort_onerror      YES, NO                if YES, any error encountered aborts the current processing. If NO, the error is printed to stdout (depending on volume and possibly other parameters), but processing continues.
prebag_delim               [text]                 used only if databags are to be generated from the input data ("action = 2"). The string supplied as the value of this keyword is emitted before the databag and serves as a business entry delimiter. It can be used in conjunction with "postbag_delim" and "dbag_separator" (although it is strange if the latter is also provided). For actions other than 2, this parameter is ignored.
postbag_delim              [text]                 used only if databags are to be generated from the input data ("action = 2"). The string supplied as the value of this keyword is emitted after the databag and serves as a business entry delimiter. It can be used in conjunction with "prebag_delim" and "dbag_separator" (although it is strange if the latter is also provided). For actions other than 2, this parameter is ignored.
dbag_sep | dbag_separator  [text]                 used only if databags are to be generated from the input data ("action = 2"). The string supplied as the value of this keyword is emitted after each databag and serves as a business entry delimiter. Of the 3 forms of delimiter (prebag_delim, postbag_delim, and dbag_separator), this one is expected to be by far the most common; indeed, the provision for the other two is of questionable value. For actions other than 2, this parameter is ignored.
min_entry | start_index    [number]               the offset (index number) at which to start processing. This value must be an integer not less than 0. A value of 0 means there is no minimum, ie the 0th entry is the first one processed (the default). This parameter is used only when the member function process_entries() is invoked; it does not apply to process_entry().
max_entry | end_index      [number]               the maximum offset (the business count index number) up to which processing can occur; the business corresponding to this value itself is not included. This value must be an integer not less than -1. A value of -1 means there is no maximum. The count starts from 0, so if this is set to 0, no entries are processed (the cutoff is the first business entry fetched). This parameter is used only when the member function process_entries() is invoked; it does not apply to process_entry().
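Taken together, the min_entry and max_entry rules describe a half-open index range [min_entry, max_entry), with -1 standing for "no upper bound". A sketch of that test in C++; 'in_entry_range()' is a hypothetical helper, not a member of business_textstream_o:

```cpp
#include <cassert>

// Hypothetical helper: should the entry at zero-based index 'idx' be processed?
//  - min_entry (>= 0) is inclusive;
//  - max_entry is exclusive, and -1 means "no maximum".
bool in_entry_range(long idx, long min_entry, long max_entry) {
    if (idx < min_entry) return false;
    return max_entry == -1 || idx < max_entry;
}
```

For example, with min_entry=5 and max_entry=10, entries 5 through 9 are processed; with max_entry=0, nothing is.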

The business_textstream_o is a heavy object. Internally, it holds collections of the 3 groups needed for adding workers and businesses to a database: arrays of worker_o, person_o, and business_o. The corresponding variables are aptly named workers, people, and businesses. Thus, the business textstream was intended to be a singleton instance within any given program.

Currently the business_textstream_o class can dump out error messages, but only to stdout (ie, a plain old MS-DOS command window console). Ideally, errors would be logged via the error logging components. This deficiency will be corrected in the future (when depends on the object's prioritization). The actual error reporting depends primarily on the volume setting and can be fine-tuned.

member functions (primary)

business_textstream_o()
SIGNATURE: business_textstream_o ()
SYNOPSIS: creates a business textstream object.
 

destructor
SIGNATURE: ~business_textstream_o ()
SYNOPSIS: 'destroys' the object: all internal data is reset (to empty), all counters are set to 0, and all containers are emptied out.
TRAITS: this function is virtual inline.
 

input_filename()
SIGNATURE: string_o input_filename (int *pi) const
SYNOPSIS: returns the name of the input file, as a string object
TRAITS: this function is inline
 

default_country()
SIGNATURE: string_o default_country () const
SYNOPSIS:
returns the name of the "default country", which is the country used for parsing postal addresses which don't have a country specified. The "default country" is defined by calling 'set_default_country()'
TRAITS: this function is inline
 

is_wet()
SIGNATURE: boolean is_wet () const
SYNOPSIS:
tells if the object is running in "wet mode". This maps onto the flag 'zFlag_BTS_WetMode'. Wet mode is the opposite of a dry run: if set, things will actually happen when the object is called into action (which is done by calling 'process_entries()').
RETURNS:
TRUE: object is wet
FALSE: object is dry
TRAITS: this function is inline
 

action_code()
SIGNATURE: int action_code () const
SYNOPSIS:
returns an integer code indicating what course of action is to be performed by the object when 'process_entries()' is called. The action code is application defined, if the object is subclassed. The default codes and meanings are:

value meaning
1 add the entry to database
2 convert the entry to databag format
3 add the entry to the database, and convert (and print out) the entry in databag format
Note that the values here are a bitmap, based on the first (ie, lowest) 2 bits.
TRAITS: this function is inline
 

line_count()
SIGNATURE: count_t line_count (boolean cp = TRUE) const
SYNOPSIS: returns a cardinal number that is the current number of lines (the line count) of the input stream (could be a file).
PARAMETERS

  • cp: if TRUE, returns the value of 'my_line_idx'. If FALSE, returns the value of 'my_line_idx_cp'. What this means is a mystery (sorry).
TRAITS: this function is inline
     

biz_count()
SIGNATURE: count_t biz_count () const
SYNOPSIS:
returns the current number of businesses processed by 'process_entries()'. This value is maintained in the internal counter 'my_biz_idx'.
TRAITS: this function is inline
     

volume()
SIGNATURE: int volume () const
SYNOPSIS:
This tells what the volume setting of the object is. The volume level ranges from 0 to 100 and controls which messages are emitted to stdout during processing, including errors, warnings, and notices.
RETURNS: A cardinal value (an int) in the range [0..100]
TRAITS: this function is inline
     

dups_ok()
SIGNATURE: boolean dups_ok () const
SYNOPSIS:
this tells if a duplicate business is allowed to be processed. A business entry is considered a "duplicate" if it has the same phone number, URL, or postal address as one that is currently in the database.
TRAITS: this function is inline
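A sketch of the duplicate test described above: a candidate is a duplicate if its phone number, URL, or postal address equals that of an existing record. 'BizKey' and 'duplicate_field()' are hypothetical; the sketch assumes exact string comparison (the class may normalize values first) and treats empty fields as never matching. Returning the name of the matching field mirrors the "item that gave it away" that show-dups mode reports.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified business record holding only the fields
// used for duplicate detection.
struct BizKey {
    std::string name, phone, url, address;
};

// Returns "phone", "url", or "address" for the first field on which
// 'candidate' matches an existing record, or "" if there is no match.
// Empty fields never count as a match; comparison is exact.
std::string duplicate_field(const BizKey& candidate,
                            const std::vector<BizKey>& existing) {
    for (const BizKey& b : existing) {
        if (!candidate.phone.empty() && candidate.phone == b.phone) return "phone";
        if (!candidate.url.empty() && candidate.url == b.url) return "url";
        if (!candidate.address.empty() && candidate.address == b.address) return "address";
    }
    return "";
}
```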
     

is_trace_on()
SIGNATURE: boolean is_trace_on () const
SYNOPSIS: Tells if the object's "tracing mode" is on or off. Trace mode is not used and is probably going to be phased out.
RETURNS:
TRUE: object is in trace mode
FALSE: object is not in trace mode
TRAITS:
this function is inline. This function, and all others referring to tracing, should be avoided and are slated to be phased out.
     

show_biz()
SIGNATURE: boolean show_biz () const
SYNOPSIS:
this tells if the object is in "show business" mode. This has nothing to do with the entertainment industry. If in show-biz mode, the business entry will be printed to stdout (ie a console window) each time it is processed. This is independent of the volume control.
RETURNS:
TRUE: object is in show-business mode
FALSE: object is not in show-business mode
TRAITS: this function is inline
     

show_dups()
SIGNATURE: boolean show_dups () const
SYNOPSIS:
tells if the object is in "show dups" mode. This mode works in close conjunction with "dups ok" mode (see member function 'dups_ok()'). If "show dups" mode is on and a duplicate is detected during processing (ie, within 'process_entries()'), a short message of approximately 5 lines is dumped to stdout. This message lists the name of the existing business that it was matched against, the current business being processed, and the name of the item (phone, URL, etc) that gave it away.
TRAITS: this function is inline
     

show_errors()
SIGNATURE: boolean show_errors () const
SYNOPSIS:
Tells if the object is in "show errors" mode. If so, the object will print some extra details if an error is encountered.
DESCRIPTION: this function appears to be rather arbitrary and not conformant to the ways of the Z Directory.
TRAITS:
this function is inline. It is a candidate for removal: it, along with any member functions relating to "show errors" mode, may be phased out in the future.
     

abort_onerror()
SIGNATURE: boolean abort_onerror () const
SYNOPSIS: Tells if the object will abort processing if an error is hit.
DESCRIPTION: this function is independent of (orthogonal to) the reporting of errors.
RETURNS:
TRUE: the object will terminate processing if an error is encountered
FALSE: the object will soldier on if an error is encountered
TRAITS: this function is inline
     

config_param()
SIGNATURE: int config_param (paramstring_o &ps, int *pi = NULL)
SYNOPSIS:
this function is used to configure the object. It should be called prior to 'process_entries()' or 'process_entry()'. However, it can be called at any time to change a parameter setting.
PARAMETERS

  • ps: a parameter string object containing 0 or more name-value pairs.
  • pi: an optional [output] error indicator variable. values:
    0: success
    zErr_Param_BadVal: the "value" of the parameter(s) is invalid. This is the 2nd half of a name-value pair provided in input string 'ps'. Examples of this error are "dbmethod=BADVALUE" and "wet=very" (dbmethod can be only ODBC or ADO; wet must be either ON or OFF, or YES or NO).
DESCRIPTION:
this function is important. It is the kitchen sink of setting parameters, and is the only way to define access to a particular database. See the keyword table in the description section above for a complete listing of the parameters that can be passed to it. This function can be called repeatedly with a single pair, or once with all parameters, or anything in between.
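The name=value format that config_param() accepts can be sketched as follows. This is a simplified, hypothetical stand-in for the paramstring_o handling, not the library's implementation: it covers only a subset of the keyword table, accepts double-quoted values containing spaces, maps each alias to a canonical name, and returns -1 where the real function would report zErr_Param_BadVal. The accepted values for 'wet' (YES, NO, ON, OFF) follow the error example above, 'dups_ok' accepts YES or NO per the keyword table, and case-insensitive value matching is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Alias -> canonical keyword, for a subset of the config_param() table.
static const std::map<std::string, std::string> kAliases = {
    {"dbmethod", "dbmethod"},     {"dbconnect", "dbmethod"},
    {"server", "server"},         {"dbserver", "server"},
    {"database", "database"},     {"dbname", "database"},
    {"user", "user"},             {"account", "user"},
    {"password", "password"},     {"pass", "password"},
    {"input_file", "input_file"}, {"infile", "input_file"},
    {"wet", "wet"},               {"run_wet", "wet"},
    {"dups_ok", "dups_ok"},       {"allowdups", "dups_ok"},
};

static std::string to_upper(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::toupper(c));
    });
    return s;
}

static bool is_space(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }

// Hypothetical parser: fills 'out' (keyed by canonical name) from
// whitespace-separated name=value pairs. Returns 0 on success, or -1 for a
// malformed pair, an unknown name, or a bad value.
int parse_params(const std::string& ps, std::map<std::string, std::string>& out) {
    std::size_t i = 0;
    while (true) {
        while (i < ps.size() && is_space(ps[i])) ++i;   // skip separators
        if (i >= ps.size()) return 0;                   // all pairs consumed
        std::size_t eq = ps.find('=', i);
        if (eq == std::string::npos) return -1;         // name without '='
        std::string name = ps.substr(i, eq - i);
        std::string value;
        i = eq + 1;
        if (i < ps.size() && ps[i] == '"') {            // quoted value
            std::size_t close = ps.find('"', i + 1);
            if (close == std::string::npos) return -1;
            value = ps.substr(i + 1, close - i - 1);
            i = close + 1;
        } else {                                         // bare value
            std::size_t end = i;
            while (end < ps.size() && !is_space(ps[end])) ++end;
            value = ps.substr(i, end - i);
            i = end;
        }
        auto it = kAliases.find(name);
        if (it == kAliases.end()) return -1;            // unknown keyword
        const std::string& key = it->second;
        const std::string v = to_upper(value);
        if (key == "dbmethod" && v != "ODBC" && v != "ADO") return -1;
        if (key == "wet" && v != "YES" && v != "NO" && v != "ON" && v != "OFF") return -1;
        if (key == "dups_ok" && v != "YES" && v != "NO") return -1;
        out[key] = value;
    }
}
```

Because each call only adds or overwrites entries in 'out', calling it repeatedly with one pair at a time gives the same result as one call with all pairs, which matches the "call it repeatedly or once" behaviour described above.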
     

set_filename()
SIGNATURE: int set_filename (const string_o &fnam)
SYNOPSIS:
this function sets the input stream to be the file defined by 'fnam'. Calling this function automatically puts the object into file-processing mode, unless the string supplied is "STDIN" or "[STDIN]", in which case input is taken from stdin.
PARAMETERS

  • fnam: a string containing the path to a file. This can be just a file name (with extension), or a concatenation of a file name and its path. It should be written in the native format of the underlying OS (eg, with back-slashes in Microsoft-land).
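Recognizing the special stream names might look like the following sketch. The helper name is hypothetical, and accepting a bracketed keyword in any letter case (eg "[stdin]") is an assumption; the keyword table only says "STDIN" in any case, or "[STDIN]".

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: true if 'name' is 'keyword' ("STDIN" or "STDOUT") in
// any letter case, optionally wrapped in square brackets, eg "stdin",
// "Stdout", or "[STDIN]". Anything else is treated as a file path.
bool is_stream_keyword(std::string name, const std::string& keyword) {
    if (name.size() >= 2 && name.front() == '[' && name.back() == ']')
        name = name.substr(1, name.size() - 2);   // strip the brackets
    if (name.size() != keyword.size()) return false;
    return std::equal(name.begin(), name.end(), keyword.begin(),
                      [](unsigned char a, unsigned char b) {
                          return std::toupper(a) == std::toupper(b);
                      });
}
```

A caller would check is_stream_keyword(fnam, "STDIN") for the input file and is_stream_keyword(fnam, "STDOUT") for the output file, falling back to opening 'fnam' as a path.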

set_default_country()
SIGNATURE: int set_default_country (const string_o &s, int *pi = NULL)
SYNOPSIS:
sets the default country, which is used for parsing postal addresses. The default country is set on a per-object basis. See member function default_country() for more info.
     

set_wetmode()
SIGNATURE: int set_wetmode (boolean ison = TRUE)
SYNOPSIS: sets the object into wet mode.
PARAMETERS

  • ison: if TRUE, the object is set to "wet" (the default). If FALSE, the object is in "dry run" mode.
RETURNS: 0
TRAITS: this function is inline
     

set_drymode()
SIGNATURE: int set_drymode (boolean ison = TRUE)
SYNOPSIS: sets the object into dry-run mode.
PARAMETERS

  • ison: if TRUE, the object is set to "dry" (the default). If FALSE, the object is in "wet" mode.
RETURNS: 0
TRAITS: this function is inline
     

set_action()
SIGNATURE: int set_action (const int x)
SYNOPSIS:
this is generally used prior to calling 'process_entries()' or 'process_entry()', and controls what action is to be performed by those functions. See the member function action_code() for more information.
TRAITS: this function is inline
     

set_volume()
SIGNATURE: int set_volume (const int v)
SYNOPSIS: this function sets the volume of the object to 'v'.
PARAMETERS

  • v: the numeric, integral value of the volume to set the object to. This value must be in [0..100] (inclusive) in order to succeed.
RETURNS:
0: volume successfully set
-1: v is out of range
TRAITS: this function is inline
     

add_lines()
SIGNATURE: void add_lines (int x, boolean b = TRUE)
SYNOPSIS: This function is a mystery. Purpose unknown.
TRAITS: this function is inline
     

set_lines()
SIGNATURE: void set_lines ()
SYNOPSIS:
As can be seen from the inline code, this copies the value of 'my_line_idx_cp' into 'my_line_idx'. No more information about this function is available.
TRAITS: this function is inline
     

set_trace_on()
SIGNATURE: int set_trace_on ()
SYNOPSIS: Turns on trace mode in the object.
TRAITS:
this function is inline. This function, and all others referring to tracing, should be avoided and are slated to be phased out.
     

set_trace_off()
SIGNATURE: int set_trace_off ()
SYNOPSIS: Turns off trace mode in the object.
TRAITS:
this function is inline. This function, and all others referring to tracing, should be avoided and are slated to be phased out.
     

set_abort()
SIGNATURE: int set_abort (boolean b = TRUE)
SYNOPSIS: this turns on "abort mode": if set, processing will terminate upon encountering an error.
PARAMETERS

  • b: if TRUE (the default), "abort mode" is turned on. If FALSE, it is turned off.
RETURNS: 0
TRAITS: this function is inline
     

do_showerrs()
SIGNATURE: int do_showerrs (boolean onoff = TRUE)
SYNOPSIS: turns "show errors" mode on (if 'onoff' is TRUE) or off (if 'onoff' is FALSE).
TRAITS: this function is inline
     

set_errorfile()
SIGNATURE: int set_errorfile (const file_o &f, int *pi = NULL)
SYNOPSIS: Sets the error logging file to 'f'.
     

process_entries()
SIGNATURE: int process_entries (int *pi = NULL)
SYNOPSIS: processes the input from the input stream (stdin or a file).
PARAMETERS

  • pi: an optional [output] error indicator variable. values:
    0: success
    [n != 0]: an error occurred
DESCRIPTION:
There are various actions that can be done with the input. The path taken is determined by the "action" setting, which is set by 'set_action()'.
     

yank_biz_entry()
SIGNATURE: int yank_biz_entry (string_o &s)
SYNOPSIS:
this gets lines of input from the input source and returns the text for 1 complete business/organization entity in the output parameter 's'.
PARAMETERS

  • s: [output] string variable that holds the text of the business.
DESCRIPTION:
This processes text from the input stream, which must have been set beforehand (either as stdin or a file). It checks for the separator lines (long dash lines that match 'zBizTxt_LineSep'; see z_biz_txtstream.h). When these are found and match up according to specs, the text for the business entity is gathered up to, but not including, the terminating pair of separator lines.
This function also sets the business name (in internal variable 'my_name') and increments the business counter.
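The separator scan described above can be approximated with a small streaming reader. This C++ sketch is hypothetical ('EntryReader' is not part of the library): it assumes each separator line is exactly 48 hyphens, tolerates CRLF line endings, takes the first non-blank line as the business name, and lets each closing separator pair double as the opening pair of the next entry. Text before the first pair, and a final entry with no closing pair, are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical streaming reader that yields one business entry at a time.
class EntryReader {
public:
    explicit EntryReader(std::istream& in) : in_(in) {}

    // Fills 'text' with the lines between one separator pair and the next
    // (separator lines excluded). Returns false once no complete entry remains.
    bool next(std::string& text) {
        text.clear();
        name_.clear();
        std::string line;
        bool prev_sep = false;
        std::size_t prev_start = 0;  // offset of the previous line within 'text'
        while (std::getline(in_, line)) {
            if (!line.empty() && line.back() == '\r') line.pop_back();  // CRLF
            bool sep = is_sep(line);
            if (sep && prev_sep) {
                text.erase(prev_start);      // drop the first line of the pair
                if (!opened_) {              // leading pair: entries start here
                    opened_ = true;
                    text.clear();
                    name_.clear();
                    prev_sep = false;
                    continue;
                }
                ++count_;
                return true;                 // this pair also opens the next entry
            }
            prev_sep = sep;
            prev_start = text.size();
            text += line;
            text += '\n';
            if (!sep && name_.empty() && line.find_first_not_of(" \t") != std::string::npos)
                name_ = line;                // first non-blank line is the name
        }
        return false;                        // end of input without a closing pair
    }

    const std::string& name() const { return name_; }
    long count() const { return count_; }   // entries yielded so far

private:
    static bool is_sep(const std::string& s) { return s == std::string(48, '-'); }

    std::istream& in_;
    std::string name_;
    long count_ = 0;
    bool opened_ = false;
};
```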
     

config()
SIGNATURE: int config (int *pi = NULL)
SYNOPSIS:
this function does all configuration required prior to a run. It is called automatically from 'process_entries()' or 'process_entry()'.
DESCRIPTION:
this function:
- opens the source-input file, if it is closed;
- opens a database (if the "use database" flag is set - zFlag_BTS_Use_Database);
- converts any "\n", "\t", or "\r" to their [ASCII] equivalent character, if in "IBS mode" ("Interpret Back-Slash").
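The IBS conversion in the last step can be sketched as follows. Which strings config() applies it to is not stated here, and the treatment of other escapes (eg a doubled backslash) is an assumption: this hypothetical helper converts only \n, \t, and \r and copies every other backslash unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: replaces the two-character sequences \n, \t, and \r
// with newline, tab, and carriage return. Any other backslash (including a
// trailing one) is copied through unchanged.
std::string interpret_backslashes(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '\\' && i + 1 < in.size()) {
            char c = in[i + 1];
            if (c == 'n' || c == 't' || c == 'r') {
                out += (c == 'n') ? '\n' : (c == 't') ? '\t' : '\r';
                ++i;  // consume the escape letter as well
                continue;
            }
        }
        out += in[i];
    }
    return out;
}
```

Note that a Windows path such as C:\temp would be mangled by this conversion (the "\t" becomes a tab), which is one reason to apply it only to delimiter-style text rather than to file names.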
     

cleanup()
SIGNATURE: int cleanup (int *pi = NULL)
SYNOPSIS:
This function does any post-processing clean-up work, such as closing any open files or databases. It can be called explicitly. It is automatically called within 'process_entries()' and 'yank_biz_entry()'.
PARAMETERS

  • pi: an optional [output] error indicator variable. Its value is [currently] always set to 0.

process_entry()
SIGNATURE: int process_entry (const count_t idx, const string_o &s, int act = 1, int *pi = NULL)
SYNOPSIS:
this function processes a single organizational entity. It is basically 'process_entries()' applied to a single business instance (found in 's'); 'process_entries()' calls 'process_entry()' for each business.
PARAMETERS

  • idx: this is unused. Assign any value to it.
  • s: a string object containing the text of 1 entity. The contents of this variable can be supplied by 'yank_biz_entry()'.
  • act: the value of this integer is a bitmap of the actions to perform. If you are subclassing and redefining this function, it is your responsibility to manage the value of this parameter. See the action code table in the description section above for the values this parameter can assume in the class's native form.
  • pi: an optional [output] error indicator variable.
DESCRIPTION: the text of the business-organization is provided in 's', along with the action to perform in 'act'.
TRAITS: this function is virtual
     

add_entry_toDB()
SIGNATURE: int add_entry_toDB (const count_t idx, const string_o &s, int *pi = NULL)
SYNOPSIS:
This function parses the contents of the business contained in 's' and stores the data in a database. Access to the database must be set up beforehand. This function is essentially the same as calling 'process_entry()' with the 3rd parameter ('act') set to 1.
PARAMETERS

  • idx: this is unused. Assign any value to it.
  • s: a string object containing the text of 1 entity. The contents of this variable can be supplied by 'yank_biz_entry()'.
  • pi: a defunct, optional [output] error indicator variable. Since error processing is handled by a set of internal variables, this parameter is a zombie relic. Its value is 0.
     

print_entry_todbag()
SIGNATURE: int print_entry_todbag (const count_t idx, const string_o &s, int *pi = NULL)
SYNOPSIS:
this function is a sibling of 'add_entry_toDB()'. It takes the contents of 's', formats the business entity contained in it into databag format, and writes that out to the output file or stdout. No database access or setup is required when using the object for this operation. This function is essentially the same as calling 'process_entry()' with the 3rd parameter ('act') set to 2.
PARAMETERS

  • idx: this is unused. Assign any value to it.
  • s: a string object containing the text of 1 entity. The contents of this variable can be supplied by 'yank_biz_entry()'.
  • pi: a defunct, optional [output] error indicator variable. Since error processing is handled by a set of internal variables, this parameter is a zombie relic. Its value is 0.
     

notify_error()
SIGNATURE: int notify_error (int level, int aux, const flag_o &f, const business_o &biz, const string_o &msg = "")
SYNOPSIS: this function dumps all sorts of error info to stdout.
PARAMETERS

  • level: a volume control. This should be one of: zBTS_HardErr, zBTS_Error, zBTS_Warn, zBTS_Vital, zBTS_Stats, or zBTS_Info.
  • aux: an auxiliary error code
  • f: a flag to control what gets emitted. Its bit value meanings are:
    bit 0 (value 1): if set, do & show 'biz.error_info()'
  • biz: a reference to the current business being processed
  • msg: an [optional] error message
TRAITS: this function is virtual, and hence you can define your own error notification processing.
     

map_biz_to_bag()
SIGNATURE: int map_biz_to_bag (const business_o &biz, rec_dbag_o &bag)
SYNOPSIS: this function converts the data in 'biz' into the databag [output] param 'bag'
PARAMETERS

  • biz: [input] variable holding a business - organization entity.
  • bag: [output] variable that holds a recursive databag representation of 'biz'
DESCRIPTION: this is a "near static" function. It is basically a filter, mapping 'biz' to 'bag'.
TRAITS: this function may be duplicating functionality found elsewhere!
     

add_workers_to_bag()
SIGNATURE: int add_workers_to_bag (rec_dbag_o &bag)
SYNOPSIS:
this function handles the sub-task of adding the workers from the worker pool (as found in internal variables 'workers' and 'people') for the current business, mapping the data to a databag.
DESCRIPTION: this function is a "near private" function. It is used by 'map_biz_to_bag()', but may have public uses.
     

usage.
There are 2 routes for using this class:
[1] use it as a singleton, concrete instance to load business entries into a database or to generate a databag file, or
[2] make a subclass of the business_textstream_o class, redefining the process_entry() member function to do your bidding for a given text representation of a business entity.

The first case ([1]) consists of setting up database access, configuring error handling, and defining input and output files. Here is an example:

    #include "z_biz_txtstream.h"

    int ie0 = 0;
    paramstring_o ps = "dbconnect=ODBC dbserver=localhost database=MYSTUFF";
    ps += " user=root password=MYPASS infile=\"C:\\TEMP\\bizdata.txt\"";  // leading space keeps the pairs apart
    business_textstream_o x;

    x.set_wetmode();                          // not necessary, by default
    x.set_volume(50);                         // medium level of error-logging chatter
    x.config_param(ps, &ie0);                 // sets up database access, mainly
    x.set_errorfile("C:\\TEMP\\errors.txt");
    if (ie0 == 0)                             // ie0 is zErr_Param_BadVal on a bad value
        x.process_entries(&ie0);              // the workhorse: this does it all
    
Note that almost all of the coding you need to do is setup. The setup work is eased by being able to stuff all the parameters into the "parameter string" object, which holds everything the object needs to open the database, plus the path to the data file to process ("bizdata.txt", in C:\TEMP). This example also sets up an error log file ("errors.txt", also in C:\TEMP).
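For the second route ([2]), the essential idea is overriding the virtual per-entry hook. The sketch below uses a hypothetical, cut-down base class ('TextStreamBase') in place of business_textstream_o, with std::string and long standing in for string_o and count_t, so that it compiles on its own; in real code you would derive from business_textstream_o and override process_entry() with its documented signature.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for business_textstream_o. The documented hook is
//   virtual int process_entry(const count_t idx, const string_o &s,
//                             int act = 1, int *pi = NULL);
// here std::string and long replace string_o and count_t, and 'pi' is dropped.
class TextStreamBase {
public:
    virtual ~TextStreamBase() = default;

    // Loosely plays the role of process_entries(): calls the hook once per
    // entry. This sketch stops at the first nonzero return, and takes its
    // entries from a vector rather than from the input stream.
    int process_all(const std::vector<std::string>& entries, int act = 1) {
        long idx = 0;
        for (const std::string& s : entries) {
            int rc = process_entry(idx++, s, act);
            if (rc != 0) return rc;
        }
        return 0;
    }

    virtual int process_entry(long idx, const std::string& s, int act) = 0;
};

// Route [2]: instead of populating database tables, collect each business
// name (the first non-blank line of the entry text).
class NameCollector : public TextStreamBase {
public:
    std::vector<std::string> names;

    int process_entry(long /*idx*/, const std::string& s, int /*act*/) override {
        std::size_t start = s.find_first_not_of(" \t\r\n");
        if (start == std::string::npos) return 0;  // blank entry: nothing to do
        std::size_t end = s.find_first_of("\r\n", start);
        names.push_back(s.substr(start, end == std::string::npos ? std::string::npos
                                                                 : end - start));
        return 0;
    }
};
```

The subclass ignores 'act' entirely; as the process_entry() entry notes, managing that parameter is the subclass's responsibility once the function is redefined.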

bugs.
The keywords (in the business's keyword list) are not configurable. This can (and should) be changed in the future. The current situation can create some consternation for those who, say, want to change "arpa" to "email", "url" to "www", or "what" to "descr"; or who want to replace them with words from other languages.

history.

Sat 10/17/1998: 12:54:10 EDT: started
Sun 10/18/1998: 14:20:06 EDT: header file started
Wed 10/16/2013: add_entry_toDB() member function added