class ref name: proxy
category-group: networks
layer: 10

synopsis.
The proxy object, proxy_o, maintains information about a proxy in the computer-network sense. It is key to the operation of the webpage_spider_o class. Each instance holds the data for one specific proxy. Besides the proxy's address, the most important information is whether the proxy is currently usable (this changes frequently over time) and how "fast" it is. Many proxies can also "cloak" your end of the connection.

Proxies - basic summary info:
A proxy (short for "proxy server") is a computer that serves as a hub through which internet requests are processed. Data (eg, a simple stream of bytes in a connection) can be sent through a proxy. If your program connects through such a server, your computer sends its requests to the proxy server, which forwards them and returns the data sent from the other side of the connection. In this way the proxy serves as an intermediary between your machine and other computers on the network (such as the internet). Reasons for using proxies include filtering web content, getting around restrictions such as parental or government blocks, and accessing something (such as a web server) anonymously.

The transfer rate of the proxy is an important indicator of its quality. Imagine a toll booth on a highway: no matter how fast the vehicles travel, the limiting factor can be how fast the toll booth can process cars.
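
The toll-booth picture can be put into numbers. The sketch below is only an illustration: the function names and the idea of listing one rate per hop are assumptions made for this example, not part of proxy_o.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Throughput of a chain of hops is capped by its slowest hop (the "toll booth").
// Rates are in bytes per second; the vector must not be empty.
double bottleneck_rate(const std::vector<double> &hop_rates)
{
    assert(!hop_rates.empty());
    return *std::min_element(hop_rates.begin(), hop_rates.end());
}

// Estimated seconds needed to move 'nbytes' across the chain.
double transfer_seconds(double nbytes, const std::vector<double> &hop_rates)
{
    return nbytes / bottleneck_rate(hop_rates);
}
```

With hops rated 1000, 50, and 400 bytes per second, the slow proxy in the middle sets the pace for the whole path.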

description.
The proxy object is an orthodox object, so it has a database table structure:

keyword        datatype     description
network_name   varchar(80)  either the hostname, or the host computer as an IP address.
port           int          [socket] port number
max_wait       int
last_used      datetime     the last time the proxy was used. This column is particularly significant to
is_good        smallint
ishost_good    smallint     does the host really exist?
last_conn_ok   smallint
last_pull_ok   smallint
is_anonymous   smallint     is the proxy server anonymous? (this is determined by a validation test)
is_transpar    smallint
is_https       smallint     can the proxy be used in https? (this info is provided by the source)
is_google      smallint     can the proxy access google? (this info is provided by the source)
conn_sec       smallint     the number of seconds that elapse for a connection to be established with the end target. This value is computed during a proxy validation.
pull_sec       smallint
cumu_sec       int          the total number of seconds that elapse during a validation. This value is computed.
num_attempts   int          "num_[XXX]" type variables are used within the proxyset_fetcher object. They apply to a specific "run" where an application cycles thru a set of proxies. In this case, this is the number of times a proxy has been tried in performing a task.
num_goodfetch  int          specific run work variable: the number of times this proxy was invoked, successfully connected, and successfully negotiated a round-trip communication with its target.
num_badfetch   int          specific run work variable: the number of times this proxy made a connection, but could not complete a fetch.
num_noconnect  int          specific run work variable: the number of times the proxy was invoked but failed to connect. For failed proxies, this value will probably be equal to num_attempts.
type           varchar(32)  a very brief description of the proxy. Typical text string values include "anonymous", "elite proxy", or "transparent".
c_code         char(4)      country code. This is a 2-character ISO code. A field width of 4 is provided here for null-termination (if needed by the database), and an extra character allows smoother record field alignment. Use a value of "??" if this field is unknown.
country        varchar(20)  the full country name. This field is provided as extra adornment to the application - 'c_code' is really the value used to identify the country of origin of the proxy.
location       varchar(40)  any [optional] additional location information (besides the country)
lat            varchar(16)  latitude. This field is not used specifically by the proxy object. It is up to the application to manage the value of this field.
lon            varchar(16)  longitude. This field is not used specifically by the proxy object. It is up to the application to manage the value of this field.
check_url      varchar(24)  the web site (aka the URL) used to check the proxy, done within a call to validate().

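
The four "num_" run counters in the table can be pictured with the small stand-in below. It is a sketch only: run_counters and fetch_outcome are hypothetical names, and the rule that every attempt lands in exactly one of the three outcome counters is an assumption read off the column descriptions, not something proxy_o is documented to enforce.

```cpp
#include <cassert>

// Hypothetical outcome categories, mirroring the three per-run counters.
enum class fetch_outcome { good_fetch, bad_fetch, no_connect };

// Illustrative stand-in for the per-run work variables; not the real proxy_o.
struct run_counters {
    int num_attempts  = 0;
    int num_goodfetch = 0;
    int num_badfetch  = 0;
    int num_noconnect = 0;

    // Record one use of the proxy during a run.
    void record(fetch_outcome what)
    {
        ++num_attempts;
        switch (what) {
        case fetch_outcome::good_fetch: ++num_goodfetch; break;
        case fetch_outcome::bad_fetch:  ++num_badfetch;  break;
        case fetch_outcome::no_connect: ++num_noconnect; break;
        }
        // Assumed invariant: each attempt has exactly one outcome.
        assert(num_attempts == num_goodfetch + num_badfetch + num_noconnect);
    }

    // True when every attempt so far ended in a failed connection.
    bool never_connected() const
    {
        return num_attempts > 0 && num_noconnect == num_attempts;
    }
};
```

A proxy for which never_connected() stays true over a whole run matches the "failed proxy" case, where num_noconnect equals num_attempts.
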
A proxy is typically written out as "[address]:[port]" (without the quotes or brackets). The "address" part is either a regular computer name, fully qualified with its domain name, or an IP address. Here, we refer to this as its "network name" (and access it with the member functions network_name() and set_netname()).

Here are some sample proxy addresses (network names):

193.29.2.13:8000
210.155.12.126:8080
212.26.73.31:3128
61.138.130.229:8080
cr2006817124.cable.net.co:8080
jeter.ocs.k12.al.us:1080
proxy.iol.it:8080
213.14.48.227:80
Some common [socket] port numbers include 80, 8080, 3128, 8000, and 21320.
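
Splitting an "[address]:[port]" string into the two values that set_netname() and set_port() take might look like the sketch below. The helper split_proxy_address() is hypothetical (it is not a Z Directory function), and it uses std::string rather than string_o so that it stands alone.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper: split "[address]:[port]" into its two parts.
// Returns false if there is no colon, the address is empty, or the port
// is not a decimal number in the range 1..65535.
bool split_proxy_address(const std::string &text, std::string &netname, int &port)
{
    const std::string::size_type colon = text.rfind(':');
    if (colon == std::string::npos || colon == 0 || colon + 1 >= text.size())
        return false;

    const std::string digits = text.substr(colon + 1);
    if (digits.size() > 5 ||
        digits.find_first_not_of("0123456789") != std::string::npos)
        return false;

    const int value = std::atoi(digits.c_str());
    if (value < 1 || value > 65535)
        return false;

    netname = text.substr(0, colon);
    port = value;
    return true;
}

// Usage: parse one of the sample addresses listed above.
void example_split()
{
    std::string name;
    int port = -1;
    const bool ok = split_proxy_address("proxy.iol.it:8080", name, port);
    assert(ok && name == "proxy.iol.it" && port == 8080);
    (void) ok;  // avoids an unused-variable warning when NDEBUG is defined
}
```

The two outputs could then be handed to set_netname() and set_port() on a proxy_o instance.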

There is more to these proxy objects than being placeholders for data about proxies. The validate() function can test a proxy: whether the host (and port) is valid, whether it can be used, and the connection speed. You can connect to your destination by simply calling the connect() member function, after setting the address. And finally, you can do a "pull()": after connecting, this function will send a string to the target destination and put whatever reply comes back into an output string.

The boolean-type columns which are kept in small-integer data types in the proxy table have 3 (not 2) possible values:

  0: item value is "no" or "off";
  1: item value is "yes" or "on";
 -1: item value has not been set (and hence, the value is unknown).
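
Application code that reads these columns has to treat "unknown" as a state of its own. A minimal sketch follows; the tristate enum and the helper names are illustrative assumptions, not part of the proxy_o interface.

```cpp
#include <cassert>

// The three values stored in the smallint "boolean" columns.
enum tristate { TRI_UNSET = -1, TRI_NO = 0, TRI_YES = 1 };

// Convert a raw column value; anything outside {-1, 0, 1} is rejected.
bool to_tristate(int raw, tristate &out)
{
    if (raw < -1 || raw > 1)
        return false;
    out = static_cast<tristate>(raw);
    return true;
}

// A flag counts as "on" only when it is explicitly 1; unknown is not "on".
bool is_on(tristate t) { return t == TRI_YES; }

// A flag is known once it has been set to either 0 or 1.
bool is_known(tristate t) { return t != TRI_UNSET; }

// Usage: an unset column is neither "on" nor known.
void example_tristate()
{
    tristate t = TRI_NO;
    const bool ok = to_tristate(-1, t);
    assert(ok && !is_known(t) && !is_on(t));
    (void) ok;  // avoids an unused-variable warning when NDEBUG is defined
}
```

Testing such a column with a plain "if (value)" would treat -1 as true, which is why the explicit comparison against 1 matters.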

member functions (primary)

proxy_o()
SIGNATURE: proxy_o ()
SYNOPSIS: creates a new proxy_o object, completely devoid of contents.
 

proxy_o(proxy_o)
SIGNATURE: proxy_o (const proxy_o &rhs)
SYNOPSIS:
creates a new proxy_o object, based on the contents of proxy_o object "rhs". The contents of proxy "rhs" will be copied ('deep copy').
 

operator = (proxy_o)
SIGNATURE: const proxy_o &operator = (const proxy_o &rhs)
SYNOPSIS: an existing proxy object's contents are replaced with those of "rhs".
 

destructor
SIGNATURE: ~proxy_o ()
SYNOPSIS: virtual destructor: this destroys the class object instance. All of the instance's data will be obliterated.
 

is_valid()
SIGNATURE: boolean is_valid (int *pi) const
SYNOPSIS: tells whether the proxy is valid. This function makes sense only after calling "validate()".
PARAMETERS

  • pi: error indicator variable. Note that a non-zero value of this output parameter does not indicate whether the proxy is valid. It is set only if an [internal] error occurred. Values:
    0: proxy good inside
    1: is-host-good value: illegal
    2: is-good value: illegal

    is_validhost()
    SIGNATURE: boolean is_validhost (int *pi) const
    SYNOPSIS: (documentation INCOMPLETE)
     

    is_ready()
    SIGNATURE: boolean is_ready (int *pie) const
    SYNOPSIS:
    tells whether "use_transport()" has already been called. That function must be invoked before certain other functions, including connect() and pull().
     

    name()
    SIGNATURE: string_o name () const
    SYNOPSIS: (documentation INCOMPLETE)
     

    network_name()
    SIGNATURE: string_o network_name (int *pi = NULL) const
    SYNOPSIS: returns the network name of the proxy. This can be an IP address or a domain name.
    PARAMETERS

  • pi: error indicator variable. values:
    0: name is set
    non-zero: name is not set
  • RETURNS: [string]: [network] name of the proxy
     

    port()
    SIGNATURE: int port (int *pi = NULL) const
    SYNOPSIS: returns the port number
    PARAMETERS

  • pi: error indicator variable. values:
    0: port number is set
    non-zero: port number is not set
  • RETURNS:
    [n > 0]: port number of the proxy
    -1: error; port number is not set
     

    last_used()
    SIGNATURE: time_o last_used (int *pi) const
    SYNOPSIS:
    returns the time the proxy was last used. Since the default value of the 'time last used' is "undefined" (the value is stored internally as a string), the time value may well not be set. The application should therefore check the value of the output variable ('pi') to make sure that the value returned is a valid time.
    PARAMETERS

  • pi: error indicator variable. Tells if the time value of the proxy has been set. This integer variable, in this case, acts as a flag. values:
    0: time is set
    1: time is not set
  • RETURNS:
    [time object]: the time the proxy was last used. If the proxy was never used,
    this value will be undefined, and 'pi' will be set to 1.
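
    The caller-side pattern for this kind of output flag can be sketched as follows. The class lastused_demo is a hypothetical stand-in written for this illustration (it is not proxy_o), and std::time_t replaces time_o so that the example stands alone.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical stand-in that mimics the 'pi' output-flag convention.
// It is NOT the real proxy_o; std::time_t replaces time_o for brevity.
class lastused_demo {
public:
    lastused_demo() : stamp_(0), is_set_(false) {}

    void set_lastused(std::time_t t) { stamp_ = t; is_set_ = true; }

    // *pi is set to 0 if a time is available, 1 if it was never set.
    std::time_t last_used(int *pi) const
    {
        if (pi != nullptr)
            *pi = is_set_ ? 0 : 1;
        return stamp_;
    }

private:
    std::time_t stamp_;
    bool is_set_;
};

// Caller side: trust the returned time only when the flag is 0.
bool seconds_since_last_use(const lastused_demo &p, std::time_t now, double &out)
{
    int ie = 0;
    const std::time_t t = p.last_used(&ie);
    if (ie != 0)
        return false;   // never used: the returned value is meaningless
    out = std::difftime(now, t);
    return true;
}

// Usage: a freshly created object reports "not set".
void example_lastused()
{
    lastused_demo p;
    int ie = 0;
    (void) p.last_used(&ie);
    assert(ie == 1);
}
```

    The point of the sketch is the order of operations: read the flag first, and only then use the returned time.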
     

    num_attempts()
    SIGNATURE: int num_attempts () const
    SYNOPSIS: (documentation INCOMPLETE)
     

    connect_status()
    SIGNATURE: boolean connect_status () const
    SYNOPSIS: (documentation INCOMPLETE)
     

    pull_status()
    SIGNATURE: boolean pull_status () const
    SYNOPSIS:
    tells if the last "pull()" was successful. This is a simple boolean TRUE or FALSE. If not, there are a variety of reasons why it may have failed (connection not initiated, no connection, read or write error), but this routine does not elaborate as to why (use the output error code from pull() for that).
    RETURNS:
    TRUE: the last pull() call succeeded
    FALSE: the last pull() call failed to complete
     

    connect_time()
    SIGNATURE: int connect_time () const
    SYNOPSIS: (documentation INCOMPLETE)
     

    pull_time()
    SIGNATURE: int pull_time () const
    SYNOPSIS: (documentation INCOMPLETE)
     

    cumu_time()
    SIGNATURE: int cumu_time () const
    SYNOPSIS: [INCOMPLETE... please notify Vettrasoft]
     

    connect_timeout()
    SIGNATURE: timespan_o connect_timeout () const
    SYNOPSIS: (documentation INCOMPLETE)
     

    pull_timeout()
    SIGNATURE: timespan_o pull_timeout () const
    SYNOPSIS: (documentation INCOMPLETE)
     

    max_waittime()
    SIGNATURE: timespan_o max_waittime () const
    SYNOPSIS: (documentation INCOMPLETE)
     

    mark_validated()
    SIGNATURE: int mark_validated (boolean = TRUE)
    SYNOPSIS: (documentation INCOMPLETE)
     

    clear_validated()
    SIGNATURE: int clear_validated ()
    SYNOPSIS: (documentation INCOMPLETE)
     

    mark_validhost()
    SIGNATURE: int mark_validhost (boolean istrue)
    SYNOPSIS:
    records in the object instance whether the host was found to be 'ok'. Pass TRUE if the host is good, or FALSE if it is not. This one function is used both to set and to clear the value.
    PARAMETERS

  • istrue: TRUE or FALSE value, sets the internal "good_host" field. Default value is TRUE.

    set_netname()
    SIGNATURE: int set_netname (const string_o &s)
    SYNOPSIS: set the "address" of the proxy object to 's'.
     

    set_port()
    SIGNATURE: int set_port (int n)
    SYNOPSIS: set the port number of the proxy object to 'n'.
     

    set_maxwait()
    SIGNATURE: int set_maxwait (const timespan_o &)
    SYNOPSIS: (documentation INCOMPLETE)
     

    set_lastused()
    SIGNATURE: int set_lastused (const time_o &t)
    SYNOPSIS:
    sets the "last usage time" of the proxy object to the time found in 't'. This will be written out as a string internally; the format used is defined as that of dbbi_o::TIME_FORMAT.
     

    set_num_attempts()
    SIGNATURE: int set_num_attempts (int x)
    SYNOPSIS: [INCOMPLETE... please notify Vettrasoft]
     

    set_connect_status()
    SIGNATURE: int set_connect_status (boolean b)
    SYNOPSIS: [INCOMPLETE... please notify Vettrasoft]
     

    set_pull_status()
    SIGNATURE: int set_pull_status (boolean b)
    SYNOPSIS: [INCOMPLETE... please notify Vettrasoft]
     

    set_connect_time()
    SIGNATURE: int set_connect_time (int nsec)
    SYNOPSIS:
    sets the time the proxy uses to connect to the target host. Although this is a public function, it is normally intended to be called internally (via function validate()).
     

    set_pull_time()
    SIGNATURE: int set_pull_time (int nsec)
    SYNOPSIS:
    sets the time the proxy uses to "pull" data from the target host. A "pull" is interpreted as the time required to write to the target host (such as via an http GET request, ie, "GET http://www.yahoo.com/ HTTP/1.0") and then get a response back.
    Although this is a public function, it is normally intended to be done internally (via function validate())
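
    To make the "pull" measurement concrete, the sketch below builds the kind of GET request quoted above and times an arbitrary action in whole seconds. Both helper names are hypothetical, and treating 'nsec' as a count of whole seconds is an assumption based on the conn_sec and pull_sec column descriptions.

```cpp
#include <cassert>
#include <chrono>
#include <string>

// Build a proxy-style HTTP/1.0 GET request: the request line carries the
// absolute URL, and a blank line terminates the header section.
std::string build_proxy_get(const std::string &url)
{
    return "GET " + url + " HTTP/1.0\r\n\r\n";
}

// Hypothetical helper: run 'action' and return the whole seconds it took.
template <typename Action>
int timed_seconds(Action action)
{
    const auto start = std::chrono::steady_clock::now();
    action();
    const auto stop = std::chrono::steady_clock::now();
    return static_cast<int>(
        std::chrono::duration_cast<std::chrono::seconds>(stop - start).count());
}

// Usage: build the request, then time the (here empty) round trip.
void example_pull_timing()
{
    const std::string request = build_proxy_get("http://www.yahoo.com/");
    assert(request == "GET http://www.yahoo.com/ HTTP/1.0\r\n\r\n");
    const int nsec = timed_seconds([] { /* send request, read reply */ });
    assert(nsec >= 0);
    (void) nsec;  // avoids an unused-variable warning when NDEBUG is defined
}
```

    In real use the lambda would wrap the write and read against the target host, and the result would be the value stored as the pull time.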
     

    set_cumu_time()
    SIGNATURE: int set_cumu_time (int nsec)
    SYNOPSIS: [INCOMPLETE... please notify Vettrasoft]
     

    set_connect_timeout()
    SIGNATURE: int set_connect_timeout (const timespan_o &tspan)
    SYNOPSIS: [INCOMPLETE... please notify Vettrasoft]
     

    set_pull_timeout()
    SIGNATURE: int set_pull_timeout (const timespan_o &tspan)
    SYNOPSIS: [INCOMPLETE... please notify Vettrasoft]
     

    reset()
    SIGNATURE: int reset ()
    SYNOPSIS:
    resets the proxy object to its at-default-construction state. all internal information is wiped out, including network name and port number.
    RETURNS: 0
     

    validate()
    SIGNATURE: int validate (const int timeout_nsec, int *pie)
    SYNOPSIS: tests the proxy for its ability to connect and "pull" data from the host computer.
    PARAMETERS

  • pie: error indicator variable. Note that a non-zero value of this output parameter does not indicate whether the proxy is valid. It is set only if an [internal] error occurred during this function call. Values:
    0: function call succeeded
    zErr_Param_NotSet: host name of proxy not set (use set_DNSname())
    zErr_Resource_Exhausted: "socket()" failed (this is a PANIC - it should never happen).
    zErr_NotFound: host name lookup error (maybe connection is broken or a typo?)
    zErr_NoConnection: could not connect to host
    2: "select()" failed (unix only)
    3: strange error

    connect()
    SIGNATURE: int connect (int *pi = NULL)
    SYNOPSIS: (documentation INCOMPLETE)
     

    pull()
    SIGNATURE: int pull (const string_o &query, string_o &reply, int *pi = NULL)
    SYNOPSIS:
    this routine does a synchronous data communication with an "endpoint", typically a server socket (such as a web server). It sends the data (in string object variable 'query'), and gets back a reply (at least, that is the plan). Whatever data is returned is put into the output variable 'reply'.
    PARAMETERS

    • query: the [input] text to send to the target
    • reply: the output text, sent back from the target
    • pi: error indicator [output] variable. Values:
      0: pull() completed successfully
      zErr_Param_NotSet: object is not initialized
      zErr_NoConnection: not connected
      zErr_Write_Failed: error in attempting to send data
     

    use_transport()
    SIGNATURE: int use_transport (msgtrans_inetsocket_o *, msgtrans_sockaddr_o *pi)
    SYNOPSIS: (documentation INCOMPLETE)
     

    clone()
    SIGNATURE: proxy_o *clone() const
    SYNOPSIS: makes a copy of the current proxy, and returns a handle (a pointer) to it.
    TRAITS: This is an inline function.
     

    bad_reference()
    SIGNATURE: static proxy_o &bad_reference ()
    SYNOPSIS:
    this static function is provided so that a proxy object can be put inside a Z Directory container. For functions that return a proxy object reference (proxy_o &), if the returned object is invalid (such as from a search for a specific proxy in a pile of proxies), the function must return something - this function is used to return a reference to a global "bad" instance.
    TRAITS: This is a static function.
     

    copy()
    SIGNATURE: int copy (const proxy_o &that)
    SYNOPSIS: makes a copy of 'that'. The current object's data is replaced with that of 'that'.
     

    usage.
    The settings (flags) of any given proxy can be manipulated via member functions:

    flag function notes
    Is_Anonymous mark_anonymous() Allows the proxy code or application to denote the proxy object as being anonymous
    Is_Transparent mark_transparent() Allows the proxy code or application to denote the proxy object as being transparent
    Is_HTTPS mark_https() Allows the proxy code or application to denote the proxy object as being able to prosecute HTTPS connections
    Is_google mark_googly() Allows the proxy code or application to denote the proxy object as being able to access www.google.com
    Validation_Only mark_testonly() If set TRUE, the proxy will be denoted as being used for a validation test run. This affects the database column variables last_used and last_checked. If so marked, and the proxy is being pulled by the proxyset_fetcher object, that object will not set the "last used" variable (since the proxy is not being used, only tested). Note that this feature is not intended for public consumption. The mark_testonly() function is intended only for proxyset_fetcher_o.

    note.
    In order to store times (that is, dates) in the format required by the underlying database, prior to any database operations you should specify the appropriate format string to the proxy object. Do it with a call to set_dbtimeformat(), as in this example:

    #include "z_dbsubs.h"
    #include "z_proxy.h"
    #include "z_stime.h"
     
    proxy_o proxy;
    stime_o t;
    proxy_o::set_dbtimeformat(zStyle_ODBC_TIMEFORMAT);
     
    // database-oriented proxy operations, eg:
    t.now();                                        // set time, in stime object
    t.set_fmtstring(proxy_o::dbtimeformat());       // set the correct format
    proxy.put ("last_used", t);                     // update the DB field value
    proxy.store_update();                           // writes updates to DB row
    
    As of this writing (2013), there are 2 formats supported for database strings of time and date: ADO and ODBC. The corresponding macros for these that should be passed to proxy_o::set_dbtimeformat() and proxy_o::dbtimeformat() are:
    zStyle_ODBC_TIMEFORMAT
    zStyle_ADO_TIMEFORMAT
    

    examples.
    [proxy example forthcoming]

    
    

    history.

    Tue 02/25/2003: created [--AG]
    Wed 08/08/2012: added, cleanup validate(); BUG FIX: gethostbyname()..
    Thu 08/09/2012: added proxy_o::set_maxwait() [--GeG]