class ref name: proxy
category-group: networks
layer: 10
synopsis.
The proxy object, or proxy_o, maintains information about proxies. A proxy here refers to one in the computer network context. This object is key to the operation of the webpage_spider_o class. An instance of this class maintains information about a specific proxy. Besides the address of a proxy, its most important information is whether it is any good (this frequently changes over time) and how "fast" it is. Another aspect of many proxies is the ability to "cloak" your end of the connection.
Proxies - basic summary info:
A proxy (short for "proxy server") is a computer which serves as a hub through which internet requests are processed. Data can be sent through a proxy (eg, a simple stream of bytes in a connection). If your program connects through one of these servers, your computer sends your requests to the proxy server, which then processes your request and returns the data sent from the other side of the connection. In this way the proxy serves as an intermediary between your home machine and other computers in the network (such as the internet). The reasons for using proxies include filtering web content, getting around restrictions such as parental or government blocks, and accessing something (such as a web server) anonymously. The transfer rate of the proxy is an important indicator of its quality. Imagine a toll booth on a highway: no matter how fast the vehicles travel, the limiting factor can be how fast the toll booth can process cars.
description.
The proxy object is an orthodox object, so it has a database table structure:

keyword        datatype     description
network_name   varchar(80)  this is either the hostname, or the host computer as an IP address.
port           int          [socket] port number
max_wait       int
last_used      datetime     the last time the proxy was used. This column is particularly significant to
is_good        smallint
ishost_good    smallint     does the host really exist?
last_conn_ok   smallint
last_pull_ok   smallint
is_anonymous   smallint     is the proxy server anonymous? (this is determined by a validation test)
is_transpar    smallint
is_https       smallint     can the proxy be used in https? (this info is provided by the source)
is_google      smallint     can the proxy access google? (this info is provided by the source)
conn_sec       smallint     the number of seconds that elapse for a connection to be established with the end target. This value is computed during a proxy validation.
pull_sec       smallint
cumu_sec       int          the total number of seconds that elapse during a validation. This value is computed.
num_attempts   int          "num_[XXX]" type variables are used within the proxyset_fetcher object. They apply to a specific "run" where an application cycles thru a set of proxies. In this case, this is the number of times a proxy has been tried in performing a task.
num_goodfetch  int          specific run work variable: the number of times this proxy was invoked, successfully connected, and successfully negotiated a round-trip communication with its target.
num_badfetch   int          specific run work variable: number of times this proxy made a connection, but could not complete a fetch.
num_noconnect  int          specific run work variable: the number of times the proxy was invoked but failed to connect. For failed proxies, this value will probably be equal to num_attempts.
type           varchar(32)  this is a very brief description of the proxy. Typical text string values include "anonymous", "elite proxy", or "transparent".
c_code         char(4)      country code. This is a 2-character ISO code. A field width of 4 is provided here for null-termination (if needed by the database), and an extra character allows smoother record field alignment. Use a value of "??" if this field is unknown.
country        varchar(20)  This is the full country name. This field is provided as extra adornment to the application - 'c_code' is really the value used to identify country of origin of the proxy.
location       varchar(40)  Any [optional] additional location information (besides the country)
lat            varchar(16)  latitude. This field is not used specifically by the proxy object. It is up to the application to manage the value of this field.
lon            varchar(16)  longitude. This field is not used specifically by the proxy object. It is up to the application to manage the value of this field.
check_url      varchar(24)  The web site (aka the URL) used to check the proxy, done within a call to validate().

A proxy is typically written out as "[address]:[port]" (note: without the quotes or brackets). The "address" part is either a regular computer name, fully qualified with its domain name, or an IP address. Here, we refer to this as its "network name" (and access it with member functions network_name() and set_netname()). Here are some sample proxy addresses (network names):

    193.29.2.13:8000
    210.155.12.126:8080
    212.26.73.31:3128
    61.138.130.229:8080
    cr2006817124.cable.net.co:8080
    jeter.ocs.k12.al.us:1080
    proxy.iol.it:8080
    213.14.48.227:80

Some common [socket] port numbers include 80, 8080, 3128, 8000, and 21320.
There is more to what these proxy objects can do than just be placeholders for data about proxies. The validate() function can test a proxy: whether the host (and port) is valid, whether it can be used, and the connection speed. You can connect to your destination by simply calling the connect() member function, after setting the address. And finally, you can do a "pull()": after connecting, this function will send a string to the target destination and collect whatever reply comes back.
The boolean-type columns, which are kept in small-integer data types in the proxy table, have 3 (not 2) possible values:
  0: item value is "no" or "off";
  1: item value is "yes" or "on";
  -1: item value has not been set (and hence, the value is unknown).
member functions (primary)
proxy_o()
SIGNATURE: proxy_o ()
SYNOPSIS: creates a new proxy_o object, completely devoid of contents.
proxy_o(proxy_o)
SIGNATURE: proxy_o (const proxy_o &rhs)
SYNOPSIS:
creates a new proxy_o object, based on the contents of proxy_o object "rhs". The contents of proxy "rhs" will be copied ('deep copy').
operator = (proxy_o)
SIGNATURE: const proxy_o &operator = (const proxy_o &rhs)
SYNOPSIS: the contents of an existing proxy object are replaced with those of "rhs".
destructor
SIGNATURE: ~proxy_o ()
SYNOPSIS: virtual destructor: this destroys the class object instance. All of the instance's data will be obliterated.
is_valid()
SIGNATURE: boolean is_valid (int *pi) const
SYNOPSIS: tells if the proxy is valid or not. this function makes sense only after doing "validate()".
PARAMETERS
pi: error indicator variable. Note that a non-zero value of this output parameter does not indicate whether the proxy is valid. It is set only if an [internal] error occurred. values:
  0: proxy good inside
  1: is-host-good value: illegal
  2: is-good value: illegal
is_validhost()
SIGNATURE: boolean is_validhost (int *pi) const
SYNOPSIS: (documentation INCOMPLETE)
is_ready()
SIGNATURE: boolean is_ready (int *pie) const
SYNOPSIS:
this routine simply tells if "use_transport()" was done prior. That subroutine must be invoked prior to invoking some functions, including connect() and pull().
name()
SIGNATURE: string_o name () const
SYNOPSIS: (documentation INCOMPLETE)
network_name()
SIGNATURE: string_o network_name (int *pi = NULL) const
SYNOPSIS: returns the network name of the proxy. This can be an IP address or a domain name.
PARAMETERS
pi: error indicator variable. values:
  0: name is set
  non-zero: name is not set
RETURNS:
  [string]: [network] name of the proxy
port()
SIGNATURE: int port (int *pi = NULL) const
SYNOPSIS: returns the port number
PARAMETERS
pi: error indicator variable. values:
  0: port number is set
  non-zero: port number is not set
RETURNS:
  [n > 0]: port number of the proxy
  -1: error; port number is not set
last_used()
SIGNATURE: time_o last_used (int *pi) const
SYNOPSIS:
returns the time the proxy was last used. Since the default value of the 'time last used' is "undefined" (the value is stored internally as a string), there is a very real possibility that the time value is not set for this. In this case, the application should check the value of the output variable ('pi') to make sure that the value returned is a valid time.
PARAMETERS
pi: error indicator variable. Tells if the time value of the proxy has been set. This integer variable, in this case, acts as a flag. values:
  0: time is set
  1: time is not set
RETURNS:
  [time object]: the time the proxy was last used. If the proxy was never used, this value will be undefined, and 'pi' will be set to 1.
num_attempts()
SIGNATURE: int num_attempts () const
SYNOPSIS: (documentation INCOMPLETE)
connect_status()
SIGNATURE: boolean connect_status () const
SYNOPSIS: (documentation INCOMPLETE)
pull_status()
SIGNATURE: boolean pull_status () const
SYNOPSIS:
tells if the last "pull()" was successful. This is a simple boolean TRUE or FALSE. If not, there are a variety of reasons why it may have failed (connection not initiated, no connection, read or write error), but this routine does not elaborate as to why (use the output error code from pull() for that).
RETURNS:
TRUE: the last pull() call succeeded
FALSE: the last pull() call failed to complete
connect_time()
SIGNATURE: int connect_time () const
SYNOPSIS: (documentation INCOMPLETE)
pull_time()
SIGNATURE: int pull_time () const
SYNOPSIS: (documentation INCOMPLETE)
cumu_time()
SIGNATURE: int cumu_time () const
SYNOPSIS: [INCOMPLETE... please notify Vettrasoft]
connect_timeout()
SIGNATURE: timespan_o connect_timeout () const
SYNOPSIS: (documentation INCOMPLETE)
pull_timeout()
SIGNATURE: timespan_o pull_timeout () const
SYNOPSIS: (documentation INCOMPLETE)
max_waittime()
SIGNATURE: timespan_o max_waittime () const
SYNOPSIS: (documentation INCOMPLETE)
mark_validated()
SIGNATURE: int mark_validated (boolean = TRUE)
SYNOPSIS: (documentation INCOMPLETE)
clear_validated()
SIGNATURE: int clear_validated ()
SYNOPSIS: (documentation INCOMPLETE)
mark_validhost()
SIGNATURE: int mark_validhost (boolean istrue)
SYNOPSIS:
if the host is found to be 'ok', this records the fact in the object instance. if not, set 'istrue' to FALSE, and the fact will be recorded. This function is used to set and to clear the value.
PARAMETERS
istrue: TRUE or FALSE value, sets the internal "good_host" field. Default value is TRUE.
set_netname()
SIGNATURE: int set_netname (const string_o &s)
SYNOPSIS: set the "address" of the proxy object to 's'.
set_port()
SIGNATURE: int set_port (int n)
SYNOPSIS: set the port number of the proxy object to 'n'.
set_maxwait()
SIGNATURE: int set_maxwait (const timespan_o &)
SYNOPSIS: (documentation INCOMPLETE)
set_lastused()
SIGNATURE: int set_lastused (const time_o &t)
SYNOPSIS:
sets the "last usage time" of the proxy object to the time found in 't'. This will be written out as a string internally; the format used is defined as that of dbbi_o::TIME_FORMAT.
set_num_attempts()
SIGNATURE: int set_num_attempts (int x)
SYNOPSIS: [INCOMPLETE... please notify Vettrasoft]
set_connect_status()
SIGNATURE: int set_connect_status (boolean b)
SYNOPSIS: [INCOMPLETE... please notify Vettrasoft]
set_pull_status()
SIGNATURE: int set_pull_status (boolean b)
SYNOPSIS: [INCOMPLETE... please notify Vettrasoft]
set_connect_time()
SIGNATURE: int set_connect_time (int nsec)
SYNOPSIS:
sets the time the proxy uses to connect to the target host. Although this is a public function, it is normally intended to be done internally (via function validate()).
set_pull_time()
SIGNATURE: int set_pull_time (int nsec)
SYNOPSIS:
sets the time the proxy uses to "pull" data from the target host. A "pull" consists of writing to the target host (such as via an HTTP GET request, ie, "GET http://www.yahoo.com/ HTTP/1.0") and then getting a response back; the pull time is the time this round trip requires.
Although this is a public function, it is normally intended to be done internally (via function validate()).
set_cumu_time()
SIGNATURE: int set_cumu_time (int nsec)
SYNOPSIS: [INCOMPLETE... please notify Vettrasoft]
set_connect_timeout()
SIGNATURE: int set_connect_timeout (const timespan_o &tspan)
SYNOPSIS: [INCOMPLETE... please notify Vettrasoft]
set_pull_timeout()
SIGNATURE: int set_pull_timeout (const timespan_o &tspan)
SYNOPSIS: [INCOMPLETE... please notify Vettrasoft]
reset()
SIGNATURE: int reset ()
SYNOPSIS:
resets the proxy object to its at-default-construction state. all internal information is wiped out, including network name and port number.
RETURNS: 0
validate()
SIGNATURE: int validate (const int timeout_nsec, int *pie)
SYNOPSIS: tests the proxy for its ability to connect and "pull" data from the host computer.
PARAMETERS
pi: error indicator variable. Note that a non-zero value of this output parameter does not indicate whether the proxy is valid. It is set only if an [internal] error occurred during this function call. values:
  0: function call succeeded
  zErr_Param_NotSet: host name of proxy not set (use set_DNSname())
  zErr_Resource_Exhausted: "socket()" failed (this is a PANIC - it should never happen).
  zErr_NotFound: host name lookup error (maybe connection is broken or a typo?)
  zErr_NoConnection: could not connect to host
  2: "select()" failed (unix only)
  3: strange error
connect()
SIGNATURE: int connect (int *pi = NULL)
SYNOPSIS: (documentation INCOMPLETE)
pull()
SIGNATURE: int pull (const string_o &query, string_o &reply, int *pi = NULL)
SYNOPSIS:
this routine does a synchronous data communication with an "endpoint", typically a server socket (such as a web server). It sends the data (in string object variable 'query'), and gets back a reply (at least, that is the plan). Whatever data is returned is put into the output variable 'reply'.
PARAMETERS
- query: the [input] text to send to the target
- reply: the output text, sent back from the target
- pi: error indicator [output] variable. Values:
  0: pull() completed successfully
  zErr_Param_NotSet: object is not initialized
  zErr_NoConnection: not connected
  zErr_Write_Failed: error in attempting to send data
use_transport()
SIGNATURE: int use_transport (msgtrans_inetsocket_o *, msgtrans_sockaddr_o *pi)
SYNOPSIS: (documentation INCOMPLETE)
clone()
SIGNATURE: proxy_o *clone() const
SYNOPSIS: makes a copy of the current proxy, and returns a handle (a pointer) to it.
TRAITS: This is an inline function.
bad_reference()
SIGNATURE: static proxy_o &bad_reference ()
SYNOPSIS:
this static function is provided so that a proxy object can be put inside a Z Directory container. For functions that return a proxy object reference (proxy_o &), if the returned object is invalid (such as from a search for a specific proxy in a pile of proxies), the function must return something - this function is used to return a reference to a global "bad" instance.
TRAITS: This is a static function.
copy()
SIGNATURE: int copy (const proxy_o &that)
SYNOPSIS: makes a copy of 'that'. The current object's data is replaced with that of 'that'.
usage.
The settings (flags) of any given proxy can be manipulated via member functions:

flag             function            notes
Is_Anonymous     mark_anonymous()    Allows the proxy code or application to denote the proxy object as being anonymous
Is_Transparent   mark_transparent()  Allows the proxy code or application to denote the proxy object as being transparent
Is_HTTPS         mark_https()        Allows the proxy code or application to denote the proxy object as being able to carry out HTTPS connections
Is_google        mark_googly()       Allows the proxy code or application to denote the proxy object as being able to access www.google.com
Validation_Only  mark_testonly()     if set TRUE, the proxy will be denoted as being used for a validation test run. This affects the database column variables last_used and last_checked. If so marked, and the proxy is being pulled by the proxyset_fetcher object, that object will not set the "last used" variable (since the proxy is not being used, only tested). Note that this feature is not intended for public consumption. The mark_testonly() function is intended only for proxyset_fetcher_o.
note.
In order to store times (that is, dates) in the format required by the underlying database, prior to any database operations you should specify the appropriate format string to the proxy object. Do it with a call to set_dbtimeformat(), as in this example:

    #include "z_dbsubs.h"
    #include "z_proxy.h"
    #include "z_stime.h"

    proxy_o proxy;
    stime_o t;
    proxy_o::set_dbtimeformat(zStyle_ODBC_TIMEFORMAT);
    // database-oriented proxy operations, eg:
    t.now();                                    // set time, in stime object
    t.set_fmtstring(proxy_o::dbtimeformat());   // set the correct format
    proxy.put ("last_used", t);                 // update the DB field value
    proxy.store_update();                       // writes updates to DB row

As of this writing (2013), there are 2 formats supported for database strings of time and date: ADO and ODBC. The corresponding macros for these that should be passed to proxy_o::set_dbtimeformat() and proxy_o::dbtimeformat() are:

    zStyle_ODBC_TIMEFORMAT
    zStyle_ADO_TIMEFORMAT

examples.
[proxy example forthcoming]
history.
Tue 02/25/2003: created [--AG]
Wed 08/08/2012: added, cleanup validate(); BUG FIX: gethostbyname().
Thu 08/09/2012: added proxy_o::set_maxwait() [--GeG]