Files correlati : Commento : Spostamento in libraries delle librerie esterne di Campo per una maggiore pulizia e organizzazione git-svn-id: svn://10.65.10.50/branches/R_10_00@24150 c028cbd2-c16b-5b4b-a496-9718f37d4682
		
			
				
	
	
		
			246 lines
		
	
	
		
			10 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
			
		
		
	
	
			246 lines
		
	
	
		
			10 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
                                  _   _ ____  _
 | 
						|
                              ___| | | |  _ \| |
 | 
						|
                             / __| | | | |_) | |
 | 
						|
                            | (__| |_| |  _ <| |___
 | 
						|
                             \___|\___/|_| \_\_____|
 | 
						|
 | 
						|
Structs in libcurl
 | 
						|
 | 
						|
This document should cover 7.32.0 pretty accurately, but will make sense even
 | 
						|
for older and later versions as things don't change drastically that often.
 | 
						|
 | 
						|
 1. The main structs in libcurl
 | 
						|
  1.1 SessionHandle
 | 
						|
  1.2 connectdata
 | 
						|
  1.3 Curl_multi
 | 
						|
  1.4 Curl_handler
 | 
						|
  1.5 conncache
 | 
						|
  1.6 Curl_share
 | 
						|
  1.7 CookieInfo
 | 
						|
 | 
						|
==============================================================================
 | 
						|
 | 
						|
1. The main structs in libcurl
 | 
						|
 | 
						|
  1.1 SessionHandle
 | 
						|
 | 
						|
  The SessionHandle handle struct is the one returned to the outside in the
 | 
						|
  external API as a "CURL *". This is usually known as an easy handle in API
 | 
						|
  documentations and examples.
 | 
						|
 | 
						|
  Information and state that is related to the actual connection is in the
 | 
						|
  'connectdata' struct. When a transfer is about to be made, libcurl will
 | 
						|
  either create a new connection or re-use an existing one. The particular
 | 
						|
  connectdata that is used by this handle is pointed out by
 | 
						|
  SessionHandle->easy_conn.
 | 
						|
 | 
						|
  Data and information that regard this particular single transfer is put in
 | 
						|
  the SingleRequest sub-struct.
 | 
						|
 | 
						|
  When the SessionHandle struct is added to a multi handle, as it must be in
 | 
						|
  order to do any transfer, the ->multi member will point to the Curl_multi
 | 
						|
  struct it belongs to. The ->prev and ->next members will then be used by the
 | 
						|
  multi code to keep a linked list of SessionHandle structs that are added to
 | 
						|
  that same multi handle. libcurl always uses multi so ->multi *will* point to
 | 
						|
  a Curl_multi when a transfer is in progress.
 | 
						|
 | 
						|
  ->mstate is the multi state of this particular SessionHandle. When
 | 
						|
  multi_runsingle() is called, it will act on this handle according to which
 | 
						|
  state it is in. The mstate is also what tells which sockets to return for a
 | 
						|
  specific SessionHandle when curl_multi_fdset() is called etc.
 | 
						|
 | 
						|
  The libcurl source code generally use the name 'data' for the variable that
 | 
						|
  points to the SessionHandle.
 | 
						|
 | 
						|
 | 
						|
  1.2 connectdata
 | 
						|
 | 
						|
  A general idea in libcurl is to keep connections around in a connection
 | 
						|
  "cache" after they have been used in case they will be used again and then
 | 
						|
  re-use an existing one instead of creating a new as it creates a significant
 | 
						|
  performance boost.
 | 
						|
 | 
						|
  Each 'connectdata' identifies a single physical connection to a server. If
 | 
						|
  the connection can't be kept alive, the connection will be closed after use
 | 
						|
  and then this struct can be removed from the cache and freed.
 | 
						|
 | 
						|
  Thus, the same SessionHandle can be used multiple times and each time select
 | 
						|
  another connectdata struct to use for the connection. Keep this in mind, as
 | 
						|
  it is then important to consider if options or choices are based on the
 | 
						|
  connection or the SessionHandle.
 | 
						|
 | 
						|
  Functions in libcurl will assume that connectdata->data points to the
 | 
						|
  SessionHandle that uses this connection.
 | 
						|
 | 
						|
  As a special complexity, some protocols supported by libcurl require a
 | 
						|
  special disconnect procedure that is more than just shutting down the
 | 
						|
  socket. It can involve sending one or more commands to the server before
 | 
						|
  doing so. Since connections are kept in the connection cache after use, the
 | 
						|
  original SessionHandle may no longer be around when the time comes to shut
 | 
						|
  down a particular connection. For this purpose, libcurl holds a special
 | 
						|
  dummy 'closure_handle' SessionHandle in the Curl_multi struct to 
 | 
						|
 | 
						|
  FTP uses two TCP connections for a typical transfer but it keeps both in
 | 
						|
  this single struct and thus can be considered a single connection for most
 | 
						|
  internal concerns.
 | 
						|
 | 
						|
  The libcurl source code generally use the name 'conn' for the variable that
 | 
						|
  points to the connectdata.
 | 
						|
 | 
						|
 | 
						|
  1.3 Curl_multi
 | 
						|
 | 
						|
  Internally, the easy interface is implemented as a wrapper around multi
 | 
						|
  interface functions. This makes everything multi interface.
 | 
						|
 | 
						|
  Curl_multi is the multi handle struct exposed as "CURLM *" in external APIs.
 | 
						|
 | 
						|
  This struct holds a list of SessionHandle structs that have been added to
 | 
						|
  this handle with curl_multi_add_handle(). The start of the list is ->easyp
 | 
						|
  and ->num_easy is a counter of added SessionHandles.
 | 
						|
 | 
						|
  ->msglist is a linked list of messages to send back when
 | 
						|
  curl_multi_info_read() is called. Basically a node is added to that list
 | 
						|
  when an individual SessionHandle's transfer has completed.
 | 
						|
 | 
						|
  ->hostcache points to the name cache. It is a hash table for looking up name
 | 
						|
  to IP. The nodes have a limited life time in there and this cache is meant
 | 
						|
  to reduce the time for when the same name is wanted within a short period of
 | 
						|
  time.
 | 
						|
 | 
						|
  ->timetree points to a tree of SessionHandles, sorted by the remaining time
 | 
						|
  until it should be checked - normally some sort of timeout. Each
 | 
						|
  SessionHandle has one node in the tree.
 | 
						|
 | 
						|
  ->sockhash is a hash table to allow fast lookups of socket descriptor to
 | 
						|
  which SessionHandle that uses that descriptor. This is necessary for the
 | 
						|
  multi_socket API.
 | 
						|
 | 
						|
  ->conn_cache points to the connection cache. It keeps track of all
 | 
						|
  connections that are kept after use. The cache has a maximum size.
 | 
						|
 | 
						|
  ->closure_handle is described in the 'connectdata' section.
 | 
						|
 | 
						|
  The libcurl source code generally use the name 'multi' for the variable that
 | 
						|
  points to the Curl_multi struct.
 | 
						|
 | 
						|
 | 
						|
  1.4 Curl_handler
 | 
						|
 | 
						|
  Each unique protocol that is supported by libcurl needs to provide at least
 | 
						|
  one Curl_handler struct. It defines what the protocol is called and what
 | 
						|
  functions the main code should call to deal with protocol specific issues.
 | 
						|
  In general, there's a source file named [protocol].c in which there's a
 | 
						|
  "struct Curl_handler Curl_handler_[protocol]" declared. In url.c there's
 | 
						|
  then the main array with all individual Curl_handler structs pointed to from
 | 
						|
  a single array which is scanned through when a URL is given to libcurl to
 | 
						|
  work with.
 | 
						|
 | 
						|
  ->scheme is the URL scheme name, usually spelled out in uppercase. That's
 | 
						|
  "HTTP" or "FTP" etc. SSL versions of the protcol need its own Curl_handler
 | 
						|
  setup so HTTPS separate from HTTP.
 | 
						|
 | 
						|
  ->setup_connection is called to allow the protocol code to allocate protocol
 | 
						|
  specific data that then gets associated with that SessionHandle for the rest
 | 
						|
  of this transfer. It gets freed again at the end of the transfer. It will be
 | 
						|
  called before the 'connectdata' for the transfer has been selected/created.
 | 
						|
  Most protocols will allocate its private 'struct [PROTOCOL]' here and assign
 | 
						|
  SessionHandle->req.protop to point to it.
 | 
						|
 | 
						|
  ->connect_it allows a protocol to do some specific actions after the TCP
 | 
						|
  connect is done, that can still be considered part of the connection phase.
 | 
						|
 | 
						|
  Some protocols will alter the connectdata->recv[] and connectdata->send[]
 | 
						|
  function pointers in this function.
 | 
						|
 | 
						|
  ->connecting is similarly a function that keeps getting called as long as the
 | 
						|
  protocol considers itself still in the connecting phase.
 | 
						|
 | 
						|
  ->do_it is the function called to issue the transfer request. What we call
 | 
						|
  the DO action internally. If the DO is not enough and things need to be kept
 | 
						|
  getting done for the entire DO sequence to complete, ->doing is then usually
 | 
						|
  also provided. Each protocol that needs to do multiple commands or similar
 | 
						|
  for do/doing need to implement their own state machines (see SCP, SFTP,
 | 
						|
  FTP). Some protocols (only FTP and only due to historical reasons) has a
 | 
						|
  separate piece of the DO state called DO_MORE.
 | 
						|
 | 
						|
  ->doing keeps getting called while issuing the transfer request command(s)
 | 
						|
 | 
						|
  ->done gets called when the transfer is complete and DONE. That's after the
 | 
						|
  main data has been transferred.
 | 
						|
 | 
						|
  ->do_more gets called during the DO_MORE state. The FTP protocol uses this
 | 
						|
  state when setting up the second connection.
 | 
						|
 | 
						|
  ->proto_getsock
 | 
						|
  ->doing_getsock
 | 
						|
  ->domore_getsock
 | 
						|
  ->perform_getsock
 | 
						|
  Functions that return socket information. Which socket(s) to wait for which
 | 
						|
  action(s) during the particular multi state.
 | 
						|
 | 
						|
  ->disconnect is called immediately before the TCP connection is shutdown.
 | 
						|
 | 
						|
  ->readwrite gets called during transfer to allow the protocol to do extra
 | 
						|
  reads/writes
 | 
						|
 | 
						|
  ->defport is the default report TCP or UDP port this protocol uses
 | 
						|
 | 
						|
  ->protocol is one or more bits in the CURLPROTO_* set. The SSL versions have
 | 
						|
  their "base" protocol set and then the SSL variation. Like "HTTP|HTTPS".
 | 
						|
 | 
						|
  ->flags is a bitmask with additional information about the protocol that will
 | 
						|
  make it get treated differently by the generic engine:
 | 
						|
 | 
						|
    PROTOPT_SSL - will make it connect and negotiate SSL
 | 
						|
 | 
						|
    PROTOPT_DUAL - this protocol uses two connections
 | 
						|
 | 
						|
    PROTOPT_CLOSEACTION - this protocol has actions to do before closing the
 | 
						|
    connection. This flag is no longer used by code, yet still set for a bunch
 | 
						|
    protocol handlers.
 | 
						|
  
 | 
						|
    PROTOPT_DIRLOCK - "direction lock". The SSH protocols set this bit to
 | 
						|
    limit which "direction" of socket actions that the main engine will
 | 
						|
    concern itself about.
 | 
						|
 | 
						|
    PROTOPT_NONETWORK - a protocol that doesn't use network (read file:)
 | 
						|
 | 
						|
    PROTOPT_NEEDSPWD - this protocol needs a password and will use a default
 | 
						|
    one unless one is provided
 | 
						|
 | 
						|
    PROTOPT_NOURLQUERY - this protocol can't handle a query part on the URL
 | 
						|
    (?foo=bar)
 | 
						|
 | 
						|
 | 
						|
  1.5 conncache
 | 
						|
 | 
						|
  Is a hash table with connections for later re-use. Each SessionHandle has
 | 
						|
  a pointer to its connection cache. Each multi handle sets up a connection
 | 
						|
  cache that all added SessionHandles share by default.
 | 
						|
 | 
						|
 | 
						|
  1.6 Curl_share
 | 
						|
  
 | 
						|
  The libcurl share API allocates a Curl_share struct, exposed to the external
 | 
						|
  API as "CURLSH *".
 | 
						|
 | 
						|
  The idea is that the struct can have a set of own versions of caches and
 | 
						|
  pools and then by providing this struct in the CURLOPT_SHARE option, those
 | 
						|
  specific SessionHandles will use the caches/pools that this share handle
 | 
						|
  holds.
 | 
						|
 | 
						|
  Then individual SessionHandle structs can be made to share specific things
 | 
						|
  that they otherwise wouldn't, such as cookies.
 | 
						|
 | 
						|
  The Curl_share struct can currently hold cookies, DNS cache and the SSL
 | 
						|
  session cache.
 | 
						|
 | 
						|
  
 | 
						|
  1.7 CookieInfo
 | 
						|
 | 
						|
  This is the main cookie struct. It holds all known cookies and related
 | 
						|
  information. Each SessionHandle has its own private CookieInfo even when
 | 
						|
  they are added to a multi handle. They can be made to share cookies by using
 | 
						|
  the share API.
 |