On systems where configure runs, we aim to work on all of them, provided they
have a suitable C compiler. On systems that don't run configure, we strive to
keep curl running correctly on:
- Windows 98
- AS/400 V5R3M0
- Symbian 9.1
- Windows CE ?
- TPF ?
Build tools
-----------
When writing code (mostly for generating stuff included in release tarballs)
we use a few "build tools" and we make sure that we remain functional with
these versions:
- GNU Libtool 1.4.2
- GNU Autoconf 2.57
- GNU Automake 1.7
- GNU M4 1.4
- perl 5.004
- roffit 0.5
- groff ? (any version that supports "groff -Tps -man [in] [out]")
- ps2pdf (gs) ?
<a name="winvsunix"></a>
Windows vs Unix
===============
There are a few differences in how to program curl the Unix way compared to
the Windows way. Perhaps the four most notable details are:
1. Different function names for socket operations.
In curl, this is solved with defines and macros, so that the source looks
the same in all places except for the header file that defines them. The
macros in use are sclose(), sread() and swrite().
2. Windows requires a couple of init calls for the socket stuff.
That's taken care of by the `curl_global_init()` call, but if other libs
also do it etc there might be reasons for applications to alter that
behaviour.
3. The file descriptors for network communication and file operations are
not as easily interchangeable as in Unix.
We avoid this by not trying any funny tricks on file descriptors.
4. When writing data to stdout, Windows makes end-of-lines the DOS way, thus
destroying binary data, although you do want that conversion if it is
text coming through... (sigh)
We set stdout to binary under Windows.
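
As a rough illustration of the first point, portability macros of this kind could be defined along the following lines. This is a simplified sketch and not a copy of curl's actual header: the real definitions deal with more platforms and argument types, and the `sock_t` type and `send_all()` helper are hypothetical names used only here.

```c
#ifdef _WIN32
#include <winsock2.h>
/* Winsock has its own names for the socket calls */
#define sread(fd, buf, len)  recv((fd), (char *)(buf), (int)(len), 0)
#define swrite(fd, buf, len) send((fd), (const char *)(buf), (int)(len), 0)
#define sclose(fd)           closesocket(fd)
typedef SOCKET sock_t;
#else
#include <sys/types.h>
#include <unistd.h>
/* On Unix-like systems a socket is an ordinary file descriptor */
#define sread(fd, buf, len)  read((fd), (buf), (len))
#define swrite(fd, buf, len) write((fd), (buf), (len))
#define sclose(fd)           close(fd)
typedef int sock_t;
#endif

#include <stddef.h>

/* Hypothetical helper: code written against the macros looks the same on
   every platform. Returns 0 when all bytes were written, -1 on error. */
static int send_all(sock_t fd, const char *data, size_t len)
{
  while(len > 0) {
    long n = (long)swrite(fd, data, len);
    if(n <= 0)
      return -1;
    data += n;
    len -= (size_t)n;
  }
  return 0;
}
```

Only the block of defines differs between platforms. `send_all()` never mentions `send()`, `write()` or `closesocket()` directly.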
Inside the source code, we make an effort to avoid `#ifdef [Your OS]`. All
conditionals that deal with features *should* instead be in the format
`#ifdef HAVE_THAT_WEIRD_FUNCTION`. Since Windows can't run configure scripts,
we maintain a `curl_config-win32.h` file in the lib directory that is supposed
to look exactly like a `curl_config.h` file would have looked on a Windows
machine!
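
A feature-based conditional could look like the sketch below. The `HAVE_STRCASECMP` symbol is an assumed example of the kind of define a configure run or a hand-maintained config header provides, and `equal_nocase()` is a hypothetical helper, not a libcurl function.

```c
#include <ctype.h>
#ifdef HAVE_STRCASECMP
#include <strings.h>
#endif

/* Hypothetical helper: compare two strings ignoring ASCII case.
   Returns 1 if they are equal, 0 otherwise. */
static int equal_nocase(const char *a, const char *b)
{
#ifdef HAVE_STRCASECMP
  /* the config header says strcasecmp() exists, so use it */
  return strcasecmp(a, b) == 0;
#else
  /* portable fallback for systems lacking the function */
  while(*a && *b) {
    if(tolower((unsigned char)*a) != tolower((unsigned char)*b))
      return 0;
    a++;
    b++;
  }
  return *a == *b;   /* equal only if both strings ended together */
#endif
}
```

The code asks whether a function is available and never which operating system it is being built on.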
Generally speaking: always remember that this will be compiled on dozens of
operating systems. Don't walk on the edge!
<a name="Library"></a>
Library
=======
(See [Structs in libcurl](#structs) for the separate section describing all
major internal structs and their purposes.)
There are plenty of entry points to the library, namely each publicly defined
function that libcurl offers to applications. All of those functions are
rather small and easy-to-follow. All the ones prefixed with `curl_easy` are
put in the lib/easy.c file.
`curl_global_init()` and `curl_global_cleanup()` should be called by the
application to initialize and clean up global stuff in the library. As of
today, it can handle the global SSL initing if SSL is enabled and it can init
the socket layer on windows machines. libcurl itself has no "global" scope.
All printf()-style functions use the supplied clones in lib/mprintf.c. This
makes sure we stay absolutely platform independent.
[`curl_easy_init()`][2] allocates an internal struct and makes some
initializations. The returned handle does not reveal internals. This is the
`Curl_easy` struct which works as an "anchor" struct for all `curl_easy`
functions. All connections performed will get connect-specific data allocated
that should be used for things related to particular connections/requests.
[`curl_easy_setopt()`][1] takes three arguments, where the option stuff must
be passed in pairs: the parameter-ID and the parameter-value. The list of
options is documented in the man page. This function mainly sets things in
the `Curl_easy` struct.
`curl_easy_perform()` is just a wrapper function that makes use of the multi
API. It basically calls `curl_multi_init()`, `curl_multi_add_handle()`,
`curl_multi_wait()`, and `curl_multi_perform()` until the transfer is done
and then returns.
Some of the most important key functions in url.c are called from multi.c
when certain key steps are to be made in the transfer operation.
<a name="Curl_connect"></a>
Curl_connect()
--------------
Analyzes the URL, separates it into its different components and connects to the
remote host. This may involve using a proxy and/or using SSL. The
`Curl_resolv()` function in lib/hostip.c is used for looking up host names
(it does then use the proper underlying method, which may vary between
platforms and builds).
When `Curl_connect` is done, we are connected to the remote site. Then it
is time to tell the server to get a document/file. `Curl_do()` arranges
this.
This function makes sure there's an allocated and initiated 'connectdata'
struct that is used for this particular connection only (although there may
be several requests performed on the same connect). A bunch of things are
inited/inherited from the `Curl_easy` struct.
<a name="Curl_do"></a>
Curl_do()
---------
`Curl_do()` makes sure the proper protocol-specific function is called. The
functions are named after the protocols they handle.
The protocol-specific functions of course deal with protocol-specific
negotiations and setup. They have access to the `Curl_sendf()` (from
lib/sendf.c) function to send printf-style formatted data to the remote
host and when they're ready to make the actual file transfer they call the
`Curl_Transfer()` function (in lib/transfer.c) to set up the transfer and
return.
If this DO function fails and the connection is being re-used, libcurl will
then close this connection, set up a new connection and re-issue the DO
request on that. This is because there is no way to be perfectly sure that
we have discovered a dead connection before the DO function and thus we
might wrongly be re-using a connection that was closed by the remote peer.
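
The retry behaviour described above can be sketched roughly as follows. Every name in this block (`struct conn`, `do_request()`, `reconnect()`, `do_with_retry()`) is hypothetical and only models the control flow. It is not how the libcurl sources are organized.

```c
/* All names here are hypothetical, for illustration only */
struct conn {
  int reused;   /* non-zero if taken from the connection cache */
  int dead;     /* non-zero if the peer closed it (not known in advance) */
};

/* Stand-in for a protocol-specific DO function: fails on a dead connection */
static int do_request(struct conn *c)
{
  return c->dead ? -1 : 0;
}

/* Stand-in for closing the old connection and setting up a new one */
static void reconnect(struct conn *c)
{
  c->reused = 0;
  c->dead = 0;
}

/* Issue the request and retry once on a fresh connection if a re-used one
   failed. Returns 0 on success, -1 on failure. */
static int do_with_retry(struct conn *c, int *attempts)
{
  int rc;

  *attempts = 1;
  rc = do_request(c);
  if(rc && c->reused) {
    /* the failure may just mean the cached connection was stale */
    reconnect(c);
    *attempts = 2;
    rc = do_request(c);
  }
  return rc;
}
```

A failure on a brand new connection is reported straight away, since a stale connection cannot be the cause.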
Some time during the DO function, the `Curl_setup_transfer()` function must
be called with some basic info about the upcoming transfer: what socket(s)
to read/write and the expected file transfer sizes (if known).
<a name="Curl_readwrite"></a>
Curl_readwrite()
----------------
Called during the transfer of the actual protocol payload.
During transfer, the progress functions in lib/progress.c are called at
frequent intervals (or at the user's choice, a specified callback might get
called). The speedcheck functions in lib/speedcheck.c are also used to
verify that the transfer is as fast as required.
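
One way such a speed check could work is sketched below, assuming the requirement is expressed as a minimum speed that must not be undercut for longer than a given time. The struct, its fields and `speed_check()` are hypothetical names and do not mirror lib/speedcheck.c.

```c
/* Hypothetical state for a low-speed check */
struct speed_state {
  long low_limit;    /* bytes/second considered too slow */
  long low_time;     /* seconds the speed may stay below the limit */
  long slow_since;   /* time the speed first dropped, or -1 while fast */
};

/* Call at regular intervals with the current time (in seconds) and the
   current speed. Returns 1 if the transfer should be aborted, else 0. */
static int speed_check(struct speed_state *s, long now, long speed)
{
  if(speed >= s->low_limit) {
    s->slow_since = -1;          /* fast enough, reset the timer */
    return 0;
  }
  if(s->slow_since < 0)
    s->slow_since = now;         /* just became too slow */
  return (now - s->slow_since) >= s->low_time;
}
```

A short dip below the limit only starts the timer. The transfer is flagged once the speed has stayed low for the whole period.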
<a name="Curl_done"></a>
Curl_done()
-----------
Called after a transfer is done. This function takes care of everything
that has to be done after a transfer. This function attempts to leave
matters in a state so that `Curl_do()` should be possible to call again on
the same connection (in a persistent connection case). It might also soon
be closed with `Curl_disconnect()`.
<a name="Curl_disconnect"></a>
Curl_disconnect()
-----------------
When doing normal connections and transfers, no one ever tries to close any
connections so this is not normally called when `curl_easy_perform()` is
used. This function is only used when we are certain that no more transfers
are going to be made on the connection. A connection can also be closed by
force, or this function can be called to make sure that libcurl doesn't keep
too many connections alive at the same time.
This function cleans up all resources that are associated with a single
connection.
<a name="http"></a>
HTTP(S)
=======
HTTP offers a lot and is the protocol in curl that uses the most lines of
code. There is a special file (lib/formdata.c) that offers all the multipart
post functions.
The base64 functions for user+password stuff (and more) are in lib/base64.c
and all functions for parsing and sending cookies are found in lib/cookie.c.
HTTPS uses in almost every case the same procedure as HTTP, with only two
exceptions: the connect procedure is different and the function used to read
or write from the socket is different, although the latter fact is hidden in
the source by the use of `Curl_read()` for reading and `Curl_write()` for
writing data to the remote server.
`http_chunks.c` contains functions that understand HTTP 1.1 chunked transfer
encoding.
An interesting detail with the HTTP(S) request is the `Curl_add_buffer()`
series of functions we use. They append data to one single buffer, and when
the building is finished the entire request is sent off in one single write.
This is done this way to overcome problems with flawed firewalls and lame
servers.
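
The idea of collecting the request in one buffer and sending it with a single write can be sketched like this. The `req_buffer` struct and the `req_add()` and `req_send()` functions are hypothetical and do not reflect the actual `Curl_add_buffer()` signatures. The sketch uses POSIX `write()` on a plain file descriptor for simplicity.

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical growable buffer used to assemble a whole request */
struct req_buffer {
  char *data;
  size_t used;
  size_t size;
};

/* Append len bytes. Returns 0 on success, -1 if out of memory. */
static int req_add(struct req_buffer *b, const char *src, size_t len)
{
  if(!len)
    return 0;
  if(b->used + len > b->size) {
    size_t newsize = (b->used + len) * 2;
    char *p = realloc(b->data, newsize);
    if(!p)
      return -1;
    b->data = p;
    b->size = newsize;
  }
  memcpy(b->data + b->used, src, len);
  b->used += len;
  return 0;
}

/* Hand the complete request to one single write() call, then free the
   buffer. Returns what write() returned. */
static long req_send(struct req_buffer *b, int fd)
{
  long n = (long)write(fd, b->data, b->used);
  free(b->data);
  b->data = NULL;
  b->used = b->size = 0;
  return n;
}
```

A caller would invoke `req_add()` once per request line or header and `req_send()` once at the end, so the peer sees the request arrive in one piece whenever the write is not cut short.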
<a name="ftp"></a>
FTP
===
The `Curl_if2ip()` function can be used for getting the IP number of a
specified network interface, and it resides in lib/if2ip.c.
`Curl_ftpsendf()` is used for sending FTP commands to the remote server. It
was made a separate function to prevent us programmers from forgetting that
they must be CRLF terminated. They must also be sent in one single write() to
make firewalls and similar happy.
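
A helper that guarantees the CRLF termination could be sketched as below. `ftp_format_cmd()` is a hypothetical function that only builds the command in a caller-supplied buffer, and its signature is not that of `Curl_ftpsendf()`. The caller is assumed to pass the result to one single write.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format an FTP command into buf and terminate it with
   CRLF. Returns the total length including the CRLF, or -1 if the command
   does not fit. */
static int ftp_format_cmd(char *buf, size_t size, const char *fmt, ...)
{
  va_list ap;
  int n;

  va_start(ap, fmt);
  n = vsnprintf(buf, size, fmt, ap);
  va_end(ap);

  /* need room for the command, CR, LF and the terminating zero */
  if(n < 0 || (size_t)n + 3 > size)
    return -1;
  memcpy(buf + n, "\r\n", 3);   /* copies CR, LF and the zero terminator */
  return n + 2;
}
```

Because the terminator is added in one place, no caller can forget it.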
<a name="kerberos"></a>
Kerberos
--------
Kerberos support is mainly in lib/krb5.c and lib/security.c but also
`curl_sasl_sspi.c` and `curl_sasl_gssapi.c` for the email protocols and
`socks_gssapi.c` and `socks_sspi.c` for SOCKS5 proxy specifics.
<a name="telnet"></a>
TELNET
======
Telnet is implemented in lib/telnet.c.
<a name="file"></a>
FILE
====
The file:// protocol is dealt with in lib/file.c.
<a name="smb"></a>
SMB
===
The smb:// protocol is dealt with in lib/smb.c.
<a name="ldap"></a>
LDAP
====
Everything LDAP is in lib/ldap.c and lib/openldap.c.
<a name="email"></a>
E-mail
======
The e-mail related source code is in lib/imap.c, lib/pop3.c and lib/smtp.c.
<a name="general"></a>
General
=======
URL encoding and decoding, called escaping and unescaping in the source code,
is found in lib/escape.c.
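
For readers unfamiliar with URL escaping, the encoding direction can be sketched as follows. `url_escape()` is a hypothetical stand-alone function and not the code in lib/escape.c. It assumes the RFC 3986 set of unreserved characters and the "C" locale.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of percent-encoding. Returns a malloc()ed string that
   the caller must free(), or NULL if out of memory. */
static char *url_escape(const char *in)
{
  static const char hex[] = "0123456789ABCDEF";
  size_t len = strlen(in);
  char *out = malloc(len * 3 + 1);   /* worst case: every byte escaped */
  char *p = out;

  if(!out)
    return NULL;
  while(*in) {
    unsigned char c = (unsigned char)*in++;
    /* unreserved characters pass through unchanged */
    if(isalnum(c) || c == '-' || c == '.' || c == '_' || c == '~')
      *p++ = (char)c;
    else {
      /* everything else becomes %XX with two hex digits */
      *p++ = '%';
      *p++ = hex[c >> 4];
      *p++ = hex[c & 15];
    }
  }
  *p = '\0';
  return out;
}
```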
While transferring data in Transfer() a few functions might get used.
`curl_getdate()` in lib/parsedate.c is for HTTP date comparisons (and more).
lib/getenv.c offers `curl_getenv()` which is for reading environment
variables in a neat platform independent way. That's used in the client, but
also in lib/url.c when checking the proxy environment variables. Note that
contrary to the normal unix getenv(), this returns an allocated buffer that
must be free()ed after use.
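
The ownership difference can be illustrated with a small sketch. `getenv_copy()` is a hypothetical function written for this example and is not the implementation in lib/getenv.c.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: return a heap copy of an environment variable, or
   NULL if it is unset or memory runs out. The caller must free() it. */
static char *getenv_copy(const char *name)
{
  const char *val = getenv(name);
  char *copy;
  size_t len;

  if(!val)
    return NULL;
  len = strlen(val) + 1;       /* include the zero terminator */
  copy = malloc(len);
  if(copy)
    memcpy(copy, val, len);
  return copy;
}
```

Returning a private copy means the caller's string stays valid even if the environment is changed later, at the price of having to free it.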
lib/netrc.c holds the .netrc parser.
lib/timeval.c features replacement functions for systems that don't have
gettimeofday() and a few support functions for timeval conversions.
A function named `curl_version()` that returns the full curl version string
is found in lib/version.c.
<a name="persistent"></a>
Persistent Connections
======================
The persistent connection support in libcurl requires some considerations on
how to do things inside of the library.
- The `Curl_easy` struct returned in the [`curl_easy_init()`][2] call
must never hold connection-oriented data. It is meant to hold the root data
as well as all the options etc that the library-user may choose.
- The `Curl_easy` struct holds the "connection cache" (an array of
pointers to 'connectdata' structs).
- This enables the 'curl handle' to be reused on subsequent transfers.
- When libcurl is told to perform a transfer, it first checks for an already
existing connection in the cache that we can use. Otherwise it creates a
new one and adds that to the cache. If the cache is full already when a new
connection is added, it will first close the oldest unused one.
- When the transfer operation is complete, the connection is left
open. Particular options may tell libcurl not to, and protocols may signal
closure on connections and then they won't be kept open, of course.
- When `curl_easy_cleanup()` is called, we close all still opened connections,
unless of course the multi interface "owns" the connections.
The curl handle must be re-used in order for the persistent connections to
work.
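
The cache lookup and eviction rules listed above can be modelled with a small sketch. All names (`cached_conn`, `conn_cache`, `cache_get()`, `cache_release()`, `CACHE_SIZE`) are hypothetical, connections are matched on host name only, and a logical counter replaces real timestamps.

```c
#include <stddef.h>
#include <string.h>

#define CACHE_SIZE 2   /* tiny on purpose, to make eviction easy to see */

struct cached_conn {
  const char *host;   /* NULL marks a free slot */
  int in_use;         /* non-zero while a transfer is using it */
  long last_used;     /* logical timestamp of the last use */
};

struct conn_cache {
  struct cached_conn slot[CACHE_SIZE];
  long clock;
};

/* Re-use an idle connection to host if one exists. Otherwise take a free
   slot, or evict the oldest idle connection. Sets *reused accordingly.
   Returns NULL if every slot is busy. */
static struct cached_conn *cache_get(struct conn_cache *cc, const char *host,
                                     int *reused)
{
  struct cached_conn *victim = NULL;
  size_t i;

  for(i = 0; i < CACHE_SIZE; i++) {
    struct cached_conn *c = &cc->slot[i];
    if(c->in_use)
      continue;
    if(c->host && !strcmp(c->host, host)) {
      *reused = 1;
      c->in_use = 1;
      c->last_used = ++cc->clock;
      return c;
    }
    /* keep the first free slot, else the least recently used idle one */
    if(!victim ||
       (victim->host && (!c->host || c->last_used < victim->last_used)))
      victim = c;
  }
  if(!victim)
    return NULL;
  /* a real implementation would close the evicted connection here */
  victim->host = host;
  victim->in_use = 1;
  victim->last_used = ++cc->clock;
  *reused = 0;
  return victim;
}

/* Transfer done: leave the connection open for later re-use */
static void cache_release(struct cached_conn *c)
{
  c->in_use = 0;
}
```

The cache only helps if the same `conn_cache` is passed in again for the next transfer. That is the sketch's counterpart of re-using the curl handle.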
<a name="multi"></a>
multi interface/non-blocking
============================
The multi interface is a non-blocking interface to the library. To make that
interface work as well as possible, no low-level functions within libcurl
must be written to work in a blocking manner. (There are still a few spots
violating this rule.)
One of the primary reasons we introduced c-ares support was to allow the name
resolve phase to be perfectly non-blocking as well.
The FTP and the SFTP/SCP protocols are examples of how we adapt and adjust
the code to allow non-blocking operations even on multi-stage command-
response protocols. They are built around state machines that return when
they would otherwise block waiting for data. The DICT, LDAP and TELNET
protocols are crappy examples and they are subject to rewrite in the future
to better fit the libcurl protocol family.
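
A state machine of the kind described above could be sketched like this. The states, the `login` struct and `login_step()` are hypothetical and model a simplified login exchange. The sketch does not inspect the reply codes and only counts the commands it would send.

```c
/* Hypothetical states for a multi-stage command-response exchange */
enum login_state {
  ST_WAIT_GREETING,
  ST_WAIT_USER_REPLY,
  ST_WAIT_PASS_REPLY,
  ST_DONE
};

struct login {
  enum login_state state;
  int commands_sent;
};

/* Advance by at most one step without blocking. 'reply' is the next server
   response code, or 0 if nothing has arrived yet. Returns 1 when the
   exchange is complete, 0 if the function must be called again later. */
static int login_step(struct login *l, int reply)
{
  if(!reply)
    return l->state == ST_DONE;   /* would block: return to the caller */

  switch(l->state) {
  case ST_WAIT_GREETING:
    l->commands_sent++;           /* the USER command would be sent here */
    l->state = ST_WAIT_USER_REPLY;
    break;
  case ST_WAIT_USER_REPLY:
    l->commands_sent++;           /* the PASS command would be sent here */
    l->state = ST_WAIT_PASS_REPLY;
    break;
  case ST_WAIT_PASS_REPLY:
    l->state = ST_DONE;
    break;
  case ST_DONE:
    break;
  }
  return l->state == ST_DONE;
}
```

The state is remembered in the struct, so the caller is free to serve other transfers between two calls.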
<a name="ssl"></a>
SSL libraries
=============
Originally libcurl supported SSLeay for SSL/TLS transports. That support was
then extended to its successor OpenSSL and has since been extended to several
other SSL/TLS libraries. We expect and hope to further extend the support in
future libcurl versions.
To deal with this internally in the best way possible, we have a generic SSL
function API as provided by the vtls/vtls.[ch] system, and they are the only
SSL functions we must use from within libcurl. vtls is then crafted to use
the appropriate lower-level function calls for whatever SSL library is in
use. For example, vtls/openssl.[ch] for the OpenSSL library.
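
One common way to build such a generic layer is a table of function pointers per backend, sketched below. This is an assumption about the general technique and not a description of how vtls is implemented. The `ssl_backend` struct, the dummy backend and the `ssl_*()` wrappers are all hypothetical names.

```c
#include <stddef.h>

/* Hypothetical backend interface: one table of function pointers per
   SSL library */
struct ssl_backend {
  const char *name;
  int (*connect)(void *conn);
  long (*send)(void *conn, const void *buf, size_t len);
};

/* A dummy backend standing in for a real TLS library binding */
static int dummy_connect(void *conn)
{
  (void)conn;
  return 0;
}

static long dummy_send(void *conn, const void *buf, size_t len)
{
  (void)conn;
  (void)buf;
  return (long)len;   /* pretend everything was sent */
}

static const struct ssl_backend dummy_backend = {
  "dummy", dummy_connect, dummy_send
};

/* Selected once, for example at build time */
static const struct ssl_backend *ssl_active = &dummy_backend;

/* Generic entry points: the only SSL functions the rest of the code calls */
static int ssl_connect(void *conn)
{
  return ssl_active->connect(conn);
}

static long ssl_send(void *conn, const void *buf, size_t len)
{
  return ssl_active->send(conn, buf, len);
}

static const char *ssl_backend_name(void)
{
  return ssl_active->name;
}
```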
<a name="symbols"></a>
Library Symbols
===============
All symbols used internally in libcurl must use a `Curl_` prefix if they're
used in more than a single file. Single-file symbols must be made static.
Public ("exported") symbols must use a `curl_` prefix. (There are exceptions,
but they are to be changed to follow this pattern in future versions.) Public
API functions are marked with `CURL_EXTERN` in the public header files so
that all others can be hidden on platforms where this is possible.
<a name="returncodes"></a>
Return Codes and Informationals
===============================
I've made things simple. Almost every function in libcurl returns a CURLcode,
which must be `CURLE_OK` if everything is OK or otherwise a suitable error
code as the curl/curl.h include file defines. The very spot that detects an
error must use the `Curl_failf()` function to set the human-readable error
description.
In aiding the user to understand what's happening and to debug curl usage, we
must supply a fair number of informational messages by using the
`Curl_infof()` function. Those messages are only displayed when the user
explicitly asks for them. They are best used when revealing information that
isn't otherwise obvious.
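
The convention of setting the error text where the error is detected and merely passing the code upwards can be sketched as follows. The `sk_` prefixed names are hypothetical stand-ins for `CURLcode`, the handle and `Curl_failf()`. They are not the real definitions.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-ins for the return code type and the handle */
typedef enum { SK_OK = 0, SK_COULDNT_CONNECT, SK_OUT_OF_MEMORY } sk_code;

struct sk_handle {
  char errbuf[128];   /* human-readable description of the last error */
};

/* Called at the very spot where an error is detected */
static void sk_failf(struct sk_handle *h, const char *fmt, ...)
{
  va_list ap;
  va_start(ap, fmt);
  vsnprintf(h->errbuf, sizeof(h->errbuf), fmt, ap);
  va_end(ap);
}

/* Lowest level: detects the error and describes it */
static sk_code sk_connect(struct sk_handle *h, const char *host, int port)
{
  if(port <= 0 || port > 65535) {
    sk_failf(h, "bad port %d for host %s", port, host);
    return SK_COULDNT_CONNECT;
  }
  return SK_OK;
}

/* Callers pass the code upwards without touching the message */
static sk_code sk_transfer(struct sk_handle *h, const char *host, int port)
{
  sk_code rc = sk_connect(h, host, port);
  if(rc)
    return rc;
  /* the actual transfer would follow here */
  return SK_OK;
}
```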
<a name="abi"></a>
API/ABI
=======
We make an effort to not export or show internals or how internals work, as
that makes it easier to keep a solid API/ABI over time. See docs/libcurl/ABI
for our promise to users.
<a name="client"></a>
Client
======
main() resides in `src/tool_main.c`.
`src/tool_hugehelp.c` is automatically generated by the mkhelp.pl perl script
to display the complete "manual" and the `src/tool_urlglob.c` file holds the
functions used for the URL-"globbing" support. Globbing in the sense that the
{} and [] expansion stuff is there.
The client mostly sets up its 'config' struct properly, then
it calls the `curl_easy_*()` functions of the library and when it gets back
control after the `curl_easy_perform()` it cleans up the library, checks
status and exits.
When the operation is done, the ourWriteOut() function in src/writeout.c may
be called to report about the operation. That function is using the
`curl_easy_getinfo()` function to extract useful information from the curl
session.
It may loop and do all this several times if many URLs were specified on the
command line or config file.
<a name="memorydebug"></a>
Memory Debugging
================
The file lib/memdebug.c contains debug-versions of a few functions: functions
such as malloc, free, fopen, fclose, etc that somehow deal with resources
that might give us problems if we "leak" them. The functions in the memdebug
system do nothing fancy, they do their normal function and then log
information about what they just did. The logged data can then be analyzed
after a complete session.
memanalyze.pl is the perl script present in tests/ that analyzes a log file
generated by the memory tracking system. It detects if resources are
allocated but never freed and other kinds of errors related to resource
management.
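
The "do the normal thing, then log it" approach can be sketched as below. The `dbg_` names, the log line format and the outstanding-allocation counter are assumptions made for this example. They are not the format memanalyze.pl reads nor the functions in lib/memdebug.c.

```c
#include <stdio.h>
#include <stdlib.h>

static long dbg_outstanding;   /* allocations not yet freed */
static FILE *dbg_log;          /* set to an open file to get a trace */

static void *dbg_malloc(size_t size, int line, const char *source)
{
  void *p = malloc(size);      /* do the normal thing first... */
  if(p)
    dbg_outstanding++;
  if(dbg_log)                  /* ...then log what just happened */
    fprintf(dbg_log, "MEM %s:%d malloc(%lu) = %p\n",
            source, line, (unsigned long)size, p);
  return p;
}

static void dbg_free(void *p, int line, const char *source)
{
  if(p)
    dbg_outstanding--;
  if(dbg_log)
    fprintf(dbg_log, "MEM %s:%d free(%p)\n", source, line, p);
  free(p);
}

/* From here on, ordinary calls are routed through the tracking versions */
#define malloc(size) dbg_malloc(size, __LINE__, __FILE__)
#define free(ptr) dbg_free(ptr, __LINE__, __FILE__)
```

In this sketch a non-zero `dbg_outstanding` at exit points to a leak, and the logged file and line of each call help locate it.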
Internally, the preprocessor symbol DEBUGBUILD guards code that is only
compiled for debug enabled builds, and the symbol CURLDEBUG is used to
differentiate code which is _only_ used for memory tracking/debugging.
Use -DCURLDEBUG when compiling to enable memory debugging. This is also
switched on by running configure with --enable-curldebug. Use -DDEBUGBUILD
when compiling to enable a debug build or run configure with --enable-debug.
curl --version will list 'Debug' feature for debug enabled builds, and
will list 'TrackMemory' feature for curl debug memory tracking capable
builds. These features are independent and can be controlled when running
the configure script. When --enable-debug is given both features will be
enabled, unless some restriction prevents memory tracking from being used.
<a name="test"></a>
Test Suite
==========
The test suite is placed in its own subdirectory directly off the root in the
curl archive tree, and it contains a bunch of scripts and a lot of test case
data.
The main test script is runtests.pl that will invoke test servers like
httpserver.pl and ftpserver.pl before all the test cases are performed. The
test suite currently only runs on Unix-like platforms.
You'll find a description of the test suite in the tests/README file, and the
test case data file format is described in the tests/FILEFORMAT file.
The test suite automatically detects if curl was built with the memory
debugging enabled, and if it was, it will detect memory leaks, too.
<a name="asyncdns"></a>
Asynchronous name resolves
==========================
libcurl can be built to do name resolves asynchronously, using either the
normal resolver in a threaded manner or by using c-ares.