=head1 LWPng

This note describes the redesign of the LWP Perl modules in order to
add full support for the HTTP/1.1 protocol.  The main change is the
adoption of an event driven framework.  This allows us to support
multiple connections within a single client program.  It was also a
prerequisite for supporting HTTP/1.1 features like persistent
connections and pipelining.


=head1 HTTP/1.1

RFC 2068 is the proposed standard for the Hypertext Transfer Protocol
version 1.1, usually denoted HTTP/1.1.  The document is currently
being revised by the IETF, and a draft standard document is expected
soon.  The latest draft is currently
<draft-ietf-http-v11-spec-rev-03.txt>.

The HTTP/1.1 protocol uses the same basic message format as earlier
versions of the protocol, and HTTP/1.1 clients/servers can easily adapt
to peers which only know about the older versions of the protocol.
HTTP/1.1 adds some new methods, some new status codes, and some new
headers.  One important change is that the Host header is now
mandatory.  Other changes are support for partial content and much
improved support for caching and proxies.  There is also a standard
mechanism for switching from HTTP/1.1 to some other (more suitable)
protocol on the wire.

The most important change with HTTP/1.1 is the introduction of
persistent connections.  This means that more than one
request/response exchange can take place on a single TCP connection
between a client and a server.  This improves performance and
generally interacts much better with how TCP works underneath.  It
also means that the peers must be able to tell the extent of the
messages on the wire.  In HTTP/1.0 the only ways to do this were the
Content-Length header and closing the connection (which was only an
option for the server).  Use of the Content-Length header is not
appropriate when the length of the message cannot be determined in
advance.  HTTP/1.1 introduces two new ways to delimit messages: the
chunked transfer encoding and self-delimiting multipart content types.
The chunked transfer encoding means that the message is broken into
chunks of arbitrary sizes and that each chunk is preceded by a line
specifying the number of bytes in the chunk (in hexadecimal).  The
multipart types use a special boundary byte pattern as a delimiter
for the messages.
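
As a rough illustration of the chunked format, here is a minimal
decoder sketch.  It is not taken from LWPng; the function name
decode_chunked is made up for this example, and it assumes the whole
body is already in memory, skips chunk extensions and ignores any
trailer headers.

  use strict;
  use warnings;

  # Decode a complete chunked message body held in a string.
  # Simplified: chunk extensions are skipped, trailers are ignored.
  sub decode_chunked {
      my ($raw) = @_;
      my $body = '';
      # Each chunk starts with a line giving its size in hexadecimal.
      while ($raw =~ s/^([0-9A-Fa-f]+)[^\r\n]*\r\n//) {
          my $size = hex($1);
          last if $size == 0;                 # zero-sized chunk ends the body
          die "truncated chunk\n" if length($raw) < $size + 2;
          $body .= substr($raw, 0, $size, '');   # remove and keep the data
          $raw =~ s/^\r\n// or die "missing CRLF after chunk\n";
      }
      return $body;
  }

  my $wire = "5\r\nHello\r\n7\r\n, world\r\n0\r\n\r\n";
  print decode_chunked($wire), "\n";          # prints "Hello, world"

A real connection object would have to decode incrementally as data
arrives on the socket; the sketch only shows the framing.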

With persistent connections one can improve performance even more by
the use of a technique called "pipelining".  This means that the
client sends multiple requests down the connection without waiting
for the response to the first request before sending the second.  This
can have a dramatic effect on the throughput for high latency
links. [NOTE-pipelining-970624]


=head1 Event driven programming model

Let's investigate what impact the event driven framework has on the
programming model.  The basic model for sending requests and receiving
responses used to be:

  $res = $ua->request($req);   # return when response is available
  if ($res->is_success) {
      #...
  }

With the new event driven framework it becomes:

  $ua->spool($req1);   # returns immediately
  $ua->spool($req2);   # can send multiple requests in parallel
  #...

  mainloop->run;       # returns when all connections are gone

Request objects are created and then handed off to the $ua which will
queue them up for processing.  As you can see, there is no longer any
natural place to test the outcome of the requests.  What happens is
that the requests live their own lives and they will be notified
(through a method call) when the corresponding responses are available.
You, the application programmer, will have to set up event handlers
(in the requests) that react to these events.

Luckily, this does not mean that all old programs must be rewritten.
The following shows one way to emulate something very close to the old
behaviour:

  my $res;
  my $req = LWP::Request->new(GET => $url);
  $req->{'done_cb'} = sub { $res = shift; };

  $ua->spool($req);
  mainloop->one_event until $res;

  if ($res->is_success) {
      #...
  }

and this will in fact be used to emulate the old $ua->request() and
$ua->simple_request() interfaces.  The goal is to be completely
backwards compatible with the current LWP modules.


=head2 LWP::Request

As you can see from the examples above we use the class name
LWP::Request (as opposed to HTTP::Request) for the requests created.
LWP::Request is a subclass of HTTP::Request, so it has all the same
methods and attributes as HTTP::Request and then some more.  The most
important of these additions are two callback methods that will be
invoked as the response is received:

   $req->response_data($data, $res);
   $req->response_done($res);

The response_data() callback method is invoked repeatedly as parts of
the content of the response become available.  The first time it is
invoked, $res will be a reference to an HTTP::Response object with
response code and headers initialized, and empty content.  The default
implementation of response_data() just appends the data passed to the
content of the $res object.  It also supports a registered callback
function ('data_cb') that will be invoked if defined.

The response_done() callback method is invoked when the whole response
has been received.  It is guaranteed that it will be invoked exactly
once for each request spooled (even if the request fails).  The default
implementation will set up the $res->request and $res->previous links
and will automatically handle redirects and unauthorized responses by
respooling a slightly modified copy of the original request.  It also
supports a registered callback function ('done_cb') that will be
invoked if defined, but only for the last response in case of redirect
chains.

As an application programmer you can either subclass LWP::Request, to
provide your own versions of response_data() and response_done(), or
you can just register callback functions.
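
The following self-contained sketch shows the order in which the two
callback methods fire and how a subclass can override one of them.  It
is not LWPng code: Sketch::Request is a hypothetical stand-in for
LWP::Request, a plain hash stands in for the HTTP::Response object,
and deliver() plays the role of a connection object.

  use strict;
  use warnings;

  # Hypothetical stand-in for LWP::Request; only the two callback
  # methods described above are modelled.
  package Sketch::Request;
  sub new { my ($class, %args) = @_; return bless {%args}, $class }
  sub response_data {
      my ($self, $data, $res) = @_;
      $res->{content} .= $data;            # default: accumulate content
  }
  sub response_done {
      my ($self, $res) = @_;
      $self->{done_cb}->($res) if $self->{done_cb};
  }

  # A subclass that counts bytes instead of storing the content.
  package Sketch::CountingRequest;
  our @ISA = ('Sketch::Request');
  sub response_data {
      my ($self, $data, $res) = @_;
      $self->{bytes} += length $data;
  }

  package main;

  # Pretend to be a connection delivering a response in pieces.
  sub deliver {
      my ($req, @pieces) = @_;
      my $res = { code => 200, content => '' };
      $req->response_data($_, $res) for @pieces;
      $req->response_done($res);           # called exactly once
  }

  my $req = Sketch::CountingRequest->new(
      done_cb => sub { print "done, status $_[0]{code}\n" },
  );
  deliver($req, "abc", "defgh");
  print "received $req->{bytes} bytes\n";  # prints "received 8 bytes"

Subclassing suits requests that need their own state, such as the byte
counter here; a registered callback is enough when a closure will do.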

The LWP::Request object also provides a few more attributes that might
be of interest.  The $req->priority is a number that can be used to
select which request goes first when several are spooled at the same
time.  Requests with the smallest numbers go first.  The default
priority happens to be 99.
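
The ordering rule can be pictured with a plain sort.  The hashes below
are hypothetical stand-ins for queued requests, and breaking ties by
arrival order (the seq field) is an assumption of this sketch, not
something the scheduler is known to promise.

  use strict;
  use warnings;

  # Hypothetical queue entries; seq records the order of arrival.
  my @queue = (
      { name => 'image',      priority => 99, seq => 1 },  # default
      { name => 'stylesheet', priority => 50, seq => 2 },
      { name => 'page',       priority => 10, seq => 3 },
      { name => 'icon',       priority => 99, seq => 4 },  # default
  );

  # Smallest priority number first; arrival order breaks ties (assumed).
  my @ordered = sort {
      $a->{priority} <=> $b->{priority} or $a->{seq} <=> $b->{seq}
  } @queue;

  print join(", ", map { $_->{name} } @ordered), "\n";
  # prints "page, stylesheet, image, icon"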

The $req->proxy attribute tells us if we are going to pass the request
to a proxy server instead of the server implied by the URL.  If
$req->proxy is TRUE, then it should be the URL of the proxy.


=head2 LWP::MainLoop

The event oriented framework is based on a single common object
provided by the LWP::MainLoop module that will watch external IO
descriptors (sockets) and timers.  When events occur, then registered
functions are called and these will call other event handling
functions and so on.

In order for this to work, the mainloop object needs to be in control
when nothing else happens, and especially when you expect protocol
handling to take place.  This is achieved by repeatedly calling the
mainloop->one_event method until we are satisfied.  Each call will
wait until the next event is available, then invoke the corresponding
callback function and then return.  The one_event() interface is handy
because it can be applied recursively: you can set up inner event
loops in event handlers invoked by some outer event loop.

The call mainloop->run is a shorthand for a common form of this loop.
It will call mainloop->one_event until there are no registered IO
handles (sockets) and no timers left.
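
To make the relation between one_event() and run() concrete, here is
a miniature event loop built on the core IO::Select module.  It is
only a sketch and not the LWP::MainLoop implementation: the class name
Sketch::Loop and its forget() method are invented for the example,
and timers are left out entirely.

  use strict;
  use warnings;
  use IO::Select;

  # Hypothetical miniature event loop; timers are left out.
  package Sketch::Loop;
  sub new { my ($class) = @_; return bless { sel => IO::Select->new, cb => {} }, $class }
  sub readable {
      my ($self, $fh, $cb) = @_;
      $self->{sel}->add($fh);
      $self->{cb}{fileno($fh)} = $cb;
  }
  sub forget {
      my ($self, $fh) = @_;
      $self->{sel}->remove($fh);
      delete $self->{cb}{fileno($fh)};
  }
  sub one_event {
      my ($self) = @_;
      my ($fh) = $self->{sel}->can_read() or return;   # block until ready
      $self->{cb}{fileno($fh)}->($fh);                  # dispatch one event
  }
  sub run {
      my ($self) = @_;
      $self->one_event while $self->{sel}->count;       # until nothing is left
  }

  package main;

  # A pipe stands in for a socket so the example needs no network.
  pipe(my $rd, my $wr) or die "pipe: $!";
  syswrite($wr, "hello\n");
  close $wr;

  my $loop = Sketch::Loop->new;
  my $got  = '';
  $loop->readable($rd, sub {
      my ($fh) = @_;
      my $n = sysread($fh, my $buf, 512);
      if ($n) { $got .= $buf }
      else    { $loop->forget($fh) }                    # EOF: stop watching
  });
  $loop->run;
  print "read: $got";                                   # prints "read: hello"

Because run() is nothing more than a loop around one_event(), a handler
can itself call one_event() to wait for something specific.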

The following program shows how you can register your own callbacks.
Here, for instance, the application wants to be able to read commands
from the terminal.

  use LWP::MainLoop qw(mainloop);

  # $ua is assumed to be an LWP::UA object created earlier.
  mainloop->readable(\*STDIN, \&read_and_do_cmd);
  mainloop->run;

  sub read_and_do_cmd
  {
     my $cmd;
     my $n = sysread(STDIN, $cmd, 512);
     exit unless $n;           # EOF or read error on the terminal
     chomp($cmd);

     if ($cmd eq "q") {
         exit;
     } elsif ($cmd =~ /^(get|head|trace)\s+(\S+)/i) {
         $ua->spool(LWP::Request->new(uc($1) => $2));
     }
     # ... handle other commands here
  }

Currently LWPng uses its own private event loop implementation.  The
plan is to adopt the event loop implementation used by the Tk
extension.  This should make it easier to mix Tk and LWPng.


=head2 LWP::UA

 $ua->spool
 $ua->reschedule
 $ua->stop

The following 'connection parameters' can be adjusted.  You can set
them both at the global level and for each individual server.

=over 12

=item ReqLimit

This is a number indicating how many requests a single
connection will be used for.  When the limit has been reached,
the connection will close itself.  You can get
non-persistent connections by setting this value
to 1.

=item ReqPending

How many requests can be sent on a single connection
before we have to wait for a response from the server.
This controls the degree of pipelining.

=item Timeout

How long we will wait with no activity on the line
before signaling an error (and closing the connection).

=item IdleTimeout

When the request queue is empty, connections go to
the idle state.  This specifies how long to wait before
the connection is closed if no new work arrives.

=back


=head1 Internals overview

For each server that the $ua is going to talk to, it maintains an
LWP::Server object.  This object holds a queue of requests not yet
processed.  The $ua->spool() method mainly moves the request to the
correct queue.

An LWP::Server can also create one or more LWP::Conn::HTTP objects that
each represent a network connection to the server.  The connection
objects are where all the action takes place.  They will fetch work
(requests) from the server queue, talk the network protocol and collect
responses.


 LWP::UA
 LWP::Server
 LWP::Conn::XXX

 URI::Attr
 LWP::StdSched

=head1 LWP::Conn protocol

The LWP::Conn objects conform to the following protocol when
interacting with their manager object (passed in as parameter during
creation).

  $conn = LWP::Conn::XXX->new(ManagedBy => $mgr,
                              Host => $host,
                              Port => $port,
                              ...);

  $conn->activate;
  $conn->stop;

  $mgr->get_request($conn);
  $mgr->pushback_request($conn, @requests);
  $mgr->connection_active($conn);
  $mgr->connection_idle($conn);
  $mgr->connection_closed($conn);

  $req->response_data($data, $res);
  $req->response_done($res);
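
To show how a connection might drive its manager, here is a small
mock.  Only the method names come from the list above; the class
Sketch::Manager, the method bodies, and the idea that get_request()
returns the next queued request (or undef when the queue is empty) are
assumptions made for illustration.  Plain strings stand in for the
request and connection objects.

  use strict;
  use warnings;

  # Hypothetical manager; bodies and return values are assumptions.
  package Sketch::Manager;
  sub new {
      my ($class, @reqs) = @_;
      return bless { queue => [@reqs], log => [] }, $class;
  }
  sub get_request {
      my ($self, $conn) = @_;
      return shift @{ $self->{queue} };        # undef when empty
  }
  sub pushback_request {
      my ($self, $conn, @reqs) = @_;
      unshift @{ $self->{queue} }, @reqs;      # back to the front
  }
  sub connection_active { push @{ $_[0]{log} }, 'active' }
  sub connection_idle   { push @{ $_[0]{log} }, 'idle' }
  sub connection_closed { push @{ $_[0]{log} }, 'closed' }

  package main;

  my $mgr  = Sketch::Manager->new('req1', 'req2', 'req3');
  my $conn = 'conn#1';                         # stand-in connection

  # A connection reports activity, takes two requests, then hands one
  # back because the server closed the connection early.
  $mgr->connection_active($conn);
  my $first  = $mgr->get_request($conn);
  my $second = $mgr->get_request($conn);
  $mgr->pushback_request($conn, $second);
  $mgr->connection_closed($conn);

  print "handled: $first\n";                   # prints "handled: req1"
  print "still queued: @{ $mgr->{queue} }\n";  # prints "still queued: req2 req3"
  print "events: @{ $mgr->{log} }\n";          # prints "events: active closed"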


=head1 LWP::Conn::HTTP

Well, this is a very special module.  You don't usually have designs
where the objects change their class all the time.

You should just know about 3 basic states that a connection object can
be in and then think of writable(), readable() and inactive() as the
three kinds of events that we should be prepared to handle at any time.

=over 5

=item 1) Connecting (a non-blocking connect() has been called)

We are waiting for the socket to become writable (which
means that the connect was successful).  readable()
can't happen.  inactive() means failure to connect.

=item 2) Idle (No work to do)

Either work arrives from the application
or the socket becomes readable.  When we read
we would expect to get 0 bytes as a signal that the
server has closed the connection.  We don't ask for
the writable event in this state, because we have
nothing to write.

=item 3) Active (sending request(s), receiving header of first request)

If the socket becomes writable we send more request
data until we are done.  If the socket becomes
readable we read data until we have seen a whole
HTTP header and then switch to a sub-state depending
on the kind of response we are reading.  When a
response has been completely received we go back to
idle if there are no more requests to send.

=back

The following is an attempt at a picture of the state transitions
going on.

     START  ------> Connecting
       |                 |
       |   --->CLOSE     |
       V  /              |
         /               |
     Idle  <-------------+---------------\
       |                                  \
       | (work?)                           \
       |                                    \
       V                                     \
                                              \
     Active (sending request)     <-----------+ (more work?)
       |     \       \      ----\             |
       |      \       \          \            |
       |       \       \          \           |
       |        |       |          |          |
       V        V       V          V          |
                                              |
     ConnLen  Chunked  Multipart  UntilEOF    |   (reading response)
       |         |        |          |        |
       |         |        |          |        |
       +---------+--------+--------- | -------+
                                     |
                                     V
                                   CLOSE


This design was inspired by the "State" pattern described in "Design
Patterns: Elements of Reusable Object-Oriented Software" (Gamma
et al.).  The description of the "State" pattern in this book says:

=over 5

=item Intent

Allow an object to alter its behaviour when its internal state
changes.  The object will appear to change its class.

=item Motivation

Consider a class TCPConnection that represents a network
connection.  A TCPConnection object can be in one of several
different states: Established, Listening, Closed.  When a
TCPConnection object receives requests from other objects, it
responds differently depending on its current state.  For
example, the effect of an Open request depends on whether the
connection is in its Closed state or its Established state.  The
state pattern describes how TCPConnection can exhibit different
behaviour in each state.

=back


Luckily for me, Perl is a language that allows me to change the class
of a living object.  That came in handy in this situation.

Using classes to describe states also allows a natural description
(and implementation) of substates that behave like their base state
for some events but modify the behaviour for others.
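
The technique of changing the class of a live object can be sketched
in a few lines with bless().  The classes below are hypothetical and
much simpler than the real LWP::Conn::HTTP states; the helper methods
become() and state_name() and the event called work() are invented for
the example.  Chunked is written as a substate of Active to show how
inheritance gives a substate its base state's behaviour.

  use strict;
  use warnings;

  # Hypothetical classes; they are not the LWP::Conn::HTTP classes.
  package Sketch::Conn;
  sub new        { my ($class) = @_; return bless {}, "${class}::Idle" }
  sub state_name { my ($self) = @_; return (split /::/, ref($self))[-1] }
  sub become     { my ($self, $state) = @_; bless $self, "Sketch::Conn::$state" }
  sub inactive   { my ($self) = @_; $self->become('Closed') }  # shared default

  package Sketch::Conn::Idle;
  our @ISA = ('Sketch::Conn');
  sub work     { my ($self) = @_; $self->become('Active') }
  sub readable { my ($self) = @_; $self->become('Closed') }    # server hung up

  package Sketch::Conn::Active;
  our @ISA = ('Sketch::Conn');
  sub readable { my ($self) = @_; $self->become('Chunked') }   # header seen

  # A substate: inherits from Active, overrides readable().
  package Sketch::Conn::Chunked;
  our @ISA = ('Sketch::Conn::Active');
  sub readable { my ($self) = @_; $self->become('Idle') }      # body complete

  package Sketch::Conn::Closed;
  our @ISA = ('Sketch::Conn');

  package main;

  my $conn  = Sketch::Conn->new;
  my @trace = ($conn->state_name);
  for my $event (qw(work readable readable readable)) {
      $conn->$event();                 # same call, class decides the effect
      push @trace, $conn->state_name;
  }
  print join(' -> ', @trace), "\n";
  # prints "Idle -> Active -> Chunked -> Idle -> Closed"

The same readable() call has three different effects in the trace,
purely because the object has been reblessed in between.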
