NAME
    Net::Traces::TSH - Analyze IP traffic traces in TSH format

SYNOPSIS
      use Net::Traces::TSH qw(:traffic_analysis);

      # Display progress indicators
      #
      verbose;

      # Process the trace in file some_trace.tsh
      #
      process_trace 'some_trace.tsh';

      # Then, write a summary of the trace contents to some_trace.csv, in
      # comma-separated values (CSV) format
      #
      write_trace_summary 'some_trace.csv';

INSTALLATION
    "Net::Traces::TSH" can be installed like any CPAN module. In particular,
    consider using Andreas Koenig's CPAN module for all your CPAN module
    installation needs.

    To find out more about installing CPAN modules type

     perldoc perlmodinstall

    at the command prompt.

    If you have already downloaded the "Net::Traces::TSH" tarball,
    decompress and untar it, and proceed as follows:

     perl Makefile.PL
     make
     make test
     make install

DESCRIPTION
    With "Net::Traces::TSH" you can analyze Internet Protocol (IP) packet
    traces in time sequenced headers (TSH), a binary network trace format.
    Daily TSH traces are available from the NLANR PMA web site. Each 44-byte
    TSH record corresponds to an IP packet passing by a monitoring point.
    Although there are no explicit delimiters, each record is composed of
    three sections.

    Time and Interface
        The first section uses 8 bytes to store the time (with microsecond
        granularity) and the interface number of the corresponding packet,
        as recorded by the (passive) monitor.

    IP  The next 20 bytes contain the standard IP packet header. IP options
        are not recorded.

    TCP The third and last section contains the first 16 bytes of the
        standard TCP segment header. The TCP checksum, urgent pointer, and
        TCP options (if any) are not included in a TSH record.

    If a record does not correspond to a TCP segment, it is not clear how to
    interpret the last section. As such, "Net::Traces::TSH" makes no
    assumptions, and does not analyze the last section of a TSH record
    unless it corresponds to a TCP segment. In other words,
    "Net::Traces::TSH" reports on protocols other than TCP based solely on
    the first two sections.

    The following diagram illustrates a TSH record.

         0                   1                   2                   3
         0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1  Section
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      0 |                      Timestamp (seconds)                      | Time
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      1 | Interface  No.|          Timestamp (microseconds)             |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      2 |Version|  IHL  |Type of Service|          Total Length         | IP
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      3 |         Identification        |Flags|      Fragment Offset    |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      4 |  Time to Live |    Protocol   |         Header Checksum       |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      5 |                       Source Address                          |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      6 |                    Destination Address                        |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      7 |          Source Port          |       Destination Port        | TCP
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      8 |                        Sequence Number                        |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      9 |                    Acknowledgment Number                      |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |  Data |       |C|E|U|A|P|R|S|F|                               |
     10 | Offset|RSRV-ed|W|C|R|C|S|S|Y|I|            Window             |
        |       |       |R|E|G|K|H|T|N|N|                               |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    This diagram is an adaptation of the original TSH diagram (found on the
    NLANR PMA web site), which reflects the changes due to the addition of
    Explicit Congestion Notification (ECN) in the TCP header flags. Keep in
    mind that recent Internet Engineering Task Force (IETF) Requests for
    Comments (RFCs) have deprecated the IP header Type of Service field in
    favor of Differentiated Services and Explicit Congestion Notification.

    For example, you can use "Net::Traces::TSH" to gather information from a
    TSH packet trace, perform statistical analysis on Transport protocol,
    Differentiated Services (DiffServ) and ECN usage, and obtain packet and
    segment size distributions. The trace summary statistics are stored in
    comma separated values (CSV), a platform independent text format.

  Data Structures
    The data collected from a trace is stored is a hash called %Trace, the
    main data structure in "Net::Traces::TSH". %Trace is initialized and
    populated by process_trace. The recommended way to get the trace summary
    information is by calling write_trace_summary, which stores the contents
    of %Trace in a CSV-formated text file, as shown in SYNOPSIS.

    %Trace is not exported by default and was not designed to be accessed
    directly by user code. However, if you know what you are doing, you can
    get a reference to %Trace by calling get_trace_summary_href. If you
    choose to do so, the following subsections explain how you can access
    some of the information stored in %Trace. See also Taking advantage of
    %Trace.

   General Trace Information
    $Trace{filename}
        The trace FILENAME.

    $Trace{summary}
        The trace summary FILENAME.

    $Trace{starts}
        The first trace timestamp, in seconds, as it is recorded in the
        trace.

    $Trace{ends}
        The last trace timestamp, in seconds, after being "normalized".
        Essentially, the number of seconds since $Trace{starts}.

    $Trace{records}
        Number of records in the trace.

    $Trace{unidirectional}
        True, if each interface carries unidirectional traffic.

        False, if there is bidirectional traffic in at least one interface.

        "undef" if traffic directionality was not examined.

    $Trace{'Link Capacity'}
        The capacity of the monitored link in bits per second (b/s).

   Internet Protocol
    $Trace{IP}{'Total Packets'}
    $Trace{IP}{'Total Bytes'}
        Number of IP packets and bytes, respectively, in the trace. The
        number of IP packets should equal the number of records in the
        trace.

   Fragmentation
    $Trace{IP}{'DF Packets'}
    $Trace{IP}{'DF Bytes'}
        Number of IP packets and bytes, respectively, requesting no
        fragmentation ('Do not Fragment').

    $Trace{IP}{'MF Packets'}
    $Trace{IP}{'MF Bytes'}
        Number of IP packets and bytes, respectively, indicating that 'More
        Fragments' follow.

   Differentiated Services
    $Trace{IP}{'Normal Packets'}
    $Trace{IP}{'Normal Bytes'}
        Number of IP packets and bytes, respectively, requesting no
        particular treatment (best effort traffic). None of the DiffServ and
        ECN bits are set.

    $Trace{IP}{'Class Selector Packets'}
    $Trace{IP}{'Class Selector Bytes'}
        Number of IP packets and bytes, respectively, with Class Selector
        bits set.

    $Trace{IP}{'AF PHB Packets'}
    $Trace{IP}{'AF PHB Bytes'}
        Number of IP packets and bytes, respectively, requesting Assured
        Forwarding Per-Hop Behavior (PHB).

    $Trace{IP}{'EF PHB Packets'}
    $Trace{IP}{'EF PHB Bytes'}
        Number of IP packets and bytes, respectively, requesting Expedited
        Forwarding Per-Hop Behavior (PHB)

   Explicit Congestion Notification
    $Trace{IP}{'ECT Packets'}
    $Trace{IP}{'ECT Bytes'}
        Number of IP packets and bytes, respectively, with either of the ECT
        bits set. These packets should be carrying ECN-aware traffic.

    $Trace{IP}{'CE Packets'}
    $Trace{IP}{'CE Bytes'}
        Number of IP packets and bytes, respectively, with the CE bit set.
        These packets carry ECN-capable traffic that has been marked at an
        ECN-aware router.

   Transport Protocols
    Besides the summary information about the trace itself and statistics
    about IP, %Trace maintains information about the transport protocols
    present in the trace. Based on the IP header, %Trace maintains the same
    statistics mentioned in the previous section for TCP, UDP and other
    transport protocols with an IANA assigned number. For example,

    $Trace{Transport}{TCP}{'Total Packets'}
    $Trace{Transport}{TCP}{'Total Bytes'}
        Number of TCP segments and the corresponding bytes (including the IP
        and TCP headers) in the trace.

    $Trace{Transport}{UDP}{'Total Packets'}
    $Trace{Transport}{UDP}{'Total Bytes'}
        Ditto, for UDP.

    $Trace{Transport}{ICMP}{'DF Packets'}
    $Trace{Transport}{ICMP}{'DF Bytes'}
        Number of ICMP packets and bytes, respectively, with the DF bit set.

  Taking advantage of %Trace
    The following example creates the trace summary file only if the TCP
    traffic in terms of bytes accounts for more than 90% of the total IP
    traffic in the trace.

     # Explicitly import process_trace(), write_trace_summary(), and
     # get_trace_summary_href():

     use Net::Traces::TSH qw( process_trace write_trace_summary
                              get_trace_summary_href
                            );

     # Process a trace file...
     #
     process_trace "some.tsh";

     # Get a reference to %Trace
     #
     my $ts_href = get_trace_summary_href;

     # ...and create a summary only if the condition is met.
     #
     write_trace_summary
        if ( ( $ts_href->{Transport}{TCP}{'Total Bytes'}
               / $ts_href->{IP}{'Total Bytes'}
             ) > 0.9);

FUNCTIONS
    "Net::Traces::TSH" does not export any functions by default. The
    following functions, listed in alphabetical order, are exportable.

  date_of
      date_of FILENAME

    TSH traces downloaded from the NLANR PMA trace repository typically
    contain a timestamp as part of their filename. date_of() converts the
    timestamp to a human readable format. That is, if FILENAME contains a
    valid timestamp, date_of() returns the corresponding GMT date as a human
    readable string. For example,

     date_of 'ODU-1073132115.tsh'

    returns "Sat Jan 3 12:15:15 2004 GMT".

    If the FILENAME does not contain a timestamp date_of() returns *false*.

  get_IP_address
     get_IP_address INTEGER

    Converts a 32-bit integer to an IP address in dotted decimal notation.
    For example,

     get_IP_address(167772172)

    returns 10.0.0.12.

  get_trace_summary_href
     get_trace_summary_href

    Returns a hash *reference* to %Trace.

  process_trace
     process_trace FILENAME
     process_trace FILENAME, NUMBER
     process_trace FILENAME, NUMBER, TEXT_FILENAME

    If called in a void context process_trace() examines the binary TSH
    trace stored in FILENAME, and populates %Trace.

    NUMBER specifies the capacity of the monitored link in bits per second
    (b/s). If not specified, it defaults to 155,520,000.

    If called in a list context process_trace() gathers the same statistics
    and in addition it extracts all TCP flows and TCP data-carrying segments
    from the trace, returning two hash references. For example

     my ($senders_href, $segments_href) = process_trace 'trace.tsh';

    will process "trace.tsh" and return two hash references: *$senders_href*
    and *$segments_href*.

    *$senders_href* is a reference to a hash which contains an entry for
    each TCP sender in the trace file. A TCP sender is identified by the
    ordered 4-tuple

     (src, src port, dst, dst port)

    where *src* and *dst* are the 32-bit integers corresponding to the IP
    addresses of the sending and receiving hosts, respectively. Similarly,
    *src port* and *dst port* are the sending and receiving processes' port
    numbers. Senders are categorized on a per interface basis. For example,
    the following accesses the list of segments sent from 10.0.0.12:80 to
    10.0.0.14:1080 (on interface 1):

     $senders_href->{1}{167772172,80,167772174,1080}

    Each hash entry is a list of timestamps extracted from the trace records
    and stored after being "normalized" (start of trace = 0.0 seconds,
    always).

    In theory, records corresponding to packets transmitted on the same
    interface should have different timestamps. In practice, although it is
    not very likely that two data segments have the same timestamp, I
    encountered a few traces that did have duplicate timestamps.
    process_trace() checks for such cases and implements a timestamp
    "collision avoidance" algorithm. A timestamp collision threshold is
    defined and is currently set to 3. Trace processing is aborted if the
    number of records with the same timestamp exceeds this threshold. If you
    encounter such traces, it is not a bad idea to investigate why this is
    happening. It may be that the trace is corrupted.

    The second returned value, *$segments_href*, is another hash reference,
    which can be used to access any individual *data-carrying TCP segment*
    in the trace. Again, segments are categorized on a per interface basis.
    Three values are stored per segment: the total number of bytes
    (including IP and TCP headers, and application payload), the segment
    sequence number, and whether the segment was retransmitted or not.

    For example, assuming the the first record corresponds to a TCP segment,
    here is how you can print its packet size and the sequence number
    carried in the TCP header:

     my $interface = 1;
     my $timestamp = 0.0;

     print $segments_href->{$interface}{$timestamp}{bytes};
     print $segments_href->{$interface}{$timestamp}{seq_num};

    You can also check whether a segment was retransmitted or not:

     if ( segments_href->{$interface}{$timestamp}{retransmitted} ) {
       print "Segment was retransmitted by the TCP sender.";
     }
     else {
       print "Segment must have been acknowledged by the TCP receiver.";
     }

    Note that process_trace() only initializes the "retransmitted" value to
    false (0). It is write_sojourn_times() that detects retransmitted
    segments and updates the "retransmitted" entry to *true*, if it is
    determined that the segment was retransmitted.

    CAVEAT: write_sojourn_times() is not currently included in the stable,
    CPAN version of the module. Contact me if you want to to get the most
    recent version.

    If TEXT_FILENAME is specified, process_trace() generates a text file
    based on the trace records in a format similar to the modified output of
    tcpdump, as presented in *TCP/IP Illustrated Volume 1* by W. R. Stevens
    (see pp. 230-231).

    You can use such an output as input to other tools, present real traffic
    scenarios in a classroom, or simply "eyeball" the trace. For example,
    here are the first ten lines of the contents of such a file:

     0.000000000 10.0.0.1.6699 > 10.0.0.2.55309: . ack 225051666 win 65463
     0.000014000 10.0.0.3.80 > 10.0.0.4.14401: S 457330477:457330477(0) ack 810547499 win 34932
     0.000014000 10.0.0.1.6699 > 10.0.0.2.55309: . 3069529864:3069531324(1460) ack 225051666 win 65463
     0.000024000 10.0.0.5.12119 > 10.0.0.6.80: F 2073668891:2073668891(0) ack 183269290 win 64240
     0.000034000 10.0.0.7.4725 > 10.0.0.8.445: S 3152140131:3152140131(0) win 16384
     0.000067000 10.0.0.1.6699 > 10.0.0.2.55309: P 3069531324:3069531944(620) ack 225051666 win 65463
     0.000072000 10.0.0.11.3381 > 10.0.0.12.445: S 1378088462:1378088462(0) win 16384
     0.000083000 10.0.0.13.1653 > 10.0.0.1.6699: P 3272208349:3272208357(8) ack 501563814 win 32767
     0.000093000 10.0.0.14.1320 > 10.0.0.15.445: S 3127123478:3127123478(0) win 64170
     0.000095000 10.0.0.4.14401 > 10.0.0.3.80: R 810547499:810547499(0) ack 457330478 win 34932

    Note that the text output is similar to what tcpdump with options "-n"
    and "-S" would have produced. The only missing fields are related to the
    TCP options negotiated during connection setup. Unfortunately, TSH
    records include only the first 16 bytes of the TCP header, making it
    impossible to record the options from the segment header.

  records_in
     records_in FILENAME

    Estimates the number to records in FILENAME based on its file size and
    returns the "expected" number of records in the trace, which must an
    integer. If not an integer, records_in() returns *false*.

  verbose
     verbose

    As you might expect, this function sets the verbosity level of the
    module. By default "Net::Traces::TSH" remains "silent". Call verbose()
    to see trace processing progress indicators on standard error.

  write_trace_summary
     write_trace_summary FILENAME
     write_trace_summary

    Writes the contents of %Trace to FILENAME in comma separated values
    (CSV) format, a platform independent text format, excellent for storing
    tabular data. CSV is both human-readable and suitable for further
    analysis using Perl or direct import to a spreadsheet application.
    Although not required, it is recommended that FILENAME should have a
    *.csv* suffix.

    If FILENAME is not specified, write_trace_summary() will create one for
    you by appending the suffix *.csv* to the filename of the trace being
    processed.

    If you want FILENAME to contain meaningful data you should call
    write_trace_summary() *after* calling process_trace().

DEPENDENCIES
    Nothing non-standard: strict, warnings and Carp.

EXPORTS
    None by default.

  Exportable
    date_of() get_IP_address() get_trace_summary_href() numerically()
    process_trace() records_in() verbose() write_trace_summary()

    In addition, the following export tags are defined:

    :traffic_analysis
        verbose() process_trace() write_trace_summary()

    :trace_information
        date_of() records_in()

    Finally, all exportable functions can be imported with

     use Net::Traces::TSH qw(:all);

VERSION
    This is "Net::Traces::TSH" version 0.09.

SEE ALSO
    The NLANR MOAT Passive Measurement and Analysis (PMA) web site at
    http://pma.nlanr.net/PMA provides more details on the process of
    collecting packet traces. The site features a set of Perl programs you
    can download, including several converters from other packet trace
    formats to TSH.

    TSH trace files can be downloaded from the NLANR/PMA trace repository at
    http://pma.nlanr.net/Traces . The site contains a variety of traces
    gathered from several monitoring points at university campuses and
    (Giga)PoPs connected to a variety of large and small networks.

  DiffServ
    If you are not familiar with Differentiated Services (DiffServ), good
    starting points are the following RFCs:

    K. Nichols *et al.*, *Definition of the Differentiated Services Field
    (DS Field) in the IPv4 and IPv6 Headers*, RFC 2474. Available at
    http://www.ietf.org/rfc/rfc2474.txt

    S. Blake *et al.*, *An Architecture for Differentiated Services*, RFC
    2475. Available at http://www.ietf.org/rfc/rfc2475.txt

    See also RFC 2597 and RFC 2598.

  ECN
    If you are not familiar Explicit Congestion Notification (ECN) make sure
    to read

    K. K. Ramakrishnan *et al.*, *The Addition of Explicit Congestion
    Notification (ECN) to IP*, RFC 3168. Available at
    http://www.ietf.org/rfc/rfc3168.txt

AUTHOR
    Kostas Pentikousis, kostas@cpan.org.

ACKNOWLEDGMENTS
    Professor Hussein Badr provided invaluable guidance while crafting the
    main algorithms of this module.

    Many thanks to Wall, Christiansen and Orwant for writing *Programming
    Perl 3/e*. It has been indispensable while developing this module.

COPYRIGHT AND LICENSE
    Copyright 2003, 2004 by Kostas Pentikousis. All Rights Reserved.

    This library is free software with ABSOLUTELY NO WARRANTY. You can
    redistribute it and/or modify it under the same terms as Perl itself.

