Linux ATM API - Draft    June 20, 1995
=====================

Werner Almesberger, EPFL, LRC, werner.almesberger@lrc.di.epfl.ch


About this document
===================

This document defines an API for ATM-related system services under Linux. 
The basic design idea is to extend the 4.3 BSD release UNIX socket 
interface to support additional functionality needed for ATM wherever 
possible.

Currently, only CBR and UBR are specified. ABR and VBR support are left for 
further study. Similarly, SVC unicast and multicast signaling are not 
specified yet. Also, only a datagram transport is defined.

Detailed descriptions of system calls and library functions are only given 
where their use for ATM differs from their use for the INET protocol 
family. See [Stevens] for a general introduction to the BSD socket API.

ATM functionality covered by this API is described in [ATM93].

Previous and future versions of this document can be found at 
http://lrcwww.epfl.ch/linux-atm/


Related APIs
------------

This API is not directly aligned with the "Native ATM Services" API 
specification [ATM95] currently being prepared by the SAA API ad-hoc work 
group of ATM Forum, but evolution of their work is being monitored and any 
development requiring significant changes to the API will be reflected in 
this proposal. The SAA API work group currently doesn't intend to specify 
an ATM API for the BSD-style socket interface.

The IETF currently faces similar compatibility issues in preparation of the 
transition from IPv4 to IPv6. The current Internet Draft describing IPv6 
extensions for BSD sockets [IETF-IPng] therefore provides useful insights 
into problems which have to be addressed in the context of ATM too.


Design considerations
---------------------

The main considerations driving the design of this API are the following: 

  - when porting applications written for INET domain sockets, only source 
    code parts using addresses should have to be modified 
  - where existing semantics are modified, the changes should follow the 
    original design philosophy and also the original terminology 
  - where semantics are modified or new ones are added, work from 
    related standardization efforts is adapted if possible 

Also, care is being taken to make sure to follow up design decisions with 
implementation experience as closely as possible.


Changes since first draft (version 0.1)
---------------------------------------

The following items have been added: 

  - a few introductory remarks 
  - indicated that PCR and CDV are ignored when UBR is chosen 
  - QOS guarantee only applies to network 
  - description of role and references to related APIs 
  - clarified that connect on incompletely specified address allocates 
    resources 
  - all values are stored in host byte order 
  - ATM_NONE traffic class for unidirectional traffic 

The following items have been changed: 

  - clarified use of address parameter in sendto, recvfrom, sendmsg, and 
    recvmsg 
  - max_pcr has been changed to min_pcr 
  - maximum CDV explanation was ambiguous 
  - explained use of PVC/SVC and removed reference to ITU-T I.233 
  - changed "VCC" to "VC" 
  - several bugs in the program examples 


Connection handling
===================

In ATM, the fundamental communication paradigm is the connection. This 
section describes the mechanisms to establish, use and release ATM 
connections.


Phases
------

A connection typically goes through the following phases:

  - connection preparation 
  - connection setup 
  - data exchange 
  - connection teardown 

During connection preparation, parameters are set and general local 
resources (e.g. socket descriptors) are allocated. Neither local nor remote 
networking resources are allocated during preparation. During connection 
setup, local networking resources (bandwidth, connection identifiers, 
buffers, etc.) are allocated. Resource allocation in the network may be 
handled by network management (PVCs) or it may be done as part of the 
connection setup (SVCs). In the data exchange, data is sent over a 
previously established connection. Finally, during connection teardown, 
communication is stopped and resources are deallocated.

Connection preparation, connection setup and data exchange have to be 
performed in this order. Connection teardown is different in that it can be 
initiated at any time. It may even overlap with other phases (e.g. data may 
continue to flow in one direction after a shutdown until the final close).


Connection preparation
----------------------

The connection preparation phase consists of the following parts:

  - socket creation 
  - connection descriptor initialization 
  - traffic parameter specification 

Socket creation and connection descriptor initialization must always be 
performed before traffic parameter specification.


Socket creation
- - - - - - - -

Sockets are created with the socket system call.

int socket(int domain, int type, int protocol);

The domain indicates whether signaling shall be used for this connection 
(SVC, not yet supported), or not (PVC). The following domains are defined: 

  PF_ATMPVC  ATM PVC connection 
  PF_ATMSVC  ATM SVC connection 

Merging of both domains into a single PF_ATM domain is left for further 
study.

The type selects general transport layer protocol characteristics. Only the 
SOCK_DGRAM transport protocol type is currently available, specifying an 
unreliable datagram transport. Note that no sequence guarantees are given, 
although the use of ATM generally implies that sequence will be preserved.

With protocol, a specific protocol is selected. Currently, only the 
protocol ATM_AAL5 is defined. Support of additional protocols is for 
further study.


Connection descriptor allocation and initialization
- - - - - - - - - - - - - - - - - - - - - - - - - -

The connection descriptor is a data structure describing traffic parameters 
(e.g. offered data rate, SDU size), connection requirements (e.g. maximum 
cell delay variation), and address information.

Depending on the address family, one of the following data types is used 
for the connection descriptor: 

  - struct sockaddr_atmpvc for a PVC 
  - struct sockaddr_atmsvc for an SVC 

PVC connection descriptors contain the following fields: 

  sap_family  address family (AF_ATMPVC) 
  sap_addr  PVC address 
  sap_txtp  traffic parameters in the send direction 
  sap_rxtp  traffic parameters in the receive direction 

Their use is described in the following sections. Note that the socket 
address structure as described here conforms to the definition of socket 
address structures of the 4.3 BSD release, which has also been adopted by 
Linux.*

  *  4.4 BSD introduced an additional length field, which is not supported 
    by Linux.

During initialization, all fields of the connection descriptor are simply 
set to zero with the memset C library function. (See [ANSIC].)


Traffic parameter specification
- - - - - - - - - - - - - - - -

Traffic parameters are added to the connection descriptor by simply 
assigning values to the corresponding fields.

Traffic parameters are encoded in a data structure of type struct 
atm_trafprm containing the following fields:

  class  traffic class 
  min_pcr  minimum peak cell rate (PCR), in cells per second 
  max_cdv  maximum cell delay variation (CDV), in cell slots 
  max_sdu  maximum service data unit (SDU) size, in bytes 

Further parameters (variable bit rate, maximum cell loss probability, 
end-to-end delay, etc.) may be added in the future. All parameter values 
are stored in host byte order.

The traffic class indicates general traffic properties. The following 
traffic classes are defined: 

  ATM_NONE  no traffic in this direction 
  ATM_CBR  constant bit rate (CBR) 
  ATM_UBR  unspecified bit rate (UBR) 

If no traffic class is specified (i.e. if the field is set to zero), 
ATM_NONE is used as the default. When the traffic class is ATM_NONE, all 
other traffic parameters are ignored.

The minimum peak cell rate (PCR) specifies the bandwidth that will be 
consumed by this connection. Note that a higher peak cell rate may be 
allocated on connection establishment to accommodate hardware 
limitations. For sending, the traffic shapers are adjusted to never exceed 
this cell rate. For receiving, the receiver prepares to accept cells 
arriving at a rate of at least min_pcr. The minimum PCR normally is zero or 
a positive integer. The special value ATM_MAX_PCR indicates an unlimited 
cell rate (i.e. link speed). min_pcr is ignored when using UBR.

The maximum cell delay variation (CDV) specifies the upper bound for the 
number of "cell slots" a cell can arrive ahead of its due time, relative to 
the preceding cell of the same VC.* Since max_cdv is an upper bound, the 
operating system may select a lower CDV than requested. When sending, cells 
will never be emitted at a pace exceeding this guarantee. For receiving, 
the receiver prepares to accept cells arriving with at least the specified 
CDV. The maximum CDV normally is a positive integer. Setting it to zero 
indicates an unspecified CDV. max_cdv is ignored when using UBR.

  *  This definition of CDV is based on the "peak cell rate monitor 
    algorithms accounting for cell delay variation tolerance" described in 
    annex A of [I371].

The maximum service data unit (SDU) size specifies the maximum amount of 
data the API user will attempt to send at a time. For receiving and for 
sending, buffers are dimensioned accordingly. The SDU size is a positive 
integer. If set to zero, the maximum SDU size supported by the respective 
protocol is assumed.
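
The following sketch illustrates how the fields and traffic classes 
described above might be filled in. The ex_ and EX_ names, the field types, 
and the numeric class values are assumptions made for illustration only; 
the one property taken from this document is that ATM_NONE corresponds to 
zero, so that a zeroed structure means "no traffic in this direction".

```c
#include <string.h>

/* Hypothetical stand-ins mirroring the fields listed above. The numeric
 * values and field types are assumptions, except that the "none" class
 * is zero so that a zeroed structure means "no traffic". */
enum { EX_ATM_NONE = 0, EX_ATM_UBR = 1, EX_ATM_CBR = 2 };

struct ex_trafprm {
    unsigned char class;   /* traffic class */
    int min_pcr;           /* minimum PCR, in cells per second */
    int max_cdv;           /* maximum CDV, in cell slots */
    int max_sdu;           /* maximum SDU size, in bytes */
};

/* Fill in a CBR description; all values are in host byte order. */
static void ex_set_cbr(struct ex_trafprm *tp, int pcr, int cdv, int sdu)
{
    memset(tp, 0, sizeof(*tp));
    tp->class = EX_ATM_CBR;
    tp->min_pcr = pcr;
    tp->max_cdv = cdv;
    tp->max_sdu = sdu;
}

/* Describe a receive-only connection: the send side stays at "none". */
static void ex_receive_only(struct ex_trafprm *tx, struct ex_trafprm *rx,
                            int pcr, int sdu)
{
    memset(tx, 0, sizeof(*tx));   /* class is EX_ATM_NONE, rest ignored */
    ex_set_cbr(rx, pcr, 0, sdu);  /* max_cdv of zero: unspecified CDV */
}
```

With real descriptors, the two structures would correspond to the sap_txtp 
and sap_rxtp fields of the connection descriptor.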

Note that compatibility of requested QOS with available resources is not 
checked during connection preparation. This is only done during connection 
setup. Further, since only certain parameter values may be supported, 
traffic parameters may be changed during connection setup. This is 
reflected by also altering the connection descriptor passed to the 
corresponding system calls.

A library function is provided to convert SDU rates to cell rates:

int sdu2cr(int s,int sizes,int *sdu_size,int *num_sdu);

s is the socket descriptor that has previously been returned by the socket 
system call. sizes is the number of SDU sizes described in the sdu_size and 
num_sdu arrays. Each element of sdu_size indicates the length (in bytes) of 
an SDU. Each element of num_sdu indicates the number of times an SDU with 
the corresponding size is sent in a given time interval.

sdu2cr computes the total number of ATM cells that would be sent on the 
connection and returns either that number or -1 if there is an overflow.
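
How sdu2cr arrives at its result is not specified here. As a rough sketch 
of the arithmetic for AAL5, the only protocol currently defined, each SDU 
is extended by an 8-byte trailer and padded to a whole number of 48-byte 
cell payloads. The function names below are hypothetical, and the socket 
argument is omitted because AAL5 is assumed.

```c
#include <limits.h>

#define EX_CELL_PAYLOAD 48   /* payload bytes per ATM cell */
#define EX_AAL5_TRAILER 8    /* AAL5 trailer bytes added to each SDU */

/* Cells needed for one AAL5 SDU: payload plus trailer, padded up to a
 * whole number of cells. */
static int ex_cells_per_sdu(int sdu_size)
{
    return (sdu_size + EX_AAL5_TRAILER + EX_CELL_PAYLOAD - 1)
        / EX_CELL_PAYLOAD;
}

/* Hypothetical counterpart of sdu2cr without the socket argument.
 * Returns the total cell count for the interval, or -1 on overflow or
 * on negative input. */
static int ex_sdu2cells(int sizes, const int *sdu_size, const int *num_sdu)
{
    int total = 0;
    int i;

    for (i = 0; i < sizes; i++) {
        int cells;

        if (sdu_size[i] < 0 || num_sdu[i] < 0)
            return -1;
        /* the rounding in ex_cells_per_sdu must not overflow */
        if (sdu_size[i] > INT_MAX - EX_AAL5_TRAILER - EX_CELL_PAYLOAD + 1)
            return -1;
        cells = ex_cells_per_sdu(sdu_size[i]);
        /* cells * num_sdu[i] must still fit below INT_MAX - total */
        if (num_sdu[i] != 0 && cells > (INT_MAX - total) / num_sdu[i])
            return -1;
        total += cells * num_sdu[i];
    }
    return total;
}
```

For instance, ten SDUs of 8192 bytes and one hundred SDUs of 40 bytes per 
interval yield 10*171+100*1 = 1810 cells under these assumptions.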

Note that service guarantees (e.g. timely processing of CBR traffic) only 
apply to the network. It is the application's responsibility to ensure that 
sufficient host resources are available to properly generate or accept 
traffic streams.


Example
- - - -

    struct sockaddr_atmpvc addr;
    int s;

    if ((s = socket(PF_ATMPVC,SOCK_DGRAM,ATM_AAL5)) < 0) {
        perror("socket");
        exit(1);
    }
    memset(&addr,0,sizeof(addr));
    addr.sap_family = AF_ATMPVC;
    addr.sap_txtp.class = ATM_UBR;
    addr.sap_txtp.min_pcr = ATM_MAX_PCR;
    addr.sap_txtp.max_sdu = 8192;
    addr.sap_rxtp = addr.sap_txtp;


Connection setup
----------------

Connection setup consists simply of adding address information to the 
connection descriptor and invoking connect or bind.


PVC addressing
- - - - - - -

A PVC is uniquely identified by the following sap_addr fields: 

  itf  interface number 
  vpi  virtual path identifier (VPI) 
  vci  virtual channel identifier (VCI) 

The contents of all fields are stored in host byte order. The valid ranges 
for VCIs and VPIs depend on the interface and might be configurable (see 
section "Interface control"). The following special values are recognized:

  itf == ATM_ITF_ANY  selects the lowest-numbered interface on which the 
    specified VPI/VCI pair exists and is available. 
  vpi == ATM_VPI_ANY  selects the lowest-numbered free VPI on the specified 
    interface on which the specified VCI is available. 
  vci == ATM_VCI_ANY  selects the lowest-numbered free VCI within the 
    specified VPI on the specified interface. 
  vpi == ATM_VPI_UNSPEC  does not allocate any VPI/VCI pair and uses the 
    interface number for global resource control only. vpi == 
    ATM_VPI_UNSPEC also implies that the VCI is unspecified. Therefore, the 
    VCI value is ignored. 
  vci == ATM_VCI_UNSPEC  does not allocate any VPI/VCI pair and uses the 
    interface and VPI numbers for global resource control only. 

Addresses containing ATM_part_ANY components are called wildcard addresses. 
Similarly, addresses containing ATM_part_UNSPEC components are called 
incompletely specified addresses.

An address may contain more than one wildcard component, e.g. itf == 
ATM_ITF_ANY, vpi == ATM_VPI_ANY and vci == ATM_VCI_ANY would select the 
lowest-numbered free PVC address. When using more than one wildcard 
component in an address, interface numbers are more significant than VPI 
numbers, and VPI numbers are more significant than VCI numbers when 
determining the lowest address. Furthermore, wildcards and incompletely 
specified components can be mixed. In this case, the wildcard selection is 
based on incomplete information.
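
The ordering rule for wildcard addresses can be pictured as a 
lexicographic comparison. The sketch below picks the lowest matching 
address from a list of free addresses. It is an illustration only: the 
ex_ names and the EX_ANY marker are hypothetical, and an implementation 
would consult its own resource tables rather than a list.

```c
#include <stddef.h>

/* Hypothetical marker for a wildcard component; the values of the real
 * ATM_part_ANY constants are not given in this document. */
#define EX_ANY (-1)

struct ex_pvc { int itf, vpi, vci; };

/* Order addresses with itf most significant, then vpi, then vci. */
static int ex_pvc_cmp(const struct ex_pvc *a, const struct ex_pvc *b)
{
    if (a->itf != b->itf) return a->itf < b->itf ? -1 : 1;
    if (a->vpi != b->vpi) return a->vpi < b->vpi ? -1 : 1;
    if (a->vci != b->vci) return a->vci < b->vci ? -1 : 1;
    return 0;
}

/* A component matches if the pattern holds EX_ANY or the same value. */
static int ex_pvc_matches(const struct ex_pvc *pat, const struct ex_pvc *a)
{
    return (pat->itf == EX_ANY || pat->itf == a->itf) &&
           (pat->vpi == EX_ANY || pat->vpi == a->vpi) &&
           (pat->vci == EX_ANY || pat->vci == a->vci);
}

/* Return the lowest free address matching the pattern, or NULL. */
static const struct ex_pvc *ex_pvc_select(const struct ex_pvc *pat,
                                          const struct ex_pvc *free_list,
                                          size_t n)
{
    const struct ex_pvc *best = NULL;
    size_t i;

    for (i = 0; i < n; i++) {
        if (!ex_pvc_matches(pat, &free_list[i]))
            continue;
        if (!best || ex_pvc_cmp(&free_list[i], best) < 0)
            best = &free_list[i];
    }
    return best;
}
```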

Two connections can share the same address if one uses only the receive 
direction (e.g. addr.sap_txtp.class == ATM_NONE) and the other uses only 
the transmit direction. Note that such sharing may not be supported for all 
traffic classes.


System calls
- - - - - -

The connect and bind system calls are used to set up connections. The 
behaviour of both is identical for PVCs. The following peculiarities have 
to be noted:

  - some connection parameters may be modified, see section "Traffic 
    parameter specification". It is not specified what modifications (if 
    any) have taken place when bind or connect returns an error. 
  - a connection to an incompletely specified address is not yet ready to 
    transport data after calling connect for the first time; it must be 
    connected or bound a second time without any unspecified components. On 
    this second call, it is an error if either any address components 
    except for previously unspecified ones are changed, or if there are 
    still unspecified components left. The first call to connect reserves 
    all resources that have been explicitly specified. If no second connect 
    is to be attempted, the resources have to be released by calling close. 
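
The rule for the second call can be expressed as a small check. The sketch 
below is not part of the API; the ex_ names and the EX_UNSPEC marker are 
hypothetical, and wildcard components are not considered.

```c
#include <errno.h>

/* Hypothetical marker for an unspecified component; the values of
 * ATM_VPI_UNSPEC and ATM_VCI_UNSPEC are not given in this document. */
#define EX_UNSPEC (-2)

struct ex_pvc_addr { int itf, vpi, vci; };

/* Compare the address of a second connect or bind with the first one.
 * Returns 0 if the call may proceed, or EOPNOTSUPP otherwise. */
static int ex_check_completion(const struct ex_pvc_addr *first,
                               const struct ex_pvc_addr *second)
{
    /* An unspecified VPI implies an unspecified VCI. */
    int vci_was_unspec = first->vpi == EX_UNSPEC ||
                         first->vci == EX_UNSPEC;

    if (second->vpi == EX_UNSPEC || second->vci == EX_UNSPEC)
        return EOPNOTSUPP;           /* still incomplete */
    if (second->itf != first->itf)
        return EOPNOTSUPP;           /* interface was already fixed */
    if (first->vpi != EX_UNSPEC && second->vpi != first->vpi)
        return EOPNOTSUPP;           /* VPI was already fixed */
    if (!vci_was_unspec && second->vci != first->vci)
        return EOPNOTSUPP;           /* VCI was already fixed */
    return 0;
}
```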

The following values of errno have a special meaning when using connect or 
bind on ATM sockets:

  EADDRNOTAVAIL  the requested address cannot be assigned on the existing 
    interface(s) in the present configuration. 
  EOPNOTSUPP  on the second call, either address parts already specified in 
    the first call have been changed or there are still unspecified address 
    parts. 
  ENETDOWN  the specified interface exists but is currently not 
    operational. 
  ENETUNREACH  the requested QOS criteria could not be met. 
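
An application may want to report these conditions in ATM-specific terms 
instead of relying on the generic messages printed by perror. A minimal 
sketch follows; the function name is hypothetical and the message texts 
merely paraphrase the list above.

```c
#include <errno.h>
#include <stddef.h>

/* Map the errno values listed above to short diagnostics. Any other
 * value yields NULL so that the caller can fall back to strerror. */
static const char *ex_atm_connect_reason(int err)
{
    switch (err) {
    case EADDRNOTAVAIL:
        return "address cannot be assigned on the existing interface(s)";
    case EOPNOTSUPP:
        return "second call changed or left unspecified address parts";
    case ENETDOWN:
        return "interface exists but is not operational";
    case ENETUNREACH:
        return "requested QOS could not be met";
    default:
        return NULL;
    }
}
```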


Example
- - - -

    addr.sap_addr.itf = 0;
    addr.sap_addr.vpi = ATM_VPI_UNSPEC;
    if (connect(s,(struct sockaddr *) &addr,sizeof(addr)) < 0) {
        perror("connect(1)");
        exit(1);
    }
    /* some other activities */
    addr.sap_addr.vpi = 0;
    addr.sap_addr.vci = 42;
    if (connect(s,(struct sockaddr *) &addr,sizeof(addr)) < 0) {
        perror("connect(2)");
        exit(1);
    }


Data exchange
-------------

The data exchange paradigm for ATM sockets is modeled as closely as 
possible on the one for BSD sockets. The only significant exceptions are 
the handling of addresses and the additional buffer alignment and size 
constraints which apply when optimizing for throughput.


Transfer scheduling
- - - - - - - - - -

The select system call can be used to schedule receive and send operations 
in order to minimize blocking delays.
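
As an illustration, the helper below waits with select until a descriptor 
becomes readable or a timeout expires. The function name is hypothetical; 
nothing in it is specific to ATM, so it should apply to any descriptor 
accepted by select.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Wait until descriptor fd is readable or timeout_ms milliseconds have
 * passed. Returns 1 if readable, 0 on timeout, -1 on error. */
static int ex_wait_readable(int fd, long timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int n;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    n = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (n < 0)
        return -1;
    return n > 0 && FD_ISSET(fd, &rfds) ? 1 : 0;
}
```

A receive operation issued after ex_wait_readable has returned 1 is not 
expected to block.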


Sending and receiving
- - - - - - - - - - -

The system calls read, readv, recv, recvfrom, recvmsg, send, sendto, 
sendmsg, write, and writev are supported with their usual semantics. (See 
[POSIX] and [Stevens].) Note, however, that the sockets have to be 
connected and that the arguments of sendto, recvfrom, sendmsg, and recvmsg 
specifying a source or destination address must be NULL.
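
The following sketch shows sendto and recvfrom being called with NULL 
address arguments on a connected socket. The wrapper names are 
hypothetical; the calls themselves are the standard socket calls.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Send one datagram on an already connected socket. The destination
 * address is NULL because the connection determines the peer. */
static ssize_t ex_send_connected(int s, const void *buf, size_t len)
{
    return sendto(s, buf, len, 0, NULL, 0);
}

/* Receive one datagram; the source address argument is NULL as well. */
static ssize_t ex_recv_connected(int s, void *buf, size_t len)
{
    return recvfrom(s, buf, len, 0, NULL, NULL);
}
```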


Alignment and size constraints
- - - - - - - - - - - - - - -

In order to optimize throughput, specific buffer alignment and size 
considerations may be necessary. This information can be used to adapt the 
send and receive procedures.

Buffer constraints can be obtained with the getsockopt system call:

int getsockopt(int s,int level,int optname,void *optval,int *optlen);

The level is SOL_SOCKET. The following values for optname are recognized 
(each parameter exists in the send and in the receive direction): 

  SO_BCTXOPT  constraints for sending data with best throughput 
  SO_BCRXOPT  constraints for receiving data with best throughput 

optval is a pointer to a data structure of type struct atm_buffconst with 
the following fields:

  buf_fac  buffer alignment factor 
  buf_off  buffer alignment offset 
  size_fac  size alignment factor 
  size_off  size alignment offset 
  min_size  minimum size 
  max_size  maximum size 

The contents of all fields are stored in host byte order. A maximum size 
only limited by the general system architecture or by quotas is coded as 
zero. The following relations are true:

  buffer alignment offset < buffer alignment factor 
  size alignment offset < size alignment factor 
  minimum size <= maximum size* 
  size alignment factor <= maximum size - minimum size 
  minimum size = size alignment offset + size alignment factor * N 
  maximum size = size alignment offset + size alignment factor * N** 

where N = 0, 1, 2, ...

  *  Unless the maximum size is coded as zero, in which case the minimum 
    size can have any value.

  **  Unless the maximum size is coded as zero, in which case the size 
    alignment is not constrained by the maximum size.
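
The relations listed above can be checked mechanically, and they determine 
how a requested length is rounded to a permitted size. The sketch below 
uses a hypothetical stand-in for struct atm_buffconst. The field types are 
an assumption, as is the choice to skip every relation involving the 
maximum size when it is coded as zero.

```c
/* Hypothetical stand-in for struct atm_buffconst; the field types are
 * an assumption. */
struct ex_buffconst {
    unsigned long buf_fac, buf_off;    /* buffer alignment */
    unsigned long size_fac, size_off;  /* size alignment */
    unsigned long min_size, max_size;  /* max_size 0: no fixed limit */
};

/* Return 1 if bc satisfies the relations listed above, else 0. */
static int ex_buffconst_ok(const struct ex_buffconst *bc)
{
    if (bc->buf_off >= bc->buf_fac || bc->size_off >= bc->size_fac)
        return 0;
    /* minimum size = size_off + size_fac * N */
    if (bc->min_size < bc->size_off ||
        (bc->min_size - bc->size_off) % bc->size_fac != 0)
        return 0;
    if (bc->max_size == 0)
        return 1;  /* relations involving max_size are skipped */
    if (bc->min_size > bc->max_size ||
        bc->size_fac > bc->max_size - bc->min_size)
        return 0;
    /* maximum size = size_off + size_fac * N */
    return (bc->max_size - bc->size_off) % bc->size_fac == 0;
}

/* Smallest permitted size that holds len bytes, or 0 if that size
 * would exceed max_size. Assumes bc has passed ex_buffconst_ok. */
static unsigned long ex_round_size(const struct ex_buffconst *bc,
                                   unsigned long len)
{
    unsigned long n;

    if (len < bc->min_size)
        len = bc->min_size;
    n = (len - bc->size_off + bc->size_fac - 1) / bc->size_fac;
    len = bc->size_off + n * bc->size_fac;
    if (bc->max_size != 0 && len > bc->max_size)
        return 0;
    return len;
}
```

For example, with a size alignment factor of 48, an offset of 8 and a 
minimum size of 56, a 100-byte message would be rounded up to 104 bytes.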


Asynchronous I/O
- - - - - - - -

Support for asynchronous I/O is left for further study.


Example
- - - -

    const char msg[] = "Hello, world !\n";
    char *buffer,*start;
    struct atm_buffconst bc;
    ptrdiff_t pos;
    size_t buf_len;
    ssize_t size;
    int length;

    /* query the constraints for sending with best throughput */
    length = sizeof(bc);
    if (getsockopt(s,SOL_SOCKET,SO_BCTXOPT,(char *) &bc,&length) < 0) {
        perror("getsockopt");
        exit(1);
    }
    /* round the message size up to size_off+size_fac*N */
    buf_len = sizeof(msg)-bc.size_off+bc.size_fac-1;
    buf_len = buf_len-(buf_len % bc.size_fac)+bc.size_off;
    if (buf_len < bc.min_size) buf_len = bc.min_size;
    /* reserve up to buf_fac-1 extra bytes for aligning the start */
    if (!(buffer = malloc(buf_len+bc.buf_fac-1))) {
        perror("malloc");
        exit(1);
    }
    /* align the start address to buf_off+buf_fac*N */
    pos = (ptrdiff_t) (buffer-bc.buf_off+bc.buf_fac-1);
    start = (char *) (pos-(pos % bc.buf_fac)+bc.buf_off);
    /* copy the message and zero the padding */
    memcpy(start,msg,sizeof(msg));
    if (sizeof(msg) != buf_len)
        memset(start+sizeof(msg),0,buf_len-sizeof(msg));
    if ((size = write(s,start,buf_len)) < 0) {
        perror("write");
        exit(1);
    }
    if ((size_t) size != buf_len)
        fprintf(stderr,"Wrote only %ld of %lu bytes\n",(long) size,
          (unsigned long) buf_len);


Connection teardown
-------------------

Two steps are distinguished when connections are torn down:

  - connection shutdown 
  - closing 

Connection shutdown stops data transmission in either or both directions. 
Resources associated with a connection are deallocated when closing the 
connection. A connection shutdown is performed implicitly when closing a 
connection.


Connection shutdown
- - - - - - - - - -

When a connection is shut down for sending, no further data is accepted for 
sending. Note that it is still possible to receive data unless the 
connection has also been shut down for receiving.

When a connection is shut down for receiving, no further data is accepted 
from the network. Note that it is still possible to send data unless the 
connection has also been shut down for sending.

The shutdown system call is used to shut down connections.

Whether and how connection shutdown also implies releasing of resources 
allocated to the connection is defined by the implementation.
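
A minimal sketch of the two teardown steps follows. The helper names are 
hypothetical; SHUT_WR and SHUT_RDWR are the symbolic names POSIX defines 
for the arguments of shutdown.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Stop sending on s but keep the receive direction open. */
static int ex_stop_sending(int s)
{
    return shutdown(s, SHUT_WR);
}

/* Shut down both directions, then release the descriptor. The explicit
 * shutdown is redundant because close implies it; it is shown only to
 * make the two steps visible. */
static int ex_teardown(int s)
{
    (void) shutdown(s, SHUT_RDWR);
    return close(s);
}
```

After ex_stop_sending, data arriving from the peer can still be read until 
the receive direction is shut down or the socket is closed.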


Closing
- - - -

When a connection has been closed, no further access to it is possible and 
all resources associated with it are freed.

Connections are closed with the close system call or by terminating the 
process owning the connection.*

  *  Note that duplication of ATM sockets is currently not specified by 
    this document. Therefore, only one reference can exist for each 
    connection.


Example
- - - -

    (void) close(s);


Summary
-------

All steps in connection handling are summarized below along with the 
principal system calls or library functions used in each step.

  - connection preparation 

      - socket creation (socket) 
      - connection descriptor initialization (memset) 
      - traffic parameter specification (assignments) 

  - connection setup 

      - addressing (assignments) 
      - partial setup (bind, connect) 
      - address completion (assignments) 
      - full setup (bind, connect) 

  - data exchange 

      - transfer scheduling (select) 
      - sending and receiving (read, write, ...) 
      - alignment and size constraints (getsockopt) 

  - connection teardown 

      - connection shutdown (shutdown) 
      - closing (close) 


Interface control
=================

Certain interface parameters can be modified in some implementations. In 
addition, statistics of important system events can be queried.


AAL layer
---------


ATM layer
---------


Connection identifier ranges
- - - - - - - - - - - - - -


Physical layer
--------------

For further study.


Related services
================

This section describes support of services not defined in [ATM93], which 
are commonly associated with ATM.


IP over ATM
-----------

API details for IP over ATM support (as specified in [RFC1483], [RFC1577], 
and [RFC1626]) are for further study.

The current implementation indicates use for IP over ATM by specifying the 
protocol ATM_CLIP (CLassical IP) in the socket system call.


Acronyms
========

This appendix lists all acronyms that have been used in the document:

  ABR  Available Bit Rate 
  API  Application Program Interface 
  ATM  Asynchronous Transfer Mode 
  BSD  Berkeley Software Distribution 
  CBR  Constant Bit Rate 
  CDV  Cell Delay Variation 
  PCR  Peak Cell Rate 
  PVC  Permanent Virtual Circuit (see below) 
  QOS  Quality Of Service 
  SDU  Service Data Unit 
  SVC  Switched Virtual Circuit (see below) 
  UBR  Unspecified Bit Rate 
  VBR  Variable Bit Rate 
  VC  Virtual Channel 
  VCI  Virtual Channel Identifier 
  VPI  Virtual Path Identifier 

Note that the terms "PVC" and "SVC" are not official ATM terminology as 
used by ITU. They originate from Frame Relay terminology and it is common 
practice by ATM Forum and other groups to use them to describe the 
corresponding concepts in ATM. Unfortunately, ITU-T has re-used the 
abbreviation "SVC" for ATM to mean "Signalling Virtual Channel", which is 
the VC used to carry signaling messages.


References
==========

  ATM93  The ATM Forum. ATM User-Network Interface Specification, Prentice 
    Hall, 1993. 
  ATM95  The ATM Forum, SAA API Ad-hoc Work Group. Native ATM Services: 
    Semantic Description, ATM Forum contribution 95-0008R3, 1995. 
  ANSIC  ANSI/ISO 9899-1990, Herbert Schildt. The Annotated ANSI C 
    Standard, Osborne McGraw-Hill, 1990. 
  IETF-IPng  Gilligan, Robert E.; Thomson, Susan; Bound, Jim. IPv6 Program 
    Interfaces for BSD Systems (work in progress), Internet Draft 
    draft-ietf-ipngwg-bsd-api-00.txt, 1995. 
  I371  ITU-T Recommendation I.371. Traffic control and congestion control 
    in B-ISDN, ITU, 1993. 
  LeBoudec  Le Boudec, Jean-Yves. The Asynchronous Transfer Mode: a 
    tutorial, Computer Networks and ISDN Systems, Volume 24, Number 4, 
    1992. 
  POSIX  IEEE Standard for Information Technology. Portable Operating 
    System Interface (POSIX). Part 1: System Application Program Interface 
    (API), IEEE, 1994. 
  RFC1483  Heinanen, Juha. Multiprotocol Encapsulation over ATM Adaptation 
    Layer 5, IETF, 1993. 
  RFC1577  Laubach, Mark. Classical IP and ARP over ATM, IETF, 1994. 
  RFC1626  Atkinson, Randall J. Default IP MTU for use over ATM AAL5, IETF, 
    1994. 
  Stevens  Stevens, W. Richard. UNIX Network Programming, Prentice Hall, 
    1990. 
