Linux ATM API - Draft    April 18, 1995
=====================

Werner Almesberger, EPFL, DI-LRC, werner.almesberger@lrc.di.epfl.ch


About this document
===================

This document defines APIs for ATM-related system services under Linux. The 
basic design idea is to extend the BSD UNIX socket interface to support 
additional functionality needed for ATM wherever possible.

Currently, only CBR and UBR are specified. Support for ABR and VBR is left 
for further study. Similarly, SVC unicast and multicast signaling are not 
specified yet. Also, only a datagram transport is defined.

Detailed descriptions of system calls and library functions are only given 
where their use for ATM differs from their use for the INET protocol 
family. See [Stevens90] for a general introduction to the BSD socket API.

ATM functionality covered by this API is described in [ATM93].


Connection handling
===================


Phases
------

A connection typically goes through the following phases:

  - connection preparation 
  - connection setup 
  - data exchange 
  - connection teardown 

During connection preparation, parameters are set and general local 
resources (e.g. socket descriptors) are allocated. Neither local nor remote 
networking resources are allocated during preparation. During connection 
setup, local networking resources (bandwidth, connection identifiers, 
buffers, etc.) are allocated. Resource allocation in the network may be 
handled by network management (PVCs) or it may be done as part of the 
connection setup (SVCs). In the data exchange, data is sent over a 
previously established connection. Finally, during connection teardown, 
communication is stopped and resources are deallocated.

Connection preparation, connection setup and data exchange have to be 
performed in this order. Connection teardown is different in that it can be 
initiated at any time. It may even overlap with other phases (e.g. data may 
continue to flow in one direction after a shutdown until the final close).
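
The ordering rules above can be pictured as a small state machine. The following is only an illustrative sketch: the enum and the function name are hypothetical and are not part of the API.

```c
#include <assert.h>

/* Hypothetical phase names; they are not defined by the ATM API. */
enum conn_phase {
    PHASE_PREPARATION,
    PHASE_SETUP,
    PHASE_DATA_EXCHANGE,
    PHASE_TEARDOWN
};

/*
 * Returns 1 if moving from phase "from" to phase "to" respects the
 * ordering described in the text, 0 otherwise.
 */
int phase_transition_ok(enum conn_phase from, enum conn_phase to)
{
    assert((int) from <= PHASE_TEARDOWN && (int) to <= PHASE_TEARDOWN);
    if (to == PHASE_TEARDOWN)
        return 1;                      /* teardown may start at any time */
    if (from == PHASE_TEARDOWN)
        return 0;                      /* nothing follows teardown */
    return (int) to == (int) from + 1; /* otherwise strictly in order */
}
```

The sketch treats the phases as strictly sequential and does not model the overlap between teardown and a still active data exchange.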


Connection preparation
----------------------

The connection preparation phase consists of the following parts:

  - socket creation 
  - connection descriptor initialization 
  - traffic parameter specification 

Socket creation and connection descriptor initialization must always be 
performed before traffic parameter specification.


Socket creation
- - - - - - - -

Sockets are created with the socket system call.

int socket(int domain, int type, int protocol);

The domain indicates whether signaling shall be used for this connection 
(SVC, not yet supported), or not (PVC). The following domains are defined: 

  PF_ATMPVC  ATM PVC connection 
  PF_ATMSVC  ATM SVC connection 

Merging of both domains into a single PF_ATM domain is left for further 
study.

The type selects general transport layer protocol characteristics. Only the 
SOCK_DGRAM transport protocol type is available, specifying an unreliable 
datagram transport. Note that no sequence guarantees are given, although 
the use of ATM generally implies that sequence will be preserved.

With protocol, a specific protocol is selected. Currently, only the 
protocol ATM_AAL5 is defined. Support of additional protocols is for 
further study.


Connection descriptor allocation and initialization
- - - - - - - - - - - - - - - - - - - - - - - - - -

The connection descriptor is a data structure describing traffic parameters 
(e.g. offered data rate, SDU size), connection requirements (e.g. maximum 
cell delay variation), and address information.

Depending on the address family, one of the following data types is used 
for the connection descriptor: 

  - struct sockaddr_atmpvc for a PVC 
  - struct sockaddr_atmsvc for an SVC 

PVC connection descriptors contain the following fields: 

  sap_family  address family (AF_ATMPVC) 
  sap_addr  PVC address 
  sap_txtp  traffic parameters in the send direction 
  sap_rxtp  traffic parameters in the receive direction 

Their use is described in the following sections.

During initialization, all fields of the connection descriptor are simply 
set to zero with the memset C library function. (See [ANSIC].)


Traffic parameter specification
- - - - - - - - - - - - - - - -

Traffic parameters are added to the connection descriptor by simply 
assigning values to the corresponding fields.

Traffic parameters are encoded in a data structure of type struct 
atm_trafprm containing the following fields:

  class  traffic class 
  max_pcr  minimum peak cell rate (PCR), in cells per second 
  max_cdv  maximum cell delay variation (CDV), in cell slots 
  max_sdu  maximum service data unit (SDU) size, in bytes 

Further parameters (maximum cell loss probability, end-to-end delay, etc.) 
may be added in the future.

The traffic class indicates general traffic properties. The following 
traffic classes are defined: 

  ATM_CBR  constant bit rate (CBR) 
  ATM_UBR  unspecified bit rate (UBR) 

If no traffic class is specified (i.e. if the field is set to zero), UBR is 
used as the default.

The minimum peak cell rate (PCR) specifies the bandwidth that will be 
consumed by this connection. Note that a higher peak cell rate may be 
allocated on connection establishment to accommodate hardware 
limitations. For sending, the traffic shapers are adjusted to never exceed 
this cell rate. For receiving, the receiver prepares to accept cells 
arriving at at least this rate. The minimum PCR normally is zero or a 
positive integer. The special value ATM_MAX_PCR indicates an unlimited cell 
rate (i.e. link speed).
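
Since the PCR is expressed in cells per second, an application that thinks in bits per second has to convert. The following is a minimal sketch, assuming that the rate in question counts cell payload only (48 bytes per cell) and ignoring any AAL overhead. The function name is hypothetical and is not part of the API.

```c
#define CELL_PAYLOAD_BITS (48 * 8)   /* 48 payload bytes per ATM cell */

/*
 * Cells per second needed to carry payload_bps bits per second of
 * cell payload, rounded up. AAL overhead is not included.
 */
unsigned long payload_bps_to_pcr(unsigned long payload_bps)
{
    return payload_bps / CELL_PAYLOAD_BITS +
        (payload_bps % CELL_PAYLOAD_BITS != 0);
}
```

For example, a payload stream of 64000 bits per second corresponds to 167 cells per second under these assumptions.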

The maximum cell delay variation (CDV) specifies the maximum number of 
"cell slots" a cell can arrive ahead of its due time, relative to the 
preceding cell of the same VCC. (This definition is based on the CDV 
algorithm described in annex A of [I371].) When sending, cells will never 
be emitted at a pace exceeding this guarantee. For receiving, the receiver 
prepares to accept cells arriving with at least the specified CDV. The 
maximum CDV normally is a positive integer. Setting it to zero indicates an 
unspecified CDV.

The maximum service data unit (SDU) size specifies the maximum amount of 
data the API user will attempt to send at a time. For receiving and for 
sending, buffers are dimensioned accordingly. The SDU size is a positive 
integer. If set to zero, the maximum SDU size supported by the respective 
protocol is assumed.

Note that compatibility of requested QOS with available resources is not 
checked during connection preparation. This is only done during connection 
setup. Further, since only certain parameter values may be supported, 
traffic parameters may be changed during connection setup. This is 
reflected by also altering the connection descriptor passed to the 
corresponding system calls.

A library function is provided to convert SDU rates to cell rates:

int sdu2cr(int s,int sizes,int *sdu_size,int *num_sdu);

s is the socket descriptor that has previously been returned by the socket 
system call. sizes is the number of SDU sizes described in the sdu_size and 
num_sdu arrays. Each element of sdu_size indicates the length (in bytes) of 
an SDU. Each element of num_sdu indicates the number of times an SDU with 
the corresponding size is sent in a given time interval.

sdu2cr computes the total number of ATM cells that would be sent on the 
connection and returns either that number or -1 if there is an overflow.
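
The computation can be illustrated for the AAL5 case. The sketch below is not the library implementation: it assumes AAL5 framing (an 8-byte trailer, padded to a multiple of the 48-byte cell payload), omits the socket argument, and uses the hypothetical names aal5_cells and sdu2cr_sketch.

```c
#include <assert.h>
#include <limits.h>

#define CELL_PAYLOAD 48   /* payload bytes per ATM cell */
#define AAL5_TRAILER 8    /* AAL5 CPCS-PDU trailer bytes */

/* Cells needed for one AAL5 SDU of sdu_len bytes, or -1 on overflow. */
int aal5_cells(int sdu_len)
{
    assert(sdu_len >= 0);
    if (sdu_len > INT_MAX - AAL5_TRAILER - (CELL_PAYLOAD - 1))
        return -1;
    return (sdu_len + AAL5_TRAILER + CELL_PAYLOAD - 1) / CELL_PAYLOAD;
}

/*
 * Hypothetical stand-in for sdu2cr without the socket argument.
 * Returns the total number of cells, or -1 on overflow or bad input.
 */
int sdu2cr_sketch(int sizes, const int *sdu_size, const int *num_sdu)
{
    int total = 0;
    int i;

    for (i = 0; i < sizes; i++) {
        int cells = aal5_cells(sdu_size[i]);

        if (cells < 0 || num_sdu[i] < 0)
            return -1;
        if (num_sdu[i] > (INT_MAX - total) / cells)
            return -1;   /* total + cells * num_sdu[i] would overflow */
        total += cells * num_sdu[i];
    }
    return total;
}
```

With these assumptions, ten 40-byte SDUs and two 1500-byte SDUs per interval amount to 10*1 + 2*32 = 74 cells.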


Example
- - - -

    struct sockaddr_atmpvc addr;
    int s;

    if ((s = socket(PF_ATMPVC,SOCK_DGRAM,ATM_AAL5)) < 0) {
        perror("socket");
        exit(1);
    }
    memset(&addr,0,sizeof(addr));
    addr.sap_family = AF_ATMPVC;
    addr.sap_txtp.class = ATM_UBR;
    addr.sap_txtp.max_pcr = ATM_MAX_PCR;
    addr.sap_txtp.max_sdu = 8192;
    addr.sap_rxtp = addr.sap_txtp;


Connection setup
----------------

Connection setup consists simply of adding address information to the 
connection descriptor and invoking connect or bind.


PVC addressing
- - - - - - -

A PVC is uniquely identified by the following sap_addr fields: 

  itf  interface number 
  vpi  virtual path identifier (VPI) 
  vci  virtual channel identifier (VCI) 

The valid ranges for VCIs and VPIs depend on the interface and might be 
configurable (see section "Interface control"). The following special 
values are recognized:

  itf == ATM_ITF_ANY  selects the lowest-numbered interface on which the 
    specified VPI/VCI pair exists and is available. 
  vpi == ATM_VPI_ANY  selects the lowest-numbered free VPI on the specified 
    interface on which the specified VCI is available. 
  vci == ATM_VCI_ANY  selects the lowest-numbered free VCI on the specified 
    interface for which the VPIs correspond. 
  vpi == ATM_VPI_UNSPEC  does not allocate any VPI/VCI pair and uses the 
    interface number for global resource control only. vpi == 
    ATM_VPI_UNSPEC also implies that the VCI is unspecified. Therefore, the 
    VCI value is ignored. 
  vci == ATM_VCI_UNSPEC  does not allocate any VPI/VCI pair and uses the 
    interface and VPI numbers for global resource control only. 

Addresses containing ATM_part_ANY components are called wildcard addresses. 
Similarly, addresses containing ATM_part_UNSPEC components are called 
incompletely specified addresses.

An address may contain more than one wildcard component, and wildcards and 
incompletely specified components can be mixed. E.g. itf == ATM_ITF_ANY, 
vpi == ATM_VPI_ANY and vci == ATM_VCI_ANY would select the lowest-numbered 
free PVC address. When using more than one wildcard component in an 
address, interface numbers are more significant than VPI numbers, and VPI 
numbers are more significant than VCI numbers when determining the lowest 
address.
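
The significance rule amounts to a lexicographic comparison of (itf, vpi, vci). The following sketch illustrates this for a set of fully specified candidate addresses. The structure is a hypothetical stand-in for the sap_addr fields, and how the set of free addresses is obtained is outside the scope of the example.

```c
#include <assert.h>

/* Hypothetical stand-in for the sap_addr fields; not the real type. */
struct pvc_addr_sketch {
    int itf;
    int vpi;
    int vci;
};

/*
 * Compares two fully specified addresses. Returns a negative value
 * if a is lower than b, zero if they are equal, and a positive value
 * if a is higher than b.
 */
int pvc_addr_cmp(const struct pvc_addr_sketch *a,
                 const struct pvc_addr_sketch *b)
{
    assert(a && b);
    if (a->itf != b->itf)
        return a->itf < b->itf ? -1 : 1;   /* most significant */
    if (a->vpi != b->vpi)
        return a->vpi < b->vpi ? -1 : 1;
    if (a->vci != b->vci)
        return a->vci < b->vci ? -1 : 1;   /* least significant */
    return 0;
}

/* Returns the index of the lowest address among n > 0 candidates. */
int lowest_free(const struct pvc_addr_sketch *free_addrs, int n)
{
    int best = 0;
    int i;

    assert(n > 0);
    for (i = 1; i < n; i++)
        if (pvc_addr_cmp(&free_addrs[i], &free_addrs[best]) < 0)
            best = i;
    return best;
}
```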


System calls
- - - - - -

The connect and bind system calls are used to set up connections. The 
behaviour of both is identical for PVCs. The following peculiarities have 
to be noted:

  - some connection parameters may be modified, see section "Traffic 
    parameter specification". It is not specified what modifications (if 
    any) have taken place when bind or connect returns an error. 
  - a connection to an unspecified address is not yet ready to transport 
    data; it must be connected or bound a second time without any 
    unspecified components. On this second call, it is an error if either 
    any address components except for previously unspecified ones are 
    changed, or if there are still unspecified components left. 

The following values of errno have a special meaning when using connect or 
bind on ATM sockets:

  EADDRNOTAVAIL  the requested address cannot be assigned on the existing 
    interface(s) in the present configuration. 
  EOPNOTSUPP  on the second call, either address parts already specified in 
    the first call have been changed or there are still unspecified address 
    parts. 
  ENETDOWN  the specified interface exists but is currently not 
    operational. 
  ENETUNREACH  the requested QOS criteria could not be met. 


Example
- - - -

    addr.sap_addr.itf = 0;
    addr.sap_addr.vpi = ATM_VPI_UNSPEC;
    if (connect(s,(struct sockaddr *) &addr,sizeof(addr)) < 0) {
        perror("connect(1)");
        exit(1);
    }
    /* some other activities */
    addr.sap_addr.vpi = 0;
    addr.sap_addr.vci = 42;
    if (connect(s,(struct sockaddr *) &addr,sizeof(addr)) < 0) {
        perror("connect(2)");
        exit(1);
    }


Data exchange
-------------

The data exchange paradigm for ATM sockets is modeled as closely as possible 
on the one for BSD sockets. The only significant exceptions are the 
additional buffer alignment and size constraints which apply when 
optimizing for throughput.


Transfer scheduling
- - - - - - - - - -

The select system call can be used to schedule receive and send operations 
in order to minimize blocking delays.
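
A typical use is to wait until a descriptor is writable before sending. The following sketch uses only the standard select and write calls; the helper name and the choice of EAGAIN to signal a timeout are assumptions made for this example.

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Waits up to timeout_ms milliseconds for fd to become writable and
 * then writes len bytes from buf. Returns the result of write, or -1
 * on error or timeout (errno is set to EAGAIN on timeout).
 */
ssize_t write_when_ready(int fd, const void *buf, size_t len,
                         long timeout_ms)
{
    fd_set wfds;
    struct timeval tv;
    int n;

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    n = select(fd + 1, NULL, &wfds, NULL, &tv);
    if (n < 0)
        return -1;
    if (n == 0) {
        errno = EAGAIN;   /* nothing became writable in time */
        return -1;
    }
    return write(fd, buf, len);
}
```

The same pattern applies to receiving, with the descriptor placed in the read set instead of the write set.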


Sending and receiving
- - - - - - - - - - -

The system calls read, readv, recv, recvmsg, send, sendmsg, write, and 
writev are supported with their usual semantics. (See [POSIX] and 
[Stevens90].) Note that recvfrom and sendto are currently not supported.


Alignment and size constraints
- - - - - - - - - - - - - - -

In order to optimize throughput, specific buffer alignment and size 
considerations may be necessary. This information can be used to adapt the 
send and receive procedures.

Buffer constraints can be obtained with the getsockopt system call:

int getsockopt(int s,int level,int optname,void *optval,int *optlen);

The level is SOL_SOCKET. The following values for optname are recognized 
(each parameter exists in the send and in the receive direction): 

  SO_BCTXOPT  constraints for sending data with best throughput 
  SO_BCRXOPT  constraints for receiving data with best throughput 

optval is a pointer to a data structure of type struct atm_buffconst with 
the following fields:

  buf_fac  buffer alignment factor 
  buf_off  buffer alignment offset 
  size_fac  size alignment factor 
  size_off  size alignment offset 
  min_size  minimum size 
  max_size  maximum size 

A maximum size only limited by the general system architecture or by quotas 
is coded as zero. The following relations must be true:

  buffer alignment offset < buffer alignment factor
  size alignment offset < size alignment factor
  minimum size < maximum size
  size alignment factor <= maximum size - minimum size
  minimum size = size alignment offset + size alignment factor * N
  maximum size = size alignment offset + size alignment factor * N (*)

where N = 0, 1, 2, ...

  (*) Unless the maximum size is coded as zero.
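
These relations can be checked mechanically. The sketch below uses a hypothetical structure that mirrors the field names of struct atm_buffconst; the field types are an assumption. It also assumes that every relation involving the maximum size is skipped when the maximum size is coded as zero.

```c
#include <assert.h>

/* Hypothetical mirror of struct atm_buffconst; types are assumed. */
struct buffconst_sketch {
    unsigned long buf_fac, buf_off;
    unsigned long size_fac, size_off;
    unsigned long min_size, max_size;
};

/* Returns 1 if bc satisfies the relations in the text, 0 otherwise. */
int buffconst_valid(const struct buffconst_sketch *bc)
{
    assert(bc);
    if (bc->buf_off >= bc->buf_fac || bc->size_off >= bc->size_fac)
        return 0;
    if (bc->min_size < bc->size_off ||
        (bc->min_size - bc->size_off) % bc->size_fac != 0)
        return 0;
    if (bc->max_size == 0)
        return 1;                      /* no upper limit to check */
    if (bc->min_size >= bc->max_size)
        return 0;
    if (bc->size_fac > bc->max_size - bc->min_size)
        return 0;
    return (bc->max_size - bc->size_off) % bc->size_fac == 0;
}

/*
 * Returns the smallest permitted transfer size that is at least
 * "want" bytes, or 0 if no permitted size is large enough.
 * bc must be valid.
 */
unsigned long buffconst_round(const struct buffconst_sketch *bc,
                              unsigned long want)
{
    unsigned long target, d, size;

    assert(buffconst_valid(bc));
    target = want > bc->min_size ? want : bc->min_size;
    if (target <= bc->size_off)
        return bc->size_off;
    d = target - bc->size_off;
    size = bc->size_off +
        (d / bc->size_fac + (d % bc->size_fac != 0)) * bc->size_fac;
    if (bc->max_size != 0 && size > bc->max_size)
        return 0;
    return size;
}
```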


Asynchronous I/O
- - - - - - - -

Support for asynchronous I/O is left for further study.


Example
- - - -

    const char msg[] = "Hello, world !\n";
    char *buffer,*start;
    struct atm_buffconst bc;
    unsigned long pos;
    int length,buf_len,size;

    length = sizeof(bc);
    if (getsockopt(s,SOL_SOCKET,SO_BCTXOPT,&bc,&length) < 0) {
        perror("getsockopt");
        exit(1);
    }
    /* round the message size up to size_off+size_fac*N */
    buf_len = sizeof(msg)-bc.size_off+bc.size_fac-1;
    buf_len = buf_len-(buf_len % bc.size_fac)+bc.size_off;
    if (buf_len < bc.min_size) buf_len = bc.min_size;
    /* allocate enough slack to align the start of the buffer */
    if (!(buffer = malloc(buf_len+bc.buf_fac-1))) {
        perror("malloc");
        exit(1);
    }
    /* advance to the first address of the form buf_off+buf_fac*N */
    pos = (unsigned long) buffer-bc.buf_off+bc.buf_fac-1;
    start = (char *) (pos-(pos % bc.buf_fac)+bc.buf_off);
    memcpy(start,msg,sizeof(msg));
    /* zero the padding after the message */
    if (sizeof(msg) != buf_len)
        memset(start+sizeof(msg),0,buf_len-sizeof(msg));
    if ((size = write(s,start,buf_len)) < 0) {
        perror("write");
        exit(1);
    }
    if (size != buf_len)
        fprintf(stderr,"Wrote only %d of %d bytes\n",size,buf_len);


Connection teardown
-------------------

Two steps are distinguished when connections are torn down:

  - connection shutdown 
  - closing 

Connection shutdown stops data transmission in either or both directions. 
Resources associated with a connection are deallocated when closing the 
connection. A connection shutdown is performed implicitly when closing a 
connection.


Connection shutdown
- - - - - - - - - -

When a connection is shut down for sending, no further data is accepted for 
sending. Note that it is still possible to receive data unless the 
connection has also been shut down for receiving.

When a connection is shut down for receiving, no further data is accepted 
from the network. Note that it is still possible to send data unless the 
connection has also been shut down for sending.

The shutdown system call is used to shut down connections.
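
As an illustration, the two directions can be wrapped in small helpers. This is a sketch using the standard shutdown call with the SHUT_WR and SHUT_RD constants; the helper names are hypothetical, and the behaviour shown is that of BSD sockets in general, not something specific to ATM sockets.

```c
#include <assert.h>
#include <sys/socket.h>

/* Stops sending on s; receiving remains possible. */
int stop_sending(int s)
{
    assert(s >= 0);
    return shutdown(s, SHUT_WR);
}

/* Stops receiving on s; sending remains possible. */
int stop_receiving(int s)
{
    assert(s >= 0);
    return shutdown(s, SHUT_RD);
}
```

Both helpers return 0 on success and -1 on error, like shutdown itself.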


Closing
- - - -

When a connection has been closed, no further access to it is possible and 
all resources associated with it are freed.

Connections are closed with the close system call.


Example
- - - -

    (void) close(s);


Summary
-------

All steps in connection handling are summarized below along with the 
principal system calls or library functions used in each step.

  - connection preparation 

      - socket creation (socket) 
      - connection descriptor initialization (memset) 
      - traffic parameter specification (assignments) 

  - connection setup 

      - addressing (assignments) 
      - partial setup (bind, connect) 
      - address completion (assignments) 
      - full setup (bind, connect) 

  - data exchange 

      - transfer scheduling (select) 
      - sending and receiving (read, write, ...) 
      - alignment and size constraints (getsockopt) 

  - connection teardown 

      - connection shutdown (shutdown) 
      - closing (close) 


Interface control
=================

Certain interface parameters can be modified in some implementations. In 
addition, statistics of important system events can be queried.


ATM layer
---------


Connection identifier ranges
- - - - - - - - - - - - - -


Physical layer
--------------

For further study.


Related services
================

This section describes support of services not defined in [ATM93], which 
are commonly associated with ATM.


IP over ATM
-----------

API details for IP over ATM support (as specified in [RFC1483], [RFC1577], 
and [RFC1626]) are for further study.

The current implementation indicates use for IP over ATM by specifying the 
protocol ATM_CLIP (CLassical IP) in the socket system call.


Acronyms
========

This appendix lists all acronyms that have been used in the document:

  ABR  Available Bit Rate 
  API  Application Program Interface 
  ATM  Asynchronous Transfer Mode 
  BSD  Berkeley Software Distribution 
  CBR  Constant Bit Rate 
  CDV  Cell Delay Variation 
  PCR  Peak Cell Rate 
  PVC  Permanent Virtual Circuit, see [I233] 
  QOS  Quality Of Service 
  SDU  Service Data Unit 
  SVC  Switched Virtual Circuit 
  UBR  Unspecified Bit Rate 
  VBR  Variable Bit Rate 
  VCC  Virtual Channel Connection 
  VCI  Virtual Channel Identifier 
  VPI  Virtual Path Identifier 


References
==========

  ATM93  The ATM Forum. ATM User-Network Interface Specification, Prentice 
    Hall, 1993. 
  ANSIC  ANSI/ISO 9899-1990, Herbert Schildt. The Annotated ANSI C 
    Standard, Osborne McGraw-Hill, 1990. 
  I233  ITU-T Recommendation I.233. Frame mode bearer services, ISDN frame 
    relaying bearer service and ISDN frame switching bearer service, ITU, 
    1992. 
  I371  ITU-T Recommendation I.371. Traffic control and congestion control 
    in B-ISDN, ITU, 1993. 
  POSIX  IEEE Standard for Information Technology. Portable Operating 
    System Interface (POSIX). Part 1: System Application Program Interface 
    (API), IEEE, 1994. 
  RFC1483  Heinanen, Juha. Multiprotocol Encapsulation over ATM Adaptation 
    Layer 5, IETF, 1993. 
  RFC1577  Laubach, Mark. Classical IP and ARP over ATM, IETF, 1994. 
  RFC1626  Atkinson, Randall J. Default IP MTU for use over ATM AAL5, IETF, 
    1994. 
  Stevens90  Stevens, W. Richard. UNIX Network Programming, Prentice-Hall, 
    1990. 
