


SGMLS(1)                                                 SGMLS(1)


NAME
       sgmls - a validating SGML parser

       An SGML System Conforming to
       International Standard ISO 8879 --
       Standard Generalized Markup Language

SYNOPSIS
       sgmls  [  -deglprsuv  ] [ -cfile ] [ -iname ] [ -mfile ] [
       filename...  ]

DESCRIPTION
       Sgmls parses and validates the  SGML  document  entity  in
       filename...   and  prints  on the standard output a simple
       ASCII representation of its Element Structure  Information
       Set.   (This  is  the  information  set which a structure-
       controlled conforming SGML application should  act  upon.)
       Note  that  the document entity may be spread amongst sev-
       eral files; for example, the  SGML  declaration,  document
       type  declaration  and document instance set could each be
       in a separate file.  If no filenames are  specified,  then
       sgmls  will  read  the  document  entity from the standard
       input.  A filename of - can also be used to refer  to  the
       standard input.

       The following options are available:

       -cfile Report  any  capacity  limits that are exceeded and
              write a report of  capacity  usage  to  file.   The
              report  is in the format of a RACT result.  RACT is
              the  Reference  Application  for  Capacity  Testing
              defined  in the Proposed American National Standard
              Conformance Testing for Standard Generalized Markup
              Language  (SGL)  Systems  (X3.190-199X), Draft July
              1991.

       -d     Warn about duplicate entity declarations.

       -e     Describe open entities in  error  messages.   Error
              messages  always  include  the position of the most
              recently opened external entity.

       -g     Show the GIs of open elements in error messages.

       -iname Pretend that

                     <!ENTITY % name "INCLUDE">

              occurs at the start of the document  type  declara-
              tion  subset  in  the  SGML document entity.  Since
              repeated definitions of an entity are ignored, this
              definition will take precedence over any other def-
              initions of this entity in the document type decla-
              ration.   Multiple  -i options are allowed.  If the



                                                                1





SGMLS(1)                                                 SGMLS(1)


              SGML declaration replaces the reserved name INCLUDE
              then  the new reserved name will be the replacement
              text of the entity.  Typically  the  document  type
              declaration will contain

                     <!ENTITY % name "IGNORE">

              and  will use %name; in the status keyword specifi-
              cation of a marked section  declaration.   In  this
              case  the effect of the option will be to cause the
              marked section not to be ignored.

       -l     Output L commands giving the  current  line  number
              and filename.

       -mfile Map  public  identifiers and entity names to system
              identifiers using  the  catalog  entry  file  file.
              Multiple  -m  options  are  allowed.  Catalog entry
              files specified with the -m option will be searched
              before the defaults.

       -p     Parse only the prolog.  Sgmls will exit after pars-
              ing the document type declaration.  Implies -s.

       -r     Warn about defaulted references.

       -s     Suppress output.   Error  messages  will  still  be
              printed.

       -u     Warn about undefined elements: elements used in the
              DTD but not defined.

       -v     Print the version number.

   Entity Manager
       An external entity resides in  one  or  more  files.   The
       entity manager component of sgmls maps a sequence of files
       into an entity in three sequential stages:

       1.     each carriage return character  is  turned  into  a
              non-SGML character;

       2.     each  newline character is turned into a record end
              character, and at the  same  time  a  record  start
              character  is  inserted  at  the  beginning of each
              line;

       3.     the files are concatenated.

       A system identifier is interpreted as a list of  filenames
       separated by colons.  A filename of - can be used to refer
       to the standard input.

       If a system identifier is not specified, then  the  entity



                                                                2





SGMLS(1)                                                 SGMLS(1)


       manager  can generate one using catalog entry files in the
       format defined in the SGML Open Draft Technical Resolution
       on  Entity  Management.   A  catalog entry file contains a
       sequence of entries in one of the following four forms:

       PUBLIC pubid sysid
              This specifies that sysid should  be  used  as  the
              system  identifier  if the the public identifier is
              pubid.  Sysid is a system identifier as defined  in
              ISO  8879  and  pubid  is  a  public  identifier as
              defined in ISO 8879.

       ENTITY name sysid
              This specifies that sysid should  be  used  as  the
              system identifier if the entity is a general entity
              whose name is name.

       ENTITY %name sysid
              This specifies that sysid should  be  used  as  the
              system  identifier  if  the  entity  is a parameter
              entity whose name is name.  Note that there  is  no
              space between the % and the name.

       DOCTYPE name sysid
              This  specifies  that  sysid  should be used as the
              system  identifier  if  the  entity  is  an  entity
              declared in a document type declaration whose docu-
              ment type name is name.

       The last two forms are extensions to the SGML Open format.
       The  delimiters  can be omitted from the sysid provided it
       does not contain any white space.   Comments  are  allowed
       between  parameters delimited by -- as in SGML.  The envi-
       ronment  variable  SGML_CATALOG_FILES  contains  a  colon-
       separated  list  of  catalog  entry  files.  These will be
       searched after any catalog entry files specified using the
       -m  option.  If this environment variable is not set, then
       a system dependent list of catalog  entry  files  will  be
       used.   A match in a catalog entry file for a PUBLIC entry
       will take precedence over a match in the same file for  an
       ENTITY  or  DOCTYPE entry.  A filename in a system identi-
       fier in a catalog entry file is  interpreted  relative  to
       the directory containing the catalog entry file.

       If no match can be found in a catalog entry file, then the
       entity manager will attempt to generate a  filename  using
       the public identifier (if there is one) and other informa-
       tion available to it.  Notation identifiers are  not  sub-
       ject to this treatment.  This process is controlled by the
       environment variable SGML_PATH;  this  contains  a  colon-
       separated list of filename templates.  A filename template
       is a filename that may contain substitution fields; a sub-
       stitution field is a % character followed by a single let-
       ter that indicates the value  of  the  substitution.   The



                                                                3





SGMLS(1)                                                 SGMLS(1)


       value  of  a substitution can either be a string or it can
       be null.  The entity manager transforms the list of  file-
       name  templates  into  a list of filenames by substituting
       for each substitution field and  discarding  any  template
       that  contained a substitution field whose value was null.
       It then uses the first resulting filename that exists  and
       is  readable.   Substitution values are transformed before
       being used for substitution: firstly, any names that  were
       subject  to  upper  case  substitution are folded to lower
       case; secondly, space characters are mapped to underscores
       and  slashes  are mapped to percents.  The value of the %S
       field is not  transformed.   The  values  of  substitution
       fields are as follows:

       %%     A single %.

       %D     The entity's data content notation.  This substitu-
              tion will succeed only for external data  entities.

       %N     The entity, notation or document type name.

       %P     The public identifier if there was a public identi-
              fier, otherwise null.

       %S     The system identifier if there was a system identi-
              fier otherwise null.

       %X     (This  is  provided  mainly  for compatibility with
              ARCSGML.)  A three-letter string chosen as follows:
                                         |            |
                                         |            | With public identifier
                                         |            +-------------+-----------
                                         | No public  |   Device    |  Device
                                         | identifier | independent | dependent
              ---------------------------+------------+-------------+-----------
              Data or subdocument entity | nsd        | pns         | vns
              General SGML text entity   | gml        | pge         | vge
              Parameter entity           | spe        | ppe         | vpe
              Document type definition   | dtd        | pdt         | vdt
              Link process definition    | lpd        | plp         | vlp

              The  device  dependent  version  is selected if the
              public text class allows a public text display ver-
              sion  but no public text display version was speci-
              fied.

       %Y     The type of thing for which the filename  is  being
              generated:
              SGML subdocument entity    sgml
              Data entity                data
              General text entity        text
              Parameter entity           parm
              Document type definition   dtd




                                                                4





SGMLS(1)                                                 SGMLS(1)


              Link process definition    lpd

       The  value  of  the  following substitution fields will be
       null unless a valid formal public identifier was supplied.

       %A     Null  if  the  text identifier in the formal public
              identifier contains an unavailable text  indicator,
              otherwise the empty string.

       %C     The public text class, mapped to lower case.

       %E     The   public   text  designating  sequence  (escape
              sequence) if the public text class is CHARSET, oth-
              erwise null.

       %I     The  empty  string  if  the owner identifier in the
              formal public identifier is an  ISO  owner  identi-
              fier, otherwise null.

       %L     The  public  text  language,  mapped to lower case,
              unless the public text class is CHARSET,  in  which
              case null.

       %O     The  owner  identifier  (with the +// or -// prefix
              stripped.)

       %R     The empty string if the  owner  identifier  in  the
              formal  public  identifier  is  a  registered owner
              identifier, otherwise null.

       %T     The public text description.

       %U     The empty string if the  owner  identifier  in  the
              formal  public  identifier is an unregistered owner
              identifier, otherwise null.

       %V     The public text display version.  This substitution
              will  be  null  if  the  public text class does not
              allow a display version or if no version was speci-
              fied.   If  an empty version was specified, a value
              of default will be used.

       Normally if the external identifier for an entity includes
       a system identifier, the entity manager will use the spec-
       ified system identifier and not attempt to  generate  one.
       If,  however, SGML_PATH uses the %S field, then the entity
       manager will first search for a matching entry in the cat-
       alog  entry files.  If a match is found, then this will be
       used instead of the specified system  identifier.   Other-
       wise,  if the specified system identifier does not contain
       any colons, the entity manager will use SGML_PATH to  gen-
       erate  a  filename.  Otherwise the entity manager will use
       the specified system identifier.




                                                                5





SGMLS(1)                                                 SGMLS(1)


   System declaration
       The system declaration for sgmls is as follows:

                          SYSTEM "ISO 8879:1986"
                                  CHARSET
       BASESET  "ISO 646-1983//CHARSET
                 International Reference Version (IRV)//ESC 2/5 4/0"
       DESCSET  0 128 0
       CAPACITY PUBLIC  "ISO 8879:1986//CAPACITY Reference//EN"
                                 FEATURES
       MINIMIZE DATATAG NO  OMITTAG  YES   RANK     NO  SHORTTAG YES
       LINK     SIMPLE  NO  IMPLICIT NO    EXPLICIT NO
       OTHER    CONCUR  NO  SUBDOC   YES 1 FORMAL   YES
       SCOPE    DOCUMENT
       SYNTAX   PUBLIC  "ISO 8879:1986//SYNTAX Reference//EN"
       SYNTAX   PUBLIC  "ISO 8879:1986//SYNTAX Core//EN"
                                 VALIDATE
                GENERAL YES MODEL    YES   EXCLUDE  YES CAPACITY YES
                NONSGML YES SGML     YES   FORMAL   YES
                                   SDIF
                PACK    NO  UNPACK   NO

       Exceeding a capacity limit will be ignored unless  the  -c
       option is given.

       The  memory usage of sgmls is not a function of the capac-
       ity points used by a document; however, sgmls  can  handle
       capacities significantly greater than the reference capac-
       ity set.

       In some environments, higher values may be  supported  for
       the SUBDOC parameter.

       Documents  that do not use optional features are also sup-
       ported.  For example, if FORMAL NO  is  specified  in  the
       SGML  declaration, public identifiers will not be required
       to be valid formal public identifiers.

       Certain parts of the concrete syntax may be changed:

              The shunned character numbers can be changed.

              Eight bit characters can be assigned  to  LCNMSTRT,
              UCNMSTRT, LCNMCHAR and UCNMCHAR.

              Uppercase substitution can be performed or not per-
              formed both for entity names and for other names.

              Either short reference delimiters assigned  by  the
              reference  delimiter  set  or  no  short  reference
              delimiters are supported.

              The reserved names can be changed.




                                                                6





SGMLS(1)                                                 SGMLS(1)


              The quantity set can be  increased  within  certain
              limits  subject  to  there  being sufficient memory
              available.  The upper limit on NAMELEN is 239.  The
              upper  limits on ATTCNT, ATTSPLEN, BSEQLEN, ENTLVL,
              LITLEN, PILEN, TAGLEN, and  TAGLVL  are  more  than
              thirty  times  greater  than  the reference limits.
              The upper limit on GRPCNT, GRPGTCNT, and GRPLVL  is
              253.   NORMSEP  cannot  be  changed.   DTAGLEN  are
              DTEMPLEN irrelevant since sgmls  does  not  support
              the DATATAG feature.

   SGML declaration
       The  SGML declaration may be omitted, the following decla-
       ration will be implied:
                             <!SGML "ISO 8879:1986"
                                     CHARSET
       BASESET  "ISO 646-1983//CHARSET
                 International Reference Version (IRV)//ESC 2/5 4/0"
       DESCSET    0  9 UNUSED
                  9  2  9
                 11  2 UNUSED
                 13  1 13
                 14 18 UNUSED
                 32 95 32
                127  1 UNUSED
       CAPACITY PUBLIC  "ISO 8879:1986//CAPACITY Reference//EN"
       SCOPE    DOCUMENT
       SYNTAX   PUBLIC  "ISO 8879:1986//SYNTAX Reference//EN"
                                    FEATURES
       MINIMIZE DATATAG NO OMITTAG  YES          RANK     NO  SHORTTAG YES
       LINK     SIMPLE  NO IMPLICIT NO           EXPLICIT NO
       OTHER    CONCUR  NO SUBDOC   YES 99999999 FORMAL   YES
                                  APPINFO NONE>
       with the exception that characters 128 through 254 will be
       assigned to DATACHAR.

       Sgmls identifies base character sets using the designating
       sequence in the public identifier.  The  following  desig-
       nating sequences are recognized:
         Designating          ISO         Minimum      Number
            Escape        Registration   Character       of             Description
           Sequence          Number       Number     Characters
       ------------------------------------------------------------------------------------
       ESC 2/5 4/0             -             0          128       full set of ISO 646 IRV
       ESC 2/8 4/0              2           33           94       G0 set of ISO 646 IRV
       ESC 2/8 4/2              6           33           94       G0 set of ASCII
       ESC 2/13 4/1           100           32           96       G1 set of ISO 8859-1
       ESC 2/1 4/0              1            0           32       C0 set of ISO 646
       ESC 2/2 4/3             77            0           32       C1 set of ISO 6429
       ESC 2/5 2/15 3/0        -             0          256       the system character set

       When one of the G0 sets is used as a base set, the charac-
       ters SPACE and DELETE are treated as  occurring  at  posi-
       tions  32  and 127 respectively; although these characters



                                                                7





SGMLS(1)                                                 SGMLS(1)


       are not part of  the  character  sets  designated  by  the
       escape  sequences,  this  mimics the behaviour of ISO 2022
       with respect to these code positions.

   Output format
       The output is a series of lines.  Lines can be arbitrarily
       long.   Each line consists of an initial command character
       and one or more arguments.  Arguments are separated  by  a
       single  space,  but when a command takes a fixed number of
       arguments the last argument can contain spaces.  There  is
       no space between the command character and the first argu-
       ment.   Arguments  can  contain   the   following   escape
       sequences.

       \\     A \.

       \n     A record end character.

       \|     Internal SDATA entities are bracketed by these.

       \nnn   The character whose code is nnn octal.

       A  record  start  character  will  be represented by \012.
       Most applications will need to ignore \012  and  translate
       \n into newline.

       The  possible command characters and arguments are as fol-
       lows:

       (gi    The start of an element whose generic identifier is
              gi.  Any attributes for this element will have been
              specified with A commands.

       )gi    The end an element whose generic identifier is  gi.

       -data  Data.

       &name  A  reference  to an external data entity name; name
              will have been defined using an E command.

       ?pi    A processing instruction with data pi.

       Aname val
              The next element to start  has  an  attribute  name
              with  value  val  which  takes one of the following
              forms:

              IMPLIED
                     The value of the attribute is implied.

              CDATA data
                     The attribute is character  data.   This  is
                     used  for attributes whose declared value is
                     CDATA.



                                                                8





SGMLS(1)                                                 SGMLS(1)


              NOTATION nname
                     The attribute is a notation name; nname will
                     have  been  defined using a N command.  This
                     is used for attributes whose declared  value
                     is NOTATION.

              ENTITY name...
                     The  attribute  is  a list of general entity
                     names.  Each  entity  name  will  have  been
                     defined using an I, E or S command.  This is
                     used for attributes whose declared value  is
                     ENTITY or ENTITIES.

              TOKEN token...
                     The  attribute is a list of tokens.  This is
                     used for attributes whose declared value  is
                     anything else.

       Dename name val
              This  is  the same as the A command, except that it
              specifies a data attribute for an  external  entity
              named  ename.  Any D commands will come after the E
              command that  defines  the  entity  to  which  they
              apply,  but  before any & or A commands that refer-
              ence the entity.

       Nnname nname.  Define a notation This command will be pre-
              ceded  by  a p command if the notation was declared
              with a public identifier, and by a s command if the
              notation  was declared with a system identifier.  A
              notation will only be defined if it is to be refer-
              enced  in  an  E  command or in an A command for an
              attribute with a declared value of NOTATION.

       Eename typ nname
              Define an external data  entity  named  ename  with
              type  typ (CDATA, NDATA or SDATA) and notation not.
              This command will be preceded by one or more f com-
              mands  giving the filenames generated by the entity
              manager from the system and public identifiers,  by
              a p command if a public identifier was declared for
              the entity, and by a s command if a system  identi-
              fier  was  declared  for the entity.  not will have
              been defined using a N  command.   Data  attributes
              may  be  specified for the entity using D commands.
              An external data entity will only be defined if  it
              is  to be referenced in a & command or in an A com-
              mand for  an  attribute  whose  declared  value  is
              ENTITY or ENTITIES.

       Iename typ text
              Define  an  internal  data  entity named ename with
              type typ (CDATA or SDATA) and entity text text.  An
              internal  data entity will only be defined if it is



                                                                9





SGMLS(1)                                                 SGMLS(1)


              referenced in an A command for an  attribute  whose
              declared value is ENTITY or ENTITIES.

       Sename Define a subdocument entity named ename.  This com-
              mand will be preceded by one  or  more  f  commands
              giving  the  filenames generated by the entity man-
              ager from the system and public identifiers, by a p
              command if a public identifier was declared for the
              entity, and by a s command if a  system  identifier
              was  declared for the entity.  A subdocument entity
              will only be defined if it is  referenced  in  a  {
              command  or  in an A command for an attribute whose
              declared value is ENTITY or ENTITIES.

       ssysid This command applies to the next E, S or N  command
              and specifies the associated system identifier.

       ppubid This  command applies to the next E, S or N command
              and specifies the associated public identifier.

       ffilename
              This command applies to the next E or S command and
              specifies  an  associated  filename.  There will be
              more than one f command for a single E or S command
              if the system identifier used a colon.

       {ename The  start  of  the  SGML subdocument entity ename;
              ename will have been defined using a S command.

       }ename The end of the SGML subdocument entity ename.

       Llineno file
       Llineno
              Set the current  line  number  and  filename.   The
              filename  argument will be omitted if only the line
              number has changed.  This will be  output  only  if
              the -l option has been given.

       #text  An  APPINFO  parameter of text was specified in the
              SGML declaration.  This is not strictly part of the
              ESIS,  but  a  structure-controlled  application is
              permitted to act on it.  No # command will be  out-
              put  if  APPINFO NONE  was  specified.  A # command
              will occur at most once, and may be  preceded  only
              by a single L command.

       C      This command indicates that the document was a con-
              forming SGML document.  If this command is  output,
              it  will  be the last command.  An SGML document is
              not  conforming  if  it  references  a  subdocument
              entity that is not conforming.

BUGS
       Some  non-SGML  characters  in literals are counted as two



                                                               10





SGMLS(1)                                                 SGMLS(1)


       characters for the purposes of quantity and capacity  cal-
       culations.

SEE ALSO
       The SGML Handbook, Charles F. Goldfarb
       ISO  8879 (Standard Generalized Markup Language), Interna-
       tional Organization for Standardization

ORIGIN
       ARCSGML was written by Charles F. Goldfarb.

       Sgmls  was   derived   from   ARCSGML   by   James   Clark
       (jjc@jclark.com), to whom bugs should be reported.












































                                                               11


