.nr LL 6.5i
.nr PO 1.0i
.nr PS 12
.nr VS 14
.ll 6.5i
.po 1.0i
.ps 12
.vs 14
.ta 6.5iR
.ds RF [Page %]
.rm CF
.hy 0
.de LE
.RT
.nr Ct 1
.nr 2T 0
.ne 1.1
.if !\\n(IP .nr IP +1
.in +\\n(I\\n(IRu
.ta \\n(I\\n(IRu
.ti -\\n(I\\n(IRu
\(bu\t\c
..
.ds h1 Internet Draft
.ds LH Internet Draft
.ds h0 \
Internet Engineering Task Force
.ds h4 \
D. Curry
.ie '\*(h2'' \{\
.ds h2 \*(h4
.\}
.el \{\
.da h8
.br
	\*(h4
.br
.da
.\}
.ds h5 \
Purdue University
.ie '\*(h3'' \{\
.ds h3 \*(h5
.\}
.el \{\
.da h8
.br
	\*(h5
.br
.da
.\}
.ds LF \
Curry
.ds RH \
June 1992
.nf
\*(h0	\*(h2
\*(h1	\*(h3
.h8
	\*(RH
.fi
.sp
\*(h9
.TL
Formatting Internet Documents With SGML
.ds CH \
SGML Internet Documents
.SH
Status of this Memo
.PP
The purpose of this Internet Draft is to focus discussion on the
particular problem of Internet document distribution formats, and
possible solutions to the problem.  This draft proposes a specific
solution, provides a sample implementation of this solution, and
requests discussion and suggestions for improvement.
.SH
Distribution
.PP
Distribution of this memo is unlimited.
.SH
Abstract
.PP
The problem of selecting a standard distribution format for Internet
documents (RFCs and Internet Drafts) has come up several times over
the past few years.  The usual argument is that ASCII, the current
standard format, is too limited for documents which require complex
equations, tables, and graphics, while PostScript, the alternative
format, is not universally printable and cannot be searched or read
on-line.  This document proposes the use of the \fIStandard
Generalized Markup Language\fP (SGML) as the standard distribution
format for Internet documents.  Brief descriptions of generalized
markup languages and SGML are given, followed by discussion of how
the use of SGML could benefit the Internet community.  The requirements
that must be met in order for Internet documents to be formatted with
SGML are presented.  Finally, a sample implementation of the use of
SGML for formatting Internet documents is discussed.
.bp
.NH 1
Introduction
.PP
Internet documents (RFCs and Internet Drafts) have, since their
inception, used a simple ASCII format [Postel89].  Although
there is a definite structure to the documents, with page headers and
footers, section headings, and so on, no specialized formatting (such
as overstrikes or underlining) is present in the documents.  This is
advantageous, since a simple text file can be printed on even the most
primitive output device, making Internet documents universally
available to all interested parties.  The ASCII format also has the
advantage that, because no special formatting is used, a document can
easily be searched or read on-line using text editors and other tools,
and sections of the document can easily be copied to other files.
.PP
Unfortunately, the ASCII format suffers from several limitations,
the most severe of which are the difficulties in formatting equations,
tables, and graphics.  For this reason, some recent Internet documents
have been distributed in PostScript format instead.  Although
sometimes an ASCII version of the document is also made available,
this version usually lacks the equations and graphics from the
PostScript version, which are vital to understanding the document.
.PP
The PostScript format's primary advantage is that it can easily
render the complex equations, tables, and graphics necessary for some
Internet documents.  However, the PostScript format is not suitable as
a general distribution format for several reasons.  First, not
everyone has access to a PostScript printer or on-line PostScript
previewer.  This means that PostScript Internet documents are no
longer universally available to all interested parties.  Second, the
PostScript language has been interpreted by different vendors in
different ways, with some vendors producing more \*Qconformant\*U
implementations than others.  This means that some parties cannot print
PostScript Internet documents even though they do have
access to a PostScript printer.  Finally, because the PostScript
format represents a document as a program, the document cannot be
easily searched or edited using text editors or other tools.
.PP
In May of 1992 the Internet Activities Board proposed a new rule
that would require any Request for Comments document that specifies an
Internet standard to have ASCII as its \fIreference\fP format.  A
PostScript version could still be made available as a \*Qprettier\*U
alternative, but the ASCII version would have to contain all necessary
equations, tables, and graphics.  This proposal sparked an intense
debate on the IETF mailing list, not for the first time, about the
relative merits of the ASCII and PostScript formats.  Unfortunately,
as such discussions are wont to do, this discussion turned into a
\*Qreligious war,\*U with little of a constructive nature being
accomplished.
.PP
The real problem, which has until now been ignored, is that
most people do not distinguish between the \fIcomponents\fP
of a document and the \fIformat\fP of those components, although they
are really two quite different things.  For example, when reading this document,
the reader notes components such as section headings, paragraphs, and
sentences, and attaches certain meanings to those components (as well
as to the words they contain).  But the reader can do this regardless
of whether section headings are in bold face, italics, or small caps,
and regardless of whether paragraphs are indented three spaces or
flush left.  What matters is that the reader be able to
distinguish the components, not how the components look (although
certain combinations will be more aesthetically pleasing than others).
.PP
Thus, what is needed is a \*Qgeneric\*U document format which is
capable of identifying the components of a document such as
paragraphs and section headings, yet does not specify formatting
issues such as point size or type face.  This generic format can then
be translated independently by each reader into a specific format
suitable to his or her environment.  For example, a UNIX user might
translate the document into \fCtroff\fP, while a PC user might
translate it into Microsoft Word.  These two formats have very different
rules about what paragraphs, section headings, and so on are to look
like, and yet the two readers will both be able to get the same
information from the document.
.NH 1
Basic Concepts of Generalized Markup
.PP
\fIGeneralized markup\fP is the process of combining a formalized
document structure with a set of generic tags that are used to
identify the various components of a document.
.NH 2
Document Structure
.PP
Any document can be viewed as having two types of structure:  a
logical structure that is for the most part hidden from view and not
immediately obvious, and a layout structure that is directly
recognizable.
.NH 3
Layout Structure
.PP
The layout structure of a document is the physical representation
of the document on a piece of paper or computer display screen.
Examples of layout structure include a centered line, a section
heading in bold face, and a paragraph with a hanging tag.
.NH 3
Logical Structure
.PP
The logical structure of a document is defined exclusively by the
sequence of its constituent components and their relationship to each
other.  The intellectual content of the document is made up of these
components, which were grouped into a specific sequence by the author
in order to have some logical significance to the reader.  Examples of
logical components of a document include chapters, headings,
paragraphs, and figures.
.PP
The logical components of a document have a hierarchical
relationship to each other.  The highest level of the hierarchy is the
document, and the lowest level is the individual characters.  In
between these levels are chapters, which contain sections, which in
turn contain paragraphs, which contain sentences, which contain words,
and so on.  Different types of documents, such as books, magazine
articles, and letters, contain different components arranged in
different hierarchies.
.PP
The logical structure of a document is independent of the layout of
its content on a page, or of the format of its content as stored on a
computer system.
.NH 2
Generic Tagging
.PP
\fIGeneric tagging\fP is the process of inserting identifiers, or
\fItags\fP, into a document to mark its principal components.  For
example, tags would be used to identify the beginning and end of a
paragraph.  Generic tags identify the components of a document without
specifying the ultimate processing method, visual or otherwise, to
be used with that component.  The tagged components can be recognized
and processed at each step from manuscript creation through production
of publications, as well as in other applications such as information
retrieval.
.NH 2
Categories of Generalized Markup
.PP
There are three principal categories of generalized markup
[ANSI88]:
.RS
.LE
Descriptive markup (generic tags)
.LE
References to information contained elsewhere, such as digitized
graphic images
.LE
Markup declarations
.RE
.RT
.PP
Descriptive markup is the process of marking each significant
component of a document with an identifier, or tag.  The tag for a
component serves to distinguish that component from any other
component.  Full markup of a component involves specifying a start-tag
to identify the beginning of the component, and an end-tag to identify
the end of the component.  However, in many cases it is possible to
reduce markup by recognizing that the start-tag for a second component
can also serve to identify the end of the first component.  Thus, a
completely marked-up component consists of a start-tag to identify the
beginning of the component, the component itself, and, if necessary,
an end-tag identifying the end of the component.
.PP
References are call-outs to data not residing in the document
itself, for example, a digitized graphic.  These data are called
\fIentities\fP.  A pointer or reference to the entity is placed in the
document at the location where it belongs.  When the document is
processed, the entity reference is replaced with the contents of the
entity.  Entity references can also be used to obtain special
characters not in the regular keyboard character set (e.g., math
symbols), or to provide abbreviations.
.PP
Markup declarations allow the document designer to define the
structure of a particular type of document.  For example, a designer
can define a single component of a document, say ADDRESS, as consisting
of NAME, STREET, CITY, and STATE, and can specify that the four
subcomponents must appear in exactly that order.  Later, ADDRESS might
be extended to include ORGANIZATION.  Markup declarations are also
used to specify the values of entity references and to specify the
locations of any entities stored outside the document.
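.PP
As an illustration only, the ADDRESS example above might be expressed
with SGML markup declarations along the following lines.  The element
names are hypothetical (ORGANIZATION is shortened to \fBorg\fP to fit
the usual eight-character limit on names), and the position chosen for
the added element is an assumption; the full declaration syntax is
given in the SGML standard.
.LD
.ps -2
.vs -2
\fC     <!-- original definition: four subcomponents, in exactly this order -->
     <!ELEMENT address - - (name, street, city, state)>
     <!ELEMENT (name | street | city | state) - O (#PCDATA)>

     <!-- a later revision, replacing the first declaration above, -->
     <!-- that also admits an optional organization                -->
     <!ELEMENT address - - (name, org?, street, city, state)>
     <!ELEMENT org - O (#PCDATA)>\fP
.ps
.vs
.DE
.PP
In these declarations the commas require the subcomponents to appear
in the order listed, and the question mark makes a subcomponent
optional.  The two characters following the element name state whether
the start-tag and end-tag may be omitted: a hyphen means the tag is
required, and the letter O means it may be left out.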
.NH 1
The Standard Generalized Markup Language (SGML)
.PP
The \fIStandard Generalized Markup Language\fP, or SGML, was
developed by the American National Standards Institute committee X3V1,
its International Organization for Standardization counterpart, ISO
TC97/SC18, and the U.S. National Information Standards Organization
(Z39).  The participants in its development came from many disciplines
and backgrounds in the publishing, information processing, library,
and research communities [AAP87a].  SGML was made an
international standard in 1986, and is described in detail in ISO
8879-1986.
.PP
Some major applications of SGML include:
.RS
.LE
The \fIAmerican National Standard for Electronic Manuscript
Preparation and Markup\fP, ANSI/NISO Z39.59-1988, defines document
types and markup tags for books, articles, and serials.  Z39.59 was
developed by the Association of American Publishers (AAP) and became
an American National Standard in December 1988.
.LE
The United States Department of Defense has adopted SGML as part
of its Computer-Aided Acquisition and Logistics Support (CALS)
program.  CALS is intended to integrate the automated parts of the
design and logistics processes, including the creation of technical
documentation.
.LE
The Text Encoding Initiative (TEI) is an international research
project to develop guidelines for the encoding and interchange of
machine-readable texts.  The TEI is sponsored by the Association for
Computers and the Humanities, the Association for Computational
Linguistics, and the Association for Literary and Linguistic
Computing.  It is funded by the U.S. National Endowment for the
Humanities, DG XIII of the Commission of the European Communities, the
Social Sciences and Humanities Research Council of Canada, and the Andrew W.
Mellon Foundation.  A principal part of the TEI recommendations is the
use of SGML.
.LE
The Swiss Historical Lexicon, a 12-volume national encyclopedia
printed in four languages, is using SGML as its markup language.
.RE
.RT
.NH 2
SGML Character Set
.PP
The base character set for SGML is the ISO 646 (7-bit coded character
set for information processing interchange) International Reference
Version (IRV).  ASCII is the American national version of ISO 646; it
differs from the IRV only in positions 2/4 and 7/14, which are the
dollar sign and tilde in ASCII, and the general currency sign and
overline in the IRV.
.PP
It is important to note that because the base character set for
SGML is essentially ASCII, a document that is marked up with SGML will
remain readable even without being run through an SGML-to-something
translator.
.NH 2
SGML Syntax for Tags
.PP
An SGML start-tag consists of four parts: the opening delimiter, the
name of the document component (called a \fIgeneric identifier\fP),
optional attributes, and a closing delimiter.  This syntax is shown
in Figure 1.
.LD
.ps -2
.vs -2
.ds h5 \
Figure 1.  SGML Start-Tag Syntax

     \fC<\fP\fIname attribute\fP\fC=\fP\*Qvalue\*U\fC>\fP

     Where:
        \fC<\fP is the opening delimiter
        \fC>\fP is the closing delimiter

        \fIname\fP is the generic identifier

        \fIattribute\fP=\*Qvalue\*U represents an attribute and its assigned value
            for that particular element.  Not all tags will have attributes;
            some tags may have more than one.

.ps
.vs
.if !'\*(h5'' \{\
.ce
\fB\*(h5\fP
.rm h5
.\}
.DE
.PP
An SGML end-tag usually has three parts: the opening delimiter
\fB</\fP (a less-than sign followed immediately by a forward
slash), the generic identifier from the start-tag, and the closing
delimiter \fB>\fP, as shown in Figure 2.
.LD
.ps -2
.vs -2
.ds h5 \
Figure 2.  SGML End-Tag Syntax

     \fC</\fP\fIname\fP\fC>\fP

.ps
.vs
.if !'\*(h5'' \{\
.ce
\fB\*(h5\fP
.rm h5
.\}
.DE
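.PP
As a concrete sketch of these forms, a paragraph and a short list
might be marked up as follows.  The generic identifiers are chosen for
illustration, and the \fBtype\fP attribute and its value are
hypothetical; they are shown only to demonstrate the attribute syntax.
.LD
.ps -2
.vs -2
\fC     <p>The Internet is a network of networks.</p>
     <list type="bullet">
     <item>First item</item>
     <item>Second item</item>
     </list>\fP
.ps
.vs
.DE
.PP
Each element begins with a start-tag and ends with an end-tag bearing
the same generic identifier; only the start-tag may carry attributes.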
.PP
Generic identifiers may be from one to eight characters long, drawn
from the set of letters, digits, period, and hyphen.  The first character must
be a letter.  For ease of typing, case is not distinguished in generic
identifiers (that is, \fB<A>\fP and \fB<a>\fP are the same tag).
.PP
The generic identifiers for specific document components (called
\fIelements\fP in SGML) are defined using special markup declarations
called \fIelement declarations\fP; these declarations are beyond the
scope of this document, although examples of them can be seen in
Appendix A.
.NH 2
Entity References and Declarations
.PP
An entity reference is used in SGML to call in data that is not
contained in the document itself.  This data can be a special
character (one not in ISO 646 IRV), a character string (for example,
the spelled-out version of an acronym), or a system file.
.PP
An entity reference is made up of an opening delimiter
(\fB&\fP), the identifier of the entity (called the \fIentity
reference name\fP), and a closing delimiter (\fB;\fP).  For example,
suppose that a \*Qpi\*U symbol is needed in the document.  Since this
is not a part of the ISO 646 IRV character set, the author would enter
the name of this symbol as an entity reference, \fB&pi;\fP.
Later, when the document is processed, the entity reference would be
replaced with the actual \*Qpi\*U symbol.
.PP
The syntax for entity declarations with replacement text is shown
in Figure 3.
.LD
.ps -2
.vs -2
.ds h5 \
Figure 3.  Entity Declaration with Replacement Text

     \fC<!ENTITY \fP\fIname\fP \*Qreplacement text\*U\fC>\fP

     Where:
        \fC<\fP is the opening delimiter
        \fC>\fP is the closing delimiter

        \fIname\fP is the entity identifier

        \*Qreplacement text\*U represents the replacement text

.ps
.vs
.if !'\*(h5'' \{\
.ce
\fB\*(h5\fP
.rm h5
.\}
.DE
.PP
For example, consider the declaration:
.LD
.ps -2
.vs -2
     \fC<!ENTITY tcp \fP\*QTransmission Control Protocol\*U\fC>\fP
.ps
.vs
.if !'\*(h5'' \{\
.ce
\fB\*(h5\fP
.rm h5
.\}
.DE
.PP
A reference to this entity would be embedded in the document as follows:
.LD
.ps -2
.vs -2
     \fCThe &tcp; is a connection-oriented protocol.\fP
.ps
.vs
.if !'\*(h5'' \{\
.ce
\fB\*(h5\fP
.rm h5
.\}
.DE
.PP
The reference would be resolved at processing time into:
.LD
.ps -2
.vs -2
     \fCThe Transmission Control Protocol is a connection-oriented protocol.\fP
.ps
.vs
.if !'\*(h5'' \{\
.ce
\fB\*(h5\fP
.rm h5
.\}
.DE
.PP
An external identifier is used in an entity declaration to indicate
an item that must be obtained from elsewhere on the system.  An entity
declaration with an external identifier is shown in Figure 4.
.LD
.ps -2
.vs -2
.ds h5 \
Figure 4.  Entity Declaration with External Identifier

     \fC<!ENTITY \fP\fIname\fP \fCSYSTEM \fP\*Qsystem id\*U\fC>\fP

     Where:
        \fC<\fP is the opening delimiter
        \fC>\fP is the closing delimiter

        \fIname\fP is the entity identifier

        \fCSYSTEM\fP indicates that the replacement text is located outside the document

        \*Qsystem id\*U describes the location in a system-dependent manner

.ps
.vs
.if !'\*(h5'' \{\
.ce
\fB\*(h5\fP
.rm h5
.\}
.DE
.PP
For example, a graphic encoded in G3-Fax format and stored in the
file \fIfigdata.g3fax\fP might be declared as follows:
.LD
.ps -2
.vs -2
     \fC<!ENTITY myfig SYSTEM \fP\*Qfigdata.g3fax\*U\fC>\fP
.ps
.vs
.if !'\*(h5'' \{\
.ce
\fB\*(h5\fP
.rm h5
.\}
.DE
.PP
The entity reference would be embedded in the document as follows:
.LD
.ps -2
.vs -2
\fC     Text text text
     Text text text
     &myfig;
     Text text text\fP
.ps
.vs
.if !'\*(h5'' \{\
.ce
\fB\*(h5\fP
.rm h5
.\}
.DE
.PP
There are other types of entity declarations not described here;
consult the SGML standard for complete information on the subject.
.NH 2
The Document Type Definition (DTD)
.PP
The \fIdocument type definition\fP (DTD) consists of element
declarations describing each element of a document and the order in
which the elements may occur, entity declarations defining
abbreviations, special characters, etc., and other declarations
defining things such as tag attributes, shorthand notation for tags,
and so on.  The DTD is read by an SGML parser to determine the
permitted structure of documents of the type that it describes.
.PP
When a document marked up with SGML is processed, the processor
checks the structure of the document as defined by its tags against
the DTD, and informs the author of any errors.  In this way, the DTD
can be used to enforce a particular document style.
Because the DTD specifies the order in which components occur, an
SGML processor can insert missing markup in an unambiguous fashion.
For example, it is usually possible to leave out the end-tags
for most elements of a document; the SGML processor can infer their
presence from the presence of the next start-tag.
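.PP
As a sketch of this kind of inference, assume a DTD in which the
end-tags of paragraphs and list items are declared omissible and in
which a list may not appear inside a paragraph.  (These are
assumptions made for the illustration, not a description of any
particular DTD.)  An author could then type:
.LD
.ps -2
.vs -2
\fC     <p>First paragraph.
     <p>Second paragraph.
     <list>
     <item>First item
     <item>Second item
     </list>\fP
.ps
.vs
.DE
.PP
and the SGML processor would treat the text exactly as if it had been
fully marked up:
.LD
.ps -2
.vs -2
\fC     <p>First paragraph.</p>
     <p>Second paragraph.</p>
     <list>
     <item>First item</item>
     <item>Second item</item>
     </list>\fP
.ps
.vs
.DE
.PP
Each omitted end-tag is implied by the next tag that cannot legally
occur inside the open element.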
.PP
The syntax of document type definitions is beyond the scope of this
document; consult the SGML standard for complete information.  An
example DTD for Internet documents is presented in Appendix A; this
DTD was used to produce the document you are reading now.
.NH 1
SGML and Internet Documents
.PP
In this document, it is proposed that SGML be made the standard
distribution format for Internet documents.  All future RFCs and
Internet Drafts would be required to contain proper SGML markup as
discussed in the following sections.  It is not proposed that RFCs and
Internet Drafts issued before the adoption of SGML be translated;
this would be a monumental task with little benefit (although it might
be beneficial to translate the PostScript-only RFCs).
.NH 2
Requirements
.PP
Before SGML can be adopted as the standard format for Internet
documents, three principal goals must be met:
.RS
.LE
A custom document type definition must be constructed that
describes the format of an Internet document.  This DTD must include
all components likely to be used in an Internet document, including
complex tables, equations, and graphics.  It may also be desirable to
allow the inclusion of external figures in some non-SGML format,
although this is likely to result in the same problem that exists now
with PostScript documents.  The hardest parts of this process will be
determining what components should be included in the DTD, and
defining the DTD in a general enough fashion that it can be translated
into a wide variety of other formats.
.LE
Software to perform the translations from SGML to other formats
must be developed.  At least three different SGML parsers are
available for anonymous FTP from sites around the Internet.  The
software that would need to be developed could use one or more of
these parsers as a base; the new code would need to provide the
translation function only.
.LE
Documentation must be developed on the use of SGML for
writing Internet documents, including descriptions of all elements,
entities, shortcuts, and so on.
.RE
.RT
.PP
By far the most daunting of these goals is the second, software
development.  However, it would not be necessary to produce software
for every variety of computer connected to the Internet.  Rather,
archive sites which make Internet documents available via mail servers
could provide the service in such a way that the user requests a
document in a particular format, and it is translated into that format
before being sent.  The common practice in the Internet of making
software freely available whenever possible would likely lead to a
wide variety of translators becoming available in a short period of
time.
.NH 2
Advantages
.PP
One of the principal advantages of using SGML to format Internet
documents is that although it provides the capabilities to include
complex tables, equations, and graphics in a document, its reliance on
ISO 646 IRV as its base character set means that documents will still
be searchable on-line, and can be edited and copied with ease.
Further, because any new Internet document would be formatted with
SGML, material copied from an older SGML-format document can be
inserted into the new document with no reformatting required.
.PP
Although several commercial SGML \*Qeditors\*U are available that
provide features such as menu-selectable tags, tag hiding, and WYSIWYG
display, these packages are \fInot\fP necessary to create SGML
documents.  Again, because SGML's base character set is ISO 646 IRV,
any simple text editor can be used to create an SGML document with
relative ease.  Sophisticated text editors can provide additional
features such as macro definitions for tags and so on (a freely
available \*QSGML Mode\*U already exists for GNU Emacs).  Thus,
SGML allows the creation of sophisticated documents without requiring
the use of sophisticated tools.
.PP
SGML also provides advantages in formatting documents.  By using
the document type definition to enforce a particular style, it will be
possible to make all Internet documents conform to certain standards
(for example, requiring a section on security or making all citations
follow the same format).  Although this is already done to some extent
by the RFC Editor, the automation of the task will allow stricter
enforcement.  This will aid the reader, particularly when referring to
multiple documents, by ensuring that everything is present, in the
same order, and in the same general format.
.PP
SGML would also permit new services to be developed around Internet
documents; a prime example of this is information retrieval.  Several
information retrieval projects are currently in development around the
Internet, including WAIS, Gopher, and the World Wide Web.
If Internet
documents were provided in SGML format, these systems could easily use
the markup to decide which parts of a document to search, how much of a
document to retrieve, and so on.  This would give the user the ability
to look up, for example, a specific parameter of a protocol in the
protocol specification without having to FTP the entire RFC and wade
through it.
.NH 2
Disadvantages
.PP
There are, of course, some disadvantages to using SGML for Internet
documents.  First and foremost, software to translate SGML into other
formats needs to be developed, as mentioned previously.
.PP
Additionally, SGML will require that authors of Internet documents
learn how to use it; they will no longer be able to use whatever formatter
they are most familiar with.  (It is conceivable, however, that at some
point some people may prefer to switch over to SGML for their other
documents as well, making this less of an issue.)
.PP
Finally, careful thought must be given to all stages of the design
of a document type definition for Internet documents.  In
particular, tables, equations, and graphics will be very difficult to
handle in a generic fashion.  This is not to say they are impossible;
the Association of American Publishers has already produced DTDs which
include both complex tables and equations [AAP87b, AAP87c].
This perhaps should not be described as a disadvantage of SGML, but
rather as simply a caveat of its use.
.NH 1
Sample Implementation
.PP
As a part of this proposal, in the belief that it is always better
to have an implementation of an idea before putting that idea forth as
the solution to all mankind's problems, the author has developed a sample
implementation of SGML-format Internet documents.  This implementation
is not suitable for adoption as a standard, as it does not handle all
aspects of formatting that a standard implementation would have to handle.
However, it does demonstrate that the idea is feasible, and at the
same time provides something for others to \*Qplay with\*U in order
to better form their own opinions.
.NH 2
The INETDOC1 Document Type Definition
.PP
The first part of the sample implementation is a basic document
type definition for Internet documents, called INETDOC1.  Originally,
some existing DTDs were examined for use with Internet documents, in
particular the DTDs from the AAP.  However, Internet documents have
their own specific format which does not quite fit into existing
DTDs.  The incompatibilities lie primarily in the required sequencing
of material in an Internet document, rather than in unusual document
components.
.NH 3
Included Components
.PP
The INETDOC1 DTD contains the following elements:
.IP "\fBrfc | indraft\fP" 1i
The document must be either an RFC or an
Internet Draft; this controls the content of the headings at the top
of each page.  Both an RFC and an Internet Draft consist of the
following elements in this order: the working group name, the list of
authors and their institutions, a short authors' name string for use
in the page footer, the date of publication, an optional relationship
to other documents, the title, a short title for use in the page
header, the document status, the distribution restrictions, an
abstract, the body of the document, an optional acknowledgments
section, an optional references section, a security considerations
section, and a list of authors' addresses.
.RT
.IP "\fBabstract\fP" 1i
The abstract of the document.  Both start-tag and
end-tag are required.  Within this element, the following tags may be
used: \fBp, qp, tp, list\fP.
.RT
.IP "\fBacks\fP" 1i
The acknowledgments section.  Both start-tag and
end-tag are required.  Within this element, the following tags may be
used: \fBp, qp, tp, list\fP.
.RT
.IP "\fBaddress\fP" 1i
An address, for use in the authors' addresses section.
The end-tag for this element is optional.  Within this element, the
following tags may be used: \fBname, org, street, city, csub, pcode,
country, phone, fax, email\fP.
.RT
.IP "\fBaref\fP" 1i
A reference to an article.  The end-tag for this
element is optional.  Within this element, the following tags may be
used: \fBcite, rauthors, atitle, ptitle, vol, pdate, pages\fP.
.RT
.IP "\fBatitle\fP" 1i
The title of an article in a reference.
The end-tag for this element is optional.  No other tags are valid within
this element.
.RT
.IP "\fBauthaddr\fP" 1i
The authors' addresses section.  Both start-tag and
end-tag are required.  Within this element, the following tags may be
used: \fBaddress\fP.  This element has one optional attribute, the number
of authors in the list.  The attribute must be supplied when there is
more than one author.
.RT
.IP "\fBauthinst\fP" 1i
An author's institution, used in an author list.
The end-tag for this element is optional.  No other tags are valid
within this element.
.RT
.IP "\fBauthlist\fP" 1i
The list of authors' names and institutions.  Both
start-tag and end-tag are required.  Within this element, the
following tags may be used: \fBauthname, authinst\fP.  This element
has one optional attribute, the number of authors in the list.  The
attribute must be supplied when there is more than one author.
.RT
.IP "\fBauthname\fP" 1i
An author's name, used in an author list.
The end-tag for this element is optional.  No other tags are valid
within this element.
.RT
.IP "\fBb\fP" 1i
Bold face.  Both start-tag and end-tag are required.
.RT
.IP "\fBbody\fP" 1i
The body of the document.  Both start-tag and end-tag
are required.  Within this element, the following tags may be used:
\fBh0, h1, h2, h3, h4, h5, h6, h7, h8, h9, p, qp, tp, list, deq, fig,
tbl\fP.
.RT
.IP "\fBbref\fP" 1i
A reference to a book.  The end-tag for this element is
optional.  Within this element, the following tags may be used:
\fBcite, rauthors, btitle, pubcity, pubname, pubdate\fP.
.RT
.IP "\fBbtitle\fP" 1i
A book title in a reference.
The end-tag for this element is optional.  No other tags are valid within
this element.
.RT
.IP "\fBc\fP" 1i
Constant width.  Both start-tag and end-tag are required.
.RT
.IP "\fBcite\fP" 1i
A reference citation.  Both start-tag and end-tag are
required.  No other tags are valid within this element.
.RT
.IP "\fBcity\fP" 1i
The city part of an address.
The end-tag for this element is optional.  No other tags are valid
within this element.
.RT
.IP "\fBcol\fP" 1i
A column of a table.  The end-tag for this element is
optional.  Within this element, the following tags may be used: \fBb,
c, i, r, q, emq\fP.
.RT
.IP "\fBcountry\fP" 1i
The country part of an address.
The end-tag for this element is optional.  No other tags are valid
within this element.
.RT
.IP "\fBcsub\fP" 1i
The country subdivision (e.g., province or state) part
of an address.
The end-tag for this element is optional.  No other tags are valid
within this element.
.RT
.IP "\fBdate\fP" 1i
The publication date of the document.  The end-tag for
this element is optional.  No other tags are valid within this element.
.RT
.IP "\fBdeq\fP" 1i
A displayed equation.  This element is not implemented
in the current version of INETDOC1.
.RT
.IP "\fBdist\fP" 1i
The distribution restrictions on the document.  The
end-tag for this element is optional.  No other tags are valid within
this element.
.RT
.IP "\fBdline\fP" 1i
A double horizontal line in a table.  The end-tag for
this element is optional.  No other tags are valid within this element.
.RT
.IP "\fBemail\fP" 1i
The electronic mail address part of an address.
The end-tag for this element is optional.  No other tags are valid
within this element.
.RT
.IP "\fBemq\fP" 1i
An embedded quotation.  Both start-tag and end-tag are
required.  Within this element, the following tags may be used: \fBb,
c, i, r\fP.
.RT
.IP "\fBeq\fP" 1i
An in-line equation.  This element is not implemented in
the current version of INETDOC1.
.RT
.IP "\fBfax\fP" 1i
The facsimile number part of an address.
The end-tag for this element is optional.  No other tags are valid
within this element.
.RT
.IP "\fBfdata\fP" 1i
The \*Qdata\*U part of a figure.
The end-tag for this element is optional.  Within this element, the
following tags may be used: \fBb, c, i, r, q, emq\fP.
.RT
.IP "\fBfig\fP" 1i
A figure.  Both start-tag and end-tag are required.
Within this element, the following tags may be used: \fBftitle,
fdata\fP.
.RT
.IP "\fBftitle\fP" 1i
A figure title.
The end-tag for this element is optional.  No other tags are valid within
this element.
.RT
.IP "\fBh#\fP" 1i
A section heading of level #, where # ranges from 0 to 9
inclusive.  The end-tag for this element is optional.  No other tags
are valid within this element.
.RT
.IP "\fBi\fP" 1i
Italics.  Both start-tag and end-tag are required.
.RT
.IP "\fBitem\fP" 1i
An item in an itemized list.  The end-tag for this
element is optional.  Within this element, the following tags may be
used: \fBcite, eq, b, c, i, r, q, emq\fP.
.RT
.IP "\fBlist\fP" 1i
An itemized list.  Both start-tag and end-tag are
required.  Within this element, the following tags may be used:
\fBitem\fP.
.RT
.IP "\fBname\fP" 1i
The personal name part of an address.
The end-tag for this element is optional.  No other tags are valid
within this element.
.RT
.IP "\fBobs\fP" 1i
A list of documents obsoleted by the current document.
The end-tag for this element is optional.  No other tags are valid
within this element.
.RT
.IP "\fBorg\fP" 1i
The organization part of an address.
The end-tag for this element is optional.  No other tags are valid
within this element.
.RT
.IP "\fBp\fP" 1i
A paragraph.  The end-tag for this element is optional.
Within this element, the following tags may be used: \fBcite, eq, b,
c, i, r, q, emq\fP.
.RT
.IP "\fBpages\fP" 1i
The page numbers in a reference.
The end-tag for this element is optional.  No other tags are valid within
this element.
.RT
.IP "\fBpcode\fP" 1i
The postal code part of an address.
The end-tag for this element is optional.  No other tags are valid
within this element.
.RT
.IP "\fBpdate\fP" 1i
The periodical date in a reference.
The end-tag for this element is optional.  No other tags are valid within
this element.
.RT
.IP "\fBphone\fP" 1i
The telephone number part of an address.
The end-tag for this element is optional.  No other tags are valid
within this element.
.RT
.IP "\fBptitle\fP" 1i
The periodical title in a reference.
The end-tag for this element is optional.  No other tags are valid within
this element.
.RT
.IP "\fBpubcity\fP" 1i
The city of publication in a reference.
The end-tag for this element is optional.  No other tags are valid within
this element.
.RT
.IP "\fBpubdate\fP" 1i
The date of publication in a reference.
The end-tag for this element is optional.  No other tags are valid within
this element.
.RT
.IP "\fBpubname\fP" 1i
The publisher's name in a reference.
The end-tag for this element is optional.  No other tags are valid within
this element.
.RT
.IP "\fBq\fP" 1i
Quotation.  Both start-tag and end-tag are required.
Within this element, the following tags may be used: \fBb, c, i, r,
emq\fP.
.RT
.IP "\fBqp\fP" 1i
A paragraph of quoted material.  The end-tag for this
element is optional.  Within this element, the following tags may
be used: \fBcite, eq, b, c, i, r, q, emq\fP.
.RT
.IP "\fBr\fP" 1i
Roman.  Both start-tag and end-tag are required.
.RT
.IP "\fBrauthors\fP" 1i
The authors list in a reference.
The end-tag for this element is optional.  No other tags are valid within
this element.
.RT
.IP "\fBrefs\fP" 1i
The references section.  Both start-tag and end-tag are
required.  Within this element, the following tags may be used:
\fBaref, bref\fP.
.RT
.IP "\fBrelation\fP" 1i
The relationship to other documents section.  The
end-tag for this element is optional.  Within this element, the
following tags may be used: \fBobs, upd\fP.
.RT
.IP "\fBrow\fP" 1i
A row of a table.  The end-tag for this element is
optional.  Within this element, the following tags may be used:
\fBcol\fP.
.RT
.IP "\fBsauthor\fP" 1i
The short list of authors' names, for use in the
page footer.  The end-tag for this element is optional.  No other tags
are valid within this element.
.RT
.IP "\fBsecurity\fP" 1i
The security considerations section of the
document.  Both start-tag and end-tag are required.  Within this
element, the following tags may be used: \fBp, qp, tp, list\fP.
.RT
.IP "\fBsline\fP" 1i
A single horizontal line in a table.  The end-tag for
this element is optional.  No other tags are valid within this element.
.RT
.IP "\fBstatus\fP" 1i
The document status section.  Both start-tag and
end-tag are required.  Within this element, the following tags may be
used: \fBp, qp, tp, list\fP.
.RT
.IP "\fBstitle\fP" 1i
The short title, for use in the page header.  The
end-tag for this element is optional.  No other tags are valid within
this element.
.RT
.IP "\fBstreet\fP" 1i
The street part of an address.
The end-tag for this element is optional.  No other tags are valid
within this element.
.RT
.IP "\fBtbl\fP" 1i
A table.  Both start-tag and end-tag are required.
Within this element, the following tags may be used: \fBttitle, thead,
row, dline, sline\fP.  This element has a required attribute,
\fBcolfmt\fP, the column format specification.  This is a string of
the letters \fBc\fP, \fBl\fP, and \fBr\fP, one per column, denoting
whether that column should be centered, left adjusted, or right adjusted.
.RT
.IP "\fBthead\fP" 1i
The table column headings.  The end-tag for this
element is optional.  Within this element, the following tags may be
used: \fBcol\fP.
.RT
.IP "\fBtitle\fP" 1i
The title of the document.  The end-tag for this
element is optional.  No other tags are valid within this element.
.RT
.IP "\fBtp\fP" 1i
A tagged paragraph.  The end-tag for this element is
optional.  Within this element, the following tags may be used:
\fBcite, eq, b, c, i, r, q, emq\fP.  This element has an optional
attribute, \fBtag\fP, which gives the text of the paragraph's tag.
.RT
.IP "\fBttitle\fP" 1i
A table title.  The end-tag for this element is
optional.  No other tags are valid within this element.
.RT
.IP "\fBupd\fP" 1i
A list of documents updated by this document.
The end-tag for this element is optional.  No other tags are valid
within this element.
.RT
.IP "\fBvol\fP" 1i
The volume number in a reference.
The end-tag for this element is optional.  No other tags are valid within
this element.
.RT
.IP "\fBwkgroup\fP" 1i
The name of the working group that produced the
document.  The end-tag for this element is optional.  No other tags
are valid within this element.
.RT
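.PP
As an illustration of the \fBtbl\fP and \fBtp\fP attributes described
above, the following fragment is a minimal sketch of a three-column
table followed by a tagged paragraph.  The table contents and the
paragraph tag text are invented for this example; only the element and
attribute names are taken from the DTD in Appendix A.
.LD
\fC
<tbl colfmt="lcr">
<ttitle>Sample Port Assignments
<thead><col>Service<col>Protocol<col>Port
<dline>
<row><col>ftp<col>tcp<col>21
<row><col>telnet<col>tcp<col>23
<sline>
</tbl>
<tp tag="Note:">The end-tags for <c>row</c> and <c>col</c> are
omitted here, since the DTD marks them as optional.
\fP
.DE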
.NH 3
Missing Components
.PP
The major components missing from the INETDOC1 DTD are:
.RS
.LE
Complex tables.  The DTD provides for simple tables in which a
regular grid of cells is used.  However, no provision is made for
cells that span multiple columns or rows, multiple-line text in a
single cell, and so on.  Tables will require a great deal of thought
when developing a standard DTD; the AAP material [AAP87c]
may be useful here.
.LE
Equations.  No provision is made for equations, other than the
definition of the \fBdeq\fP and \fBeq\fP tags, which are not yet
implemented.  Equations will also require a great deal of thought when
developing a standard DTD; the AAP material [AAP87b] may be useful here.
.LE
Figures.  The INETDOC1 DTD provides for simple figures by defining an
element whose text is passed through uninterpreted.  More work will be
necessary in the standard DTD to provide some level of drawing ability
(at least lines, boxes, circles, and so on).
.LE
Markup shortcuts.  SGML allows shortcuts (short references) to be
defined for different parts of the document markup (for example, a
blank line could represent a paragraph start-tag).  INETDOC1 uses
short references only to discard newlines within references and
tables; no shortcuts for tags were defined, more for the sake of
simplicity than anything else.
.RE
.RT
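.PP
To make the last item concrete, the following is a sketch of how a
blank line could be mapped to a paragraph start-tag using the SGML
short reference feature, assuming the reference concrete syntax.  The
names \fBptag\fP and \fBbodymap\fP are hypothetical; INETDOC1 defines
no such mapping.
.LD
\fC
<!ENTITY   ptag     STARTTAG  "p"           >
<!SHORTREF bodymap  "&#RS;&#RE;"  ptag      >
<!USEMAP   bodymap  body                    >
\fP
.DE
.PP
With declarations along these lines, an empty record (a blank line)
within the \fBbody\fP element would be treated as if the author had
typed \fC<p>\fP.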
.PP
Additionally, there are a few parts of the INETDOC1 DTD that have
been \*Qslapped together\*U for the sake of expediency, even though
they may not be the \*Qright thing\*U in the long run:
.RS
.LE
The reference formats are incomplete.  A simple format for books and
articles has been defined, but formats for other things such as RFCs,
personal communications, and so on would need to be defined as well.
.LE
The definition of the table format is not general enough.  The
\fBtbl\fP tag requires that the user provide an attribute giving the
structure of the table.  The format of this attribute is suitable for
use with \fCtroff\fP and \fCTeX\fP, but is not generally usable.
.LE
The definitions of elements for bold face, constant width, and
italics go somewhat against the idea of generalized markup, since they
specify \*Qvisual\*U aspects of a document, not \*Qcontent\*U
aspects.  A better way to go would probably be to define several
levels of emphasis, with tags such as \fBe1\fP, \fBe2\fP, and so on.
It would then be up to the particular translator to define fonts or
other typographic techniques for these.
.LE
Likewise, the definition of section headings is arguably wrong.
One could argue for something like a section element, consisting
of a section title and section body, with the translator left to decide
how to represent it.  This is done to some extent in the parts of the
INETDOC1 DTD that precede the \fBbody\fP element.
.RE
.RT
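.PP
As a rough sketch of the last two suggestions, emphasis levels and a
section element might be declared as follows.  The element names
\fBe1\fP through \fBe3\fP, \fBsect\fP, \fBsecttl\fP, and
\fBsectbody\fP are hypothetical and are not part of INETDOC1; the
parameter entity \fC%e.ptype;\fP is the one defined in Appendix A.
.LD
\fC
<!ELEMENT (e1 | e2 | e3)  - -  (#PCDATA)
        -- levels of emphasis; rendering is left to the translator --  >
<!ELEMENT sect      - O  (secttl, sectbody)
        -- a section: a title followed by a body --                    >
<!ELEMENT secttl    - O  (#PCDATA)
        -- the section title --                                        >
<!ELEMENT sectbody  - O  (%e.ptype; | sect)+
        -- the section body; nested sections give the heading level -- >
\fP
.DE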
.PP
However, even with the above problems, the INETDOC1 DTD is
sufficient for handling the majority of the current Internet
documents.  As proof of this, this Internet Draft was written using
INETDOC1.
.NH 2
The Amsterdam SGML Parser
.PP
The second part of the sample implementation is the Amsterdam SGML Parser
(ASP) [Warmer89], developed by Jos Warmer and Sylvia van
Egmond of the Vrije Universiteit.  As they describe it,
.QP
The Amsterdam SGML Parser uses an LL(1) parser generator, notably LLgen,
for both DTD and document parsing.  Actually, SGML is not LL(1),
[so the developers] used the \*Qconflict resolvers\*U from LLgen to overcome
the problems [they] came across.
.PP
Basically, one gives the ASP a DTD to process, and it generates
C-language code that implements a parser for documents written to that
DTD.  This parser is then compiled, and can be used to verify the
format of these documents.  The output of the parser is a document
with all optional markup inserted in its proper place, and all entity
references resolved.
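.PP
For example, given the element declarations in Appendix A, an author
might type the first fragment below, omitting the optional end-tags.
The second fragment sketches the kind of normalized text the parser
would be expected to produce; the exact layout of the ASP output is
assumed here, not copied from an actual run.
.LD
\fC
<list>
<item>Verify the document against the DTD.
<item>Translate the document for printing.
</list>
\fP
.DE
.LD
\fC
<list>
<item>Verify the document against the DTD.</item>
<item>Translate the document for printing.</item>
</list>
\fP
.DE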
.PP
Another feature of the ASP-generated parser is that of replacement
files.  Using a simple format, one can specify that each start-tag or
end-tag encountered in the document be replaced with arbitrary
text.  This makes it quite easy to translate SGML markup into another
markup language such as \fCtroff\fP or \fCTeX\fP.  A sample
replacement file to convert the
INETDOC1 DTD to \fCtroff -ms\fP is included in Appendix B.
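.PP
A minimal sketch of what a few replacement file entries might look
like is shown below, using the command format described in the
comments at the top of Appendix B.  These particular entries are
illustrative assumptions, and are not necessarily the ones used in
INETDOC1.MS.REP.
.LD
\fC
% start a new -ms paragraph; the macro goes on a line by itself
<P>             +       ".PP\\n"
% bracket an itemized list with indent and un-indent requests
<LIST>          +       ".RS\\n"
</LIST>         +       ".RE\\n"
% pass the colfmt attribute of a table through as the format line
<TBL>           +       ".TS\\n"
                        "[colfmt].\\n"
</TBL>          +       ".TE\\n"
\fP
.DE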
.NH 3
Limitations
.PP
Unfortunately, the replacement file facility provided by the ASP is
not powerful enough to allow translation of a complete DTD into
arbitrary formats.  The primary problem is that only start-tags and
end-tags may be translated.  There is no provision to handle entities
such as special characters, which are represented differently in
different formats.
.PP
There is also little control over the position of items in the document.
Because SGML does not require that tags begin on new lines or end with
new lines, they may appear anywhere in a document.  The replacement
file allows one to replace a tag with other text, but does not provide
any ability to say \*Qthe stuff in this element must follow this
replacement text with no line break, regardless of how the user typed
it.\*U As can be seen in Appendix B, this can require some rather
interesting constructs in the replacement file to make things work
properly.
.NH 2
Obtaining the Software
.PP
The sample implementation has been made available to any interested
party for anonymous FTP from the host \fCharbor.ecn.purdue.edu\fP in
the file \fIpub/inetdoc.tar.Z\fP.  This is a compressed UNIX
\fCtar\fP file which must be transferred in \*Qbinary\*U mode.  The
software has been tested on a Sun SPARC system running SunOS 4.1.1; it
should work on most other UNIX systems with little difficulty.
(Please do not contact the author about portability problems, as he
did not develop this code.)
.PP
The contents of the distribution are as follows:
.RS
.LE
The source code for the ASP-generated parser that will parse a
document conforming to INETDOC1.
.LE
The file INETDOC1.DTD, which is the document type definition for
Internet documents.
.LE
The file INETDOC1.MS.REP, which is the replacement file for ASP
to translate the SGML used in INETDOC1 into \fCtroff -ms\fP.
.LE
A bibliography of SGML information and resources.
.LE
Sample document type definitions from other projects.
.LE
The SGML version of this Internet Draft.
.LE
The \fCtroff -ms\fP version of this Internet Draft.
.LE
A PostScript version of this Internet Draft (created with
\fCtroff\fP).
.LE
An ASCII (with overstrikes and underlining) version of this
Internet Draft (created with \fCnroff\fP).
.RE
.RT
.PP
If you are interested in the use of SGML to format Internet
documents, please obtain a copy of this software and experiment with
it.  If there appears to be enough interest, it may be worthwhile to
start a discussion group for the topic.  Please \fBdo not\fP send
comments to the IETF mailing lists; they are already overloaded.
.NH 1
Conclusions
.PP
The use of SGML to format Internet documents would provide several
advantages to the Internet community.  Although the work required to
make such a change is not trivial, it is feasible.  The use of SGML has
already started to spread in other disciplines, and it would be
beneficial for the Internet community to take advantage of this work
before it passes us by.
.SH
Appendix A - INETDOC1.DTD
.PP
This is the complete listing of INETDOC1.DTD as used to format this
Internet Draft.
.LD
.ps -2
.vs -2
\fC
<!-- =====================================================================
==== INETDOC1.DTD
====
==== Version 0.1, June 1992
====
==== This is a Document Type Definition (DTD) for Internet documents,
==== written in the Standard Generalized Markup Language (SGML) as
==== defined by ISO 8879-1986.  It is intended to describe the format
==== of the Request for Comments (RFC) and Internet Draft publications.
====
==== This particular DTD is designed for use with the Amsterdam SGML
==== Parser (ASP), although it should work with any ISO 8879-conformant
==== SGML system.
====
==== Items remaining to be implemented:
====            - In-line equations
====            - Display equations
====            - Special characters
====
==== This DTD, the ASP, and the ASP replacement files included with this
==== distribution constitute an experiment in encoding RFCs and Internet
==== Drafts in SGML for distribution, and then translating them into other
==== markup languages (NROFF/TROFF, TeX, etc.) for final formatting and
==== printing by the end user.
====
==== If you have questions about the experiment, or about this DTD,
==== contact:
====
==== David A. Curry
==== Purdue University
==== Engineering Computer Network
==== 1285 Electrical Engineering Building
==== West Lafayette, IN 47907-1285
====
==== Phone:  (317) 494-3561
==== Fax:    (317) 494-6440
==== E-Mail: davy@ecn.purdue.edu
====
========================================================================== -->

<!-- =====================================================================
==== Internet document type declaration.  Typical invocation:
====
==== <!DOCTYPE inetdoc1 PUBLIC "-//IAB//DTD Internet Document Type 1//EN">
====
==== This line goes at the very top of your document, before you start
==== using any SGML tags.  If your SGML parser does not have a standard
==== place in which to look for DTDs, then try this invocation line
==== instead:
====
==== <!DOCTYPE inetdoc1 PUBLIC "-//IAB//DTD Internet Document Type 1//EN"
====                    SYSTEM "put the pathname to inetdoc1.dtd here">
====
========================================================================== -->
<!DOCTYPE inetdoc1
 [

  <!-- ===================================================================
  ==== A type-1 internet document is either an RFC or an Internet Draft.
  ======================================================================== -->
  <!ELEMENT inetdoc1    - -     (rfc | indraft)                              >

  <!-- ===================================================================
  ==== Entities used later; allows us to change things in one place
  ==== instead of several.
  ======================================================================== -->
  <!-- document elements                                                   -->
  <!ENTITY % e.doc      "wkgroup, authlist, sauthor, date, relation?,
                         title, stitle, status, dist, abstract, body,
                         acks?, refs?, security, authaddr"                   >

  <!-- header elements                                                     -->
  <!ENTITY % e.hdr      "h0 | h1 | h2 | h3 | h4 | h5 | h6 | h7 | h8 | h9"    >

  <!-- emphasis elements                                                   -->
  <!ENTITY % e.emph     "b | c | i | r"                                      >

  <!-- quoting elements                                                    -->
  <!ENTITY % e.quote    "emq | q"                                            >

  <!-- paragraph elements                                                  -->
  <!ENTITY % e.para     "#PCDATA | cite | eq | %e.emph; | %e.quote;"         >

  <!-- paragraph types                                                     -->
  <!ENTITY % e.ptype    "p | qp | tp | list"                                 >

  <!-- null string                                                         -->
  <!ENTITY null         CDATA           ""                                   >

  <!-- ===================================================================
  ==== Special characters.
  ======================================================================== -->
  <!ENTITY lt           CDATA           "<"                                  >
  <!ENTITY amp          CDATA           "&"                                  >

  <!-- ===================================================================
  ==== Definition of an RFC and an Internet Draft.  They're essentially
  ==== the same thing, except that an RFC has a required parameter in the
  ==== tag, the RFC number.
  ======================================================================== -->
  <!ELEMENT indraft     - -     (%e.doc;)                                    >

  <!ELEMENT rfc         - -     (%e.doc;)                                    >

  <!-- An RFC must have a number associated with it                        -->
  <!ATTLIST rfc         number          NUMBER          #REQUIRED            >

  <!-- ===================================================================
  ==== The major elements of an RFC or Internet Draft.  Some of these get
  ==== broken down even further.
  ======================================================================== -->
  <!ELEMENT abstract    - -     (%e.ptype;)+
        -- the abstract of the document (required) --                        >

  <!ELEMENT acks        - -     (%e.ptype;)+
        -- the acknowledgements section (optional) --                        >

  <!ELEMENT authaddr    - -     (address)+
        -- the authors' addresses section (required) --                      >

  <!ELEMENT authlist    - -     (authname, authinst)+
        -- the list of authors and institutions (required) --                >

  <!ELEMENT body        - -     (%e.hdr; | %e.ptype; | deq | fig | tbl)+
        -- the body of the document (required) --                            >

  <!ELEMENT date        - O     (#PCDATA)
        -- the document issue date (required) --                             >

  <!ELEMENT dist        - O     (#PCDATA)
        -- the distribution statement (required) --                          >

  <!ELEMENT refs        - -     (aref | bref)+
        -- the references section (optional) --                              >

  <!ELEMENT relation    - O     (obs | upd)
        -- the relation to other documents line (optional) --                >

  <!ELEMENT sauthor     - O     (#PCDATA)
        -- the short author names (required) --                              >

  <!ELEMENT security    - -     (%e.ptype;)+
        -- the security considerations section (required) --                 >

  <!ELEMENT status      - -     (%e.ptype;)+
        -- the status statement (required) --                                >

  <!ELEMENT stitle      - O     (#PCDATA)
        -- the short title of the document (required) --                     >

  <!ELEMENT title       - O     (#PCDATA)
        -- the title of the document (required) --                           >

  <!ELEMENT wkgroup     - O     (#PCDATA)
        -- the working group name (required) --                              >

  <!-- The person needs to tell us how many authors there are, if there is
       more than one.                                                      -->
  <!ATTLIST authaddr    nauthors        NUMBER          "1"                  >
  <!ATTLIST authlist    nauthors        NUMBER          "1"                  >

  <!-- ===================================================================
  ==== Body elements.
  ======================================================================== -->
  <!ELEMENT p           - O     (%e.para;)+
        -- a regular paragraph --                                            >
  <!ELEMENT qp          - O     (%e.para;)+
        -- block quoted material --                                          >
  <!ELEMENT tp          - O     (%e.para;)+
        -- a tagged paragraph --                                             >

  <!ELEMENT list        - -     (item)+
        -- an itemized list --                                               >
  <!ELEMENT item        - O     (%e.para;)+
        -- a list item --                                                    >

  <!ELEMENT h0          - O     (#PCDATA)
        -- level 0 heading --                                                >
  <!ELEMENT h1          - O     (#PCDATA)
        -- level 1 heading --                                                >
  <!ELEMENT h2          - O     (#PCDATA)
        -- level 2 heading --                                                >
  <!ELEMENT h3          - O     (#PCDATA)
        -- level 3 heading --                                                >
  <!ELEMENT h4          - O     (#PCDATA)
        -- level 4 heading --                                                >
  <!ELEMENT h5          - O     (#PCDATA)
        -- level 5 heading --                                                >
  <!ELEMENT h6          - O     (#PCDATA)
        -- level 6 heading --                                                >
  <!ELEMENT h7          - O     (#PCDATA)
        -- level 7 heading --                                                >
  <!ELEMENT h8          - O     (#PCDATA)
        -- level 8 heading --                                                >
  <!ELEMENT h9          - O     (#PCDATA)
        -- level 9 heading --                                                >

  <!-- Tagged paragraphs can be given a tag.                               -->
  <!ATTLIST tp          tag             CDATA           ""                   >

  <!-- ===================================================================
  ==== Address elements.
  ======================================================================== -->
  <!ELEMENT address     - O     (name, org*, street+, city, csub, pcode,
                                 country?, phone, fax?, email)
        -- an address specification --                                       >

  <!ELEMENT city        - O     (#PCDATA)
        -- city --                                                           >
  <!ELEMENT country     - O     (#PCDATA)
        -- country --                                                        >
  <!ELEMENT csub        - O     (#PCDATA)
        -- country subdivision --                                            >
  <!ELEMENT email       - O     (#PCDATA)
        -- electronic mail address --                                        >
  <!ELEMENT fax         - O     (#PCDATA)
        -- fax number --                                                     >
  <!ELEMENT name        - O     (#PCDATA)
        -- name --                                                           >
  <!ELEMENT org         - O     (#PCDATA)
        -- organization --                                                   >
  <!ELEMENT pcode       - O     (#PCDATA)
        -- postal code --                                                    >
  <!ELEMENT phone       - O     (#PCDATA)
        -- phone number --                                                   >
  <!ELEMENT street      - O     (#PCDATA)
        -- street address --                                                 >

  <!-- ===================================================================
  ==== Emphasis.
  ======================================================================== -->
  <!ELEMENT b           - -     (#PCDATA)
        -- bold face --                                                      >
  <!ELEMENT c           - -     (#PCDATA)
        -- constant width --                                                 >
  <!ELEMENT i           - -     (#PCDATA)
        -- italics --                                                        >
  <!ELEMENT r           - -     (#PCDATA)
        -- roman --                                                          >

  <!-- ===================================================================
  ==== Quotes.
  ======================================================================== -->
  <!ELEMENT q           - -     (#PCDATA | emq | %e.emph;)*
        -- in-line quoted material --                                        >
  <!ELEMENT emq         - -     (#PCDATA | %e.emph;)*
        -- embedded quoted material --                                       >

  <!-- ===================================================================
  ==== References.
  ======================================================================== -->
  <!ELEMENT cite        - -     (#PCDATA)
        -- reference citation --                                             >
  <!ELEMENT aref        - O     (cite, rauthors, atitle, ptitle, vol,
                                 pdate, pages)
        -- an article reference --                                           >
  <!ELEMENT bref        - O     (cite, rauthors, btitle, pubcity, pubname,
                                 pubdate)
        -- a book reference --                                               >
  <!ELEMENT rauthors    - O     (#PCDATA)
        -- reference author names --                                         >
  <!ELEMENT btitle      - O     (#PCDATA)
        -- book title --                                                     >
  <!ELEMENT pubcity     - O     (#PCDATA)
        -- city of publication --                                            >
  <!ELEMENT pubname     - O     (#PCDATA)
        -- publishers name --                                                >
  <!ELEMENT pubdate     - O     (#PCDATA)
        -- publication date --                                               >
  <!ELEMENT atitle      - O     (#PCDATA)
        -- article title --                                                  >
  <!ELEMENT ptitle      - O     (#PCDATA)
        -- periodical title --                                               >
  <!ELEMENT vol         - O     (#PCDATA)
        -- volume number --                                                  >
  <!ELEMENT pdate       - O     (#PCDATA)
        -- periodical date --                                                >
  <!ELEMENT pages       - O     (#PCDATA)
        -- page numbers --                                                   >

  <!-- get rid of newlines in references; they're too hard to deal with.   -->
  <!SHORTREF    refmap          "&#RS;"         null
                                "&#RE;"         null                         >
  <!USEMAP      refmap  aref                                                 >
  <!USEMAP      refmap  bref                                                 >

  <!-- ===================================================================
  ==== Figures.
  ======================================================================== -->
  <!ELEMENT fig         - -     (ftitle?, fdata)
        -- a figure --                                                       >
  <!ELEMENT ftitle      - O     (#PCDATA)
        -- the figure title --                                               >
  <!ELEMENT fdata       - O     (#PCDATA | %e.emph; | %e.quote;)*
        -- the figure data --                                                >

  <!-- ===================================================================
  ==== Tables.
  ======================================================================== -->
  <!ELEMENT tbl         - -     (ttitle, thead, (row | dline | sline)+)
        -- a table --                                                        >
  <!ELEMENT ttitle      - O     (#PCDATA)
        -- table title --                                                    >
  <!ELEMENT thead       - O     (col)+
        -- table headings --                                                 >
  <!ELEMENT row         - O     (col)+
        -- a table row --                                                    >
  <!ELEMENT col         - O     (#PCDATA | %e.emph; | %e.quote;)*
        -- a table column --                                                 >
  <!ELEMENT dline       - O     (#PCDATA)
        -- a double horizontal line --                                       >
  <!ELEMENT sline       - O     (#PCDATA)
        -- a single horizontal line --                                       >

  <!-- get rid of newlines in the table; they're too hard to deal with.    -->
  <!SHORTREF    tblmap          "&#RS;"         null
                                "&#RE;"         null                         >
  <!USEMAP      tblmap  tbl                                                  >

  <!-- Tables must have a column format specification.                     -->
  <!ATTLIST tbl         colfmt          CDATA           #REQUIRED            >

  <!-- ===================================================================
  ==== Equations (not implemented yet).
  ======================================================================== -->
  <!ELEMENT deq         - -     (#PCDATA)
        -- a display equation --                                             >
  <!ELEMENT eq          - -     (#PCDATA)
        -- an in-line equation --                                            >

  <!-- ===================================================================
  ==== Miscellaneous elements.
  ======================================================================== -->
  <!ELEMENT authinst    - O     (#PCDATA)
        -- the author's institution --                                       >
  <!ELEMENT authname    - O     (#PCDATA)
        -- the author's name --                                              >
  <!ELEMENT obs         - O     (#PCDATA)
        -- the list of obsoleted documents --                                >
  <!ELEMENT upd         - O     (#PCDATA)
        -- the list of updated documents --                                  >

<!-- =====================================================================
==== End of Document Type Definition for "inetdoc1".
========================================================================== -->
 ]
>\fP
.ps
.vs
.if !'\*(h5'' \{\
.ce
\fB\*(h5\fP
.rm h5
.\}
.DE
.SH
Appendix B - INETDOC1.MS.REP
.PP
This is the complete text of INETDOC1.MS.REP, the ASP replacement
file that translates documents marked up according to INETDOC1.DTD
into \fCtroff -ms\fP input.
.LD
.ps -2
.vs -2
\fC
%
% INETDOC1.MS.REP
%
% Version 0.1, June 1992
%
% This is an ASP replacement file for use with the Amsterdam SGML Parser.
% It translates the INETDOC1 DTD SGML tags into "troff -ms" macros.
%
% Each command in this file is of the format:
%
%       tag  [plus]  repl  [plus]
%
% "tag" is the tag to match on.  If there is a match, the tag is replaced
% by the material in "repl", which is a series of quoted strings.  Normal
% C backslash escapes are valid in the repl part.  Within repl, [attrname]
% will be replaced with the value of the attribute "attrname".  If a plus
% sign (+) precedes repl, then repl will be guaranteed to start on a new
% line.  If a plus sign follows repl, then repl will be guaranteed to end
% with a new line.
%
% This replacement file, the ASP, and the DTD included with this distribution
% constitute an experiment in encoding RFCs and Internet Drafts in SGML for
% distribution, and then translating them into other markup languages
% (NROFF/TROFF, TeX, etc.) for final formatting and printing by the end user.
%
% If you have questions about the experiment, or about this replacement file,
% contact:
%
% David A. Curry
% Purdue University
% Engineering Computer Network
% 1285 Electrical Engineering Building
% West Lafayette, IN 47907-1285
%
% Phone:  (317) 494-3561
% Fax:    (317) 494-6440
% E-Mail: davy@ecn.purdue.edu
%

%
% Put out initial document setup.
%
<INETDOC1>      +       ".nr LL 6.5i\\n"
                        ".nr PO 1.0i\\n"
                        ".nr PS 12\\n"
                        ".nr VS 14\\n"
                        ".ll 6.5i\\n"
                        ".po 1.0i\\n"
                        ".ps 12\\n"
                        ".vs 14\\n"
                        ".ta 6.5iR\\n"
                        ".ds RF \\[Page \\%]\\n"
                        ".rm CF\\n"
                        ".hy 0\\n"
                        ".de LE\\n"
                        ".RT\\n"
                        ".nr Ct 1\\n"
                        ".nr 2T 0\\n"
                        ".ne 1.1\\n"
                        ".if !\\\\\\\\n(IP .nr IP +1\\n"
                        ".in +\\\\\\\\n(I\\\\\\\\n(IRu\\n"
                        ".ta \\\\\\\\n(I\\\\\\\\n(IRu\\n"
                        ".ti -\\\\\\\\n(I\\\\\\\\n(IRu\\n"
                        "\\\\(bu\\\\t\\\\c\\n"
                        ".."                                            +

%
% Define strings for headers.
%
<INDRAFT>       +       ".ds h1 Internet Draft\\n"
                        ".ds LH Internet Draft"                         +
<RFC>           +       ".ds h1 Request for Comments: [number]\\n"
                        ".ds LH RFC [number]"                           +
<WKGROUP>       +       ".ds h0 \\\\"                                     +
</WKGROUP>              ""                                              +
<SAUTHOR>       +       ".ds LF \\\\"                                     +
</SAUTHOR>              ""                                              +
<DATE>          +       ".ds RH \\\\"                                     +
</DATE>                 ""                                              +
<OBS>           +       ".ds h9 Obsoletes: \\\\"                          +
</OBS>                  ""                                              +
<UPD>           +       ".ds h9 Updates: \\\\"                            +
</UPD>                  ""                                              +

%
% This grotesqueness is to handle multiple authors.  The first author's
% name and institution go into h2 and h3 and are handled "normally".  The
% rest of them are appended to a diversion, h8.
%
<AUTHNAME>      +       ".ds h4 \\\\"                                     +
</AUTHNAME>     +       ".ie '\\\\*(h2'' \\\\{\\\\\\n"
                        ".ds h2 \\\\*(h4\\n"
                        ".\\\\}\\n"
                        ".el \\\\{\\\\\\n"
                        ".da h8\\n"
                        ".br\\n"
                        "\\t\\\\*(h4\\n"
                        ".br\\n"
                        ".da\\n"
                        ".\\\\}"                                          +
<AUTHINST>      +       ".ds h5 \\\\"                                     +
</AUTHINST>     +       ".ie '\\\\*(h3'' \\\\{\\\\\\n"
                        ".ds h3 \\\\*(h5\\n"
                        ".\\\\}\\n"
                        ".el \\\\{\\\\\\n"
                        ".da h8\\n"
                        ".br\\n"
                        "\\t\\\\*(h5\\n"
                        ".br\\n"
                        ".da\\n"
                        ".\\\\}"                                          +

%
% Set up titles.  At the time we receive the title, we can output the
% top heading stuff.
%
<TITLE>         +       ".nf\\n"
                        "\\\\*(h0\\t\\\\*(h2\\n"
                        "\\\\*(h1\\t\\\\*(h3\\n"
                        ".h8\\n"
                        "\\t\\\\*(RH\\n"
                        ".fi\\n"
                        ".sp\\n"
                        "\\\\*(h9\\n"
                        ".TL"                                           +
</TITLE>                ""                                              +
<STITLE>        +       ".ds CH \\\\"                                     +
</STITLE>               ""                                              +

%
% Sections
%
<STATUS>        +       ".SH\\n"
                        "Status of this Memo"                           +
<DIST>          +       ".SH\\n"
                        "Distribution\\n"
                        ".PP"                                           +
<ABSTRACT>      +       ".SH\\n"
                        "Abstract"                                      +
</ABSTRACT>     +       ".bp"                                           +
<ACKS>          +       ".SH\\n"
                        "Acknowledgements"                              +
<REFS>          +       ".SH\\n"
                        "References"                                    +
<SECURITY>      +       ".SH\\n"
                        "Security Considerations"                       +
<AUTHADDR>      +       ".ie [nauthors]>1 \\\\{\\\\\\n"
                        ".SH\\n"
                        "Authors' Addresses\\n"
                        ".\\\\}\\n"
                        ".el \\\\{\\\\\\n"
                        ".SH\\n"
                        "Author's Address\\n"
                        ".\\\\}"                                          +

%
% Headings
%
<H0>            +       ".SH"                                           +
<H1>            +       ".NH 1"                                         +
<H2>            +       ".NH 2"                                         +
<H3>            +       ".NH 3"                                         +
<H4>            +       ".NH 4"                                         +
<H5>            +       ".NH 5"                                         +
<H6>            +       ".NH 6"                                         +
<H7>            +       ".NH 7"                                         +
<H8>            +       ".NH 8"                                         +
<H9>            +       ".NH 9"                                         +

%
% Paragraphs.
%
<P>             +       ".PP"                                           +
<QP>            +       ".QP"                                           +
<TP>            +       ".IP \\"\\\\fB[tag]\\\\fP\\" 1i"                      +
</TP>           +       ".RT"                                           +
<ITEM>          +       ".LE"                                           +
<LIST>          +       ".RS"                                           +
</LIST>         +       ".RE\\n"
                        ".RT"                                           +

%
% Addresses.
%
<ADDRESS>       +       ".RT\\n"
                        ".sp"                                           +
</NAME>         +       ".br"                                           +
</ORG>          +       ".br"                                           +
</STREET>       +       ".br"                                           +
% Kludge to get a comma in there
<CITY>          +       ".ds h6 \\\\"                                     +
<CSUB>          +       ".ds h7 \\\\"                                     +
</CSUB>         +       "\\\\*(h6, \\\\*(h7"                                +
</PCODE>        +       ".br"                                           +
</COUNTRY>      +       ".br"                                           +
<PHONE>         +       ".sp .5v\\n"
                        "Phone: \\\\"                                     +
<FAX>           +       ".br\\n"
                        "Fax: \\\\"                                       +
<EMAIL>         +       ".br\\n"
                        "E-Mail: \\\\"                                    +

%
% Fonts and quotes.
%
<B>                     "\\\\fB"
<C>                     "\\\\fC"
<I>                     "\\\\fI"
<Q>                     "\\\\*Q"
<R>                     "\\\\fR"
<EMQ>                   "`"
</B>                    "\\\\fP"
</C>                    "\\\\fP"
</I>                    "\\\\fP"
</Q>                    "\\\\*U"
</R>                    "\\\\fP"
</EMQ>                  "'"

%
% References.
%
<AREF>          +       ".XP"                                           +
<BREF>          +       ".XP"                                           +
<CITE>                  "\\["
</CITE>                 "]"
<RAUTHORS>      +       ""
</RAUTHORS>             "."                                             +
<BTITLE>        +       "\\\\fI"
</BTITLE>               "\\\\fP."                                         +
<PUBCITY>       +       ""
</PUBCITY>              ":"                                             +
<PUBNAME>       +       ""
</PUBNAME>              "."                                             +
<PUBDATE>       +       ""
</PUBDATE>              "."                                             +
<ATITLE>        +       "\\\\*Q"
</ATITLE>               ".\\\\*U"                                         +
<PTITLE>        +       "\\\\fI"
</PTITLE>               "\\\\fP."                                         +
<VOL>           +       ""
</VOL>                  ""                                              +
<PDATE>         +       "("
</PDATE>                "):"                                            +
<PAGES>         +       ""
</PAGES>                "."                                             +

%
% Tables.
%
<TBL>           +       ".TS H\\n"
                        "center, box, tab(&);\\n"
                        "[colfmt]."                                     +
</TBL>          +       ".TE\\n"
                        ".ce\\n"
                        "\\\\fB\\\\*(h5\\\\fP\\n"
                        ".rm h5"                                        +
<TTITLE>        +       ".ds h5 \\\\"                                     +
</TTITLE>               ""                                              +
<THEAD>         +       ""
</THEAD>        +       ".TH"                                           +
<ROW>           +       ""
</COL>                  "&"
<DLINE>         +       "="                                             +
<SLINE>         +       "_"                                             +

%
% Figures.
%
<FIG>           +       ".LD\\n"
                        ".ps -2\\n"
                        ".vs -2"                                        +
</FIG>          +       ".ps\\n"
                        ".vs\\n"
                        ".if !'\\\\*(h5'' \\\\{\\\\\\n"
                        ".ce\\n"
                        "\\\\fB\\\\*(h5\\\\fP\\n"
                        ".rm h5\\n"
                        ".\\\\}\\n"
                        ".DE"                                           +
<FTITLE>        +       ".ds h5 \\\\"                                     +
</FTITLE>               ""                                              +

%
% Equations (not implemented yet).
%
</DEQ>
</EQ>
<DEQ>
<EQ>

%
% We don't do anything with these; they'll just be replaced by "nothing".
% We list them here for completeness, so that we can tell whether an item
% has been left out accidentally.
%
</ACKS>
</ADDRESS>
</AREF>
</AUTHADDR>
</AUTHLIST>
</BODY>
</BREF>
</CITY>
</DIST>
</DLINE>
</EMAIL>
</FAX>
</FDATA>
</H0>
</H1>
</H2>
</H3>
</H4>
</H5>
</H6>
</H7>
</H8>
</H9>
</INDRAFT>
</INETDOC1>
</ITEM>
</P>
</PHONE>
</QP>
</REFS>
</RELATION>
</RFC>
</ROW>
</SECURITY>
</SLINE>
</STATUS>
<AUTHLIST>
<BODY>
<COL>
<COUNTRY>
<FDATA>
<NAME>
<ORG>
<PCODE>
<RELATION>
<STREET>\fP
.ps
.vs
.if !'\*(h5'' \{\
.ce
\fB\*(h5\fP
.rm h5
.\}
.DE
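.PP
As a rough illustration of how the rules above are applied (a sketch
worked out by hand from the rule format described in the file's opening
comments, not output captured from the ASP), consider the
\fC<PHONE>\fP, \fC<FAX>\fP, and \fC<EMAIL>\fP rules.  Assuming, purely
for this example, that each element is written on a single line with
an explicit end tag, the input
.LD
\fC  <phone>(317) 494-3561</phone>
  <fax>(317) 494-6440</fax>
  <email>davy@ecn.purdue.edu</email>\fP
.DE
would be expected to produce
.LD
\fC  .sp .5v
  Phone: \\
  (317) 494-3561
  .br
  Fax: \\
  (317) 494-6440
  .br
  E-Mail: \\
  davy@ecn.purdue.edu\fP
.DE
Each start tag is replaced by its label, the plus signs supply the
line breaks, and the end tags are replaced by nothing.  The backslash
at the end of each label line makes \fCtroff\fP ignore the newline
that follows it, so the label and the text after it are set on one
output line.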
.SH
Acknowledgements
.PP
My thanks to Vint Cerf and John Klensin, who provided useful
comments and encouragement during the development of the sample
implementation described in this document.  Thanks also to Joyce
Reynolds, who answered my numerous stupid questions.  Lastly, thanks
to Robin Cover for producing his excellent \fIStandard Generalized
Markup Language Annotated Bibliography and List of Resources\fP
(included in the sample implementation distribution), which enabled
me to find enough software and documentation to develop an idea into
this proposal.
.PP
No thanks whatsoever to the Purdue University library system, which
does not possess a single book on the use of SGML (sigh).
.SH
References
.XP
[AAP87a]
Association of American Publishers.
\fIElectronic Manuscript Project: Standard for Electronic Manuscript Preparation and Markup\fP.
Washington, D.C.:
Association of American Publishers.
August 1987.
.XP
[AAP87b]
Association of American Publishers.
\fIElectronic Manuscript Project: Markup of Mathematical Formulas\fP.
Washington, D.C.:
Association of American Publishers.
August 1987.
.XP
[AAP87c]
Association of American Publishers.
\fIElectronic Manuscript Project: Markup of Tabular Material\fP.
Washington, D.C.:
Association of American Publishers.
August 1987.
.XP
[ANSI88]
National Information Standards Organization.
\fIElectronic Manuscript Preparation and Markup, ANSI/NISO Z39.59-1988\fP.
New Brunswick, NJ:
Transaction Publishers.
1991.
.XP
[Postel89]
Postel, J.
\fIInstructions to RFC Authors\fP.
RFC 1111:
DDN Network Information Center.
August 1989.
.XP
[Warmer89]
Warmer, Jos and Sylvia van Egmond.
\*QThe Implementation of the Amsterdam SGML Parser.\*U
\fIElectronic Publishing: Origination, Dissemination and Design\fP.
2/2
(July 1989):
65-90.
.SH
Security Considerations
.PP
Security is not discussed in this document.
.ie 1>1 \{\
.SH
Authors' Addresses
.\}
.el \{\
.SH
Author's Address
.\}
.RT
.sp
David A. Curry
.br
Purdue University
.br
Engineering Computer Network
.br
1285 Electrical Engineering Building
.br
.ds h6 \
West Lafayette
.ds h7 \
Indiana
\*(h6, \*(h7
47907-1285
.br
U.S.A.
.br
.sp .5v
Phone: \
(317) 494-3561
.br
Fax: \
(317) 494-6440
.br
E-Mail: \
davy@ecn.purdue.edu

