Uniform Resource Locators              Tim Berners-Lee
INTERNET DRAFT                                    CERN
IETF URL Working Group                    14 July 1993



                          Uniform Resource Locators


     Status of this memo

            This document is an Internet Draft.  Internet Drafts are
          working documents of the Internet Engineering Task Force
          (IETF), its Areas, and its Working Groups.  Note that other
          groups may also distribute working documents as Internet
          Drafts.
            Internet Drafts are working documents valid for a maximum of
          six months.  Internet Drafts may be updated, replaced, or
          obsoleted by other documents at any time.  It is not
          appropriate to use Internet Drafts as reference material or to
          cite them other than as a "working draft" or "work in
          progress".
            Distribution of this document is unlimited.  Please send
          comments to the author as timbl@info.cern.ch. or to the
          discussion list  ietf-url@merit.edu.

     Abstract

            Many protocols and systems for document search and retrieval
          are currently in use, and many more protocols or refinements
          of existing protocols are to be expected in a field whose
          expansion is explosive.
            These systems are aiming to achieve global search and
          readership of documents across differing computing platforms,
          and despite a plethora of protocols and data formats.   As
          protocols evolve, gateways can allow global access to remain
          possible. As data formats evolve, format conversion programs
          can preserve global access.  There is one area, however, in
          which it is impractical to make conversions, and that is in
          the names and addresses used to identify objects.  This is
          because names and addresses of objects are passed on in so
          many ways, from the backs of envelopes to hypertext objects,
          and may have a long life.
            This paper discusses the requirements on a universal syntax
          which can be used to refer to objects available using existing
          protocols, and may be extended with technology.  It makes a
          recommendation for a generic syntax, and for specific forms
          for "Uniform Resource Locators" (URLs)of objects accessible
          using existing Internet protocols.

Uniform Resource Locators                        Berners-Lee



     Terms

            The objects on the network which are to be named and
          addressed include typically objects which can be retrieved,
          and objects which can be searched.   There is a great variety
          of other objects which may support other operations.  We imply
          nothing about the contents of objects in this document.
          Whereas human-readable documents are currently the center of
          interest of the field, we envisage all aspects discussed in
          this paper applying to generalised objects when systems to
          handle them become available. The "object" is the unit of
          reference and need not correspond to any unit of storage.  We
          refer to objects which can be searched as "indexes".  We
          emphasise that this is the abstract view of the client, and
          these objects need not correspond to physical files on
          computers. We refer to the person who does the retrieval or
          searching as the user.
            Within this document, we use the terms "name" very generally
          for a string of characters describing an object,  whatever its
          combination of properties mentioned below.  (The term usually
          has a narrower meaning but we needed some term for the
          universal set).  The term "address" is reserved for an string
          which specifies a more or less physical location.  The term
          "locator" refers to a URL as here defined.

     Requirements

            This section discusses requirements for URLs, as an
          introduction of and background for the Recommendations
          section.

        Uses of names and addresses

            A name allows a user, with the help of a "client" program,
          to retrieve or operate on objects via a "server" program.  A
          name may be passed for example:
                  - In communication of any form between two people, to
                    refer to a document, or part of a document;
                  - As part of the description of a link associated with
                    a hypertext document;
                  - As part of the result of searching an index.
            Some typical requirements on a name which are met to a
          varying degree by various schemes are for example that the
          name is
            Persistent     A given name will remain valid as long as it
                           is needed;
            Extensible     A given naming syntax will remain valid
                           through the introduction of new protocols and
                           directory technologies;
            Resolvable     A name will contain enough information to
                           allow the document or index to which it


Internet Draft               2              March 1993Uniform Resource Locator
s                        Berners-Lee


                           refers to be accessed, perhaps via resolution
                           into an intermediate, more physical, name.
            Unique         Each object can only have one such name.  The
                           fact that two such names are different
                           implies that the objects to which they refer
                           are different (in some way).
            Unambiguous    The fact that two names are identical implies
                           that the objects named are the same (in some
                           way).
            The syntax discussed is the syntax of one name, be it a
          lasting name or a physical address.  When a directory server
          or hypertext link contains a set of alternative names, then
          that is beyond the scope of this syntax.  Similarly, a syntax
          for describing a compound object is outside the scope of this
          syntax.  The specific locator name spaces (defined under the
          umbrella of the general syntax) each meet the requirements
          above to a greater or lesser extent.

        Current practice

            Current protocols use many different standards for names.
          For some protocols, such as ISO-10163 Search and Retrieve
          protocol[16], the names returned in a search are only valid
          during the session. For others, such as FTP[9], they are
          lasting names which may be used for object retrieval at a
          later time.  Typically, however, they are not long-lasting
          names which are independent of the location of the object.
          Such names may be provided using directory servers such as
          x.500.  They will refer to the registration, however formal or
          informal, of a object with a particular organisation or
          person.  Both hypertext and  manual references rely on long-
          lasting names.
            Current names are basically location specifiers (addresses).
          These may be known as Uniform Resource Locators (URLs). They
          give the necessary parts of an address for a reader to access
          an information provider using the given protocol, and ask for
          the object required. Examples of names used by various
          protocols include

File Transfer Protocol (Postel 1985):
                         Host name or IP-address
                              [TCP port]
                         [user name, password]
                         Filename

W.A.I.S. (Kahle 1990)         Host name or IP-address
                         [TCP port]
                         database name
                         local document id

Gopher (Alberti 1991)         Host name or IP-address
                         [TCP port]


Internet Draft               3              March 1993Uniform Resource Locator
s                        Berners-Lee


                         database name
                         selector string

HTTP (Berners-Lee 1991)       Host name or IP-address
                         [TCP port]
                         local object id

NNTP (Kantor 1986) group Group name

NNTP article             Host name
                         unique message identifier

Prospero links (Neuman 1992)  Host name or IP address
                         [UDP port]
                         Host specific object name
                         [version]
                         [identifier]*

x.500 distinguished name Country
                         Organisation
                         Organisational unit
                         Person
                         Local object identifier


            Other systems with their own naming schemes include BITNET
          "LISTSERV" application, FTAM file retrieval, SQLnetTM remote
          database search, proprietary  distributed file systems, etc.
          Conventional syntax for writing these addresses involve
          various forms of punctuation to separate these parts.  This
          sometimes,  but not always, allows the naming scheme to be
          deduced from the punctuation.  For example, a name of the form
          xxx.yyy.zz.edu:/pub.aa.bb.cc  often implies anonymous FTP
          access.  However, there is no well-defined algorithm for
          parsing an arbitrary name, as there is no common syntax.


        Expandability


            There will necessarily be a phase during which lasting names
          will become more  common, as the deployment of directory
          services increases to the point where  every user has direct
          or indirect access to one.  Even then, however, one can
          envisage more than one competing directory system, and cases
          in which physical  names are still required.  A directory
          service takes a lasting name and reduces it  to a physical
          address (or set of addresses) which, though less useful for
          lasting reference, is the only way to actually retrieve the
          object.




Internet Draft               4              March 1993Uniform Resource Locator
s                        Berners-Lee


            An addressing syntax is required which will be able to
          encompass existing  physical address spaces, and be extendible
          to any future protocols.  This  requires that it contain an
          identifier for the protocol in use. The format of the rest  of
          the address will necessarily depend to a certain extent on the
          protocol.


        Relevance

            The life of a name is limited by any information contained
          within it which may  become prematurely invalid. It is
          therefore necessary to limit the contents of a name to the
          information required for the operations above.  Other
          extraneous information about the object (its size, data
          format, authorisation details, etc.) may in general change
          with time and should not be part of the name.
            One might expect such information to be part of the "header"
          of a object, and for protocols to allow the header information
          to be retrieved independently of the objects themselves.
            Any physical address may be subject to change with time:
          hence we encourage the move to lasting names and directory
          services.

        Uniqueness

            Clearly one requires unambiguous names in the sense that one
          name should refer to only one logical object. This is the case
          with all the addressing schemes in use, whether they are
          directory systems or physical addresses. (The internet
          addresses all rely on the domain name (Mockapetris 1987) of
          the host to achieve this).
            However, given that names can be translated, many apparently
          different names  may lead to the same object. Any object may
          therefore be referred to by many  names. One needs to be able
          to know whether two objects, retrieved through  different
          paths, are in fact the same object.
            It is suggested that each object have a unique
                               unique
          unique "official"
          name. This name could be stored in the object in some
          representations, or stored in a database  accessible to the
          server, for example.  Any references within that object
          should be parsed in the context of the official name.  In the
          presence of a  directory service, the official name will
          normally be the registered name of the object. However, a name
          in any scheme will do, so long as it is completely specified.
          On systems which do not allow the name to be stored (such as
          anonymous FTP archive sites), a possible ambiguity will always
          exist as to whether two similarly named objects are in fact
          the same.
            Note that Internet newsgroup names are unique world-wide,
          and news articles carry a unique message id.



Internet Draft               5              March 1993Uniform Resource Locator
s                        Berners-Lee


            In most other cases, however, there is no guarantee that
          dereferencing a URL will work, or that if it does the object
          it refers to will in fact be the object intended.  URLs such
          as FTP addresses are transient in that files may be moved and
          even replaced by different files of the same name.  This
          disorganisation may be limited by good server management,  but
          a naming scheme which is independent also of internet host
          name is obviously preferable.

        Readability by people

             This requirement has been put forward by several people
          (Clifford Lynch, Douglas Engelbart among others), and disputed
          by others.  The author's view is that it will be a while
          before technology and standardisation have reached the point
          at which names and addresses will be hidden from human beings.
          As long as they must be written on the backs of envelopes and
          "cut and pasted" between workstation windows, there is a
          strong need for names to be
                  . Short
                  . Composed of printable (preferably non-white)
                    characters
                  . To a certain extent, understadable by a human being.


        Structure of names and addresses.


            A physical address is required in order for

                  . The user's program to contact the server
                  . The server to search and index,  retrieve a object,
                    or look up the name;
                  . The user's program to locate an individual position
                    or element within a object.

            This suggests that a name be structured, such that the parts
          necessary for these  three operations be separate and only
          used by those system elements which need  those parts. This
          corresponds to the basic principle of information hiding.  In
          fact,  four parts are necessary, including the indicator of
          the naming scheme to be used:

                  . The naming scheme: a registered identifier for the
                    protocol.
                  . The name of a suitable server. The format of this
                    part must be well defined. It will depend on the
                    lower-layer protocols in use.  Systems which use
                    widely distributed information, such as x.500 and
                    NNTP, do not need this part as each client generally
                    contacts his nearest server (or a particular
                    server).


Internet Draft               6              March 1993Uniform Resource Locator
s                        Berners-Lee


                  . Information to be passed to the server. This may be
                    private to the server, as all names may be generated
                    and used by the same server. This part of the name
                    should be opaque to the client.
                  . Information to be used by the application once the
                    object has been retrieved.  This part is private to
                    the application (or, more strictly, the data format)
                    and so cannot be defined here.

            Both lasting names and physical addresses often share a
          hierarchical structure. This follows often from the
          organisation of the system. From the naming point of view, it
          has the advantage that a reference in one object to another
          object need not include that part of the structure which is
          common to both names.

        Choices

            The requirements above leave little room for choice save for
          the order and punctuation of the elements of an address.  It
          is only reasonable for the order of writing of the parts to be
          consistently from left to right (or right to left) with
          increasing specificity.  Punctuation schemes fall into two
          categories (Huitema 1991): tagged schemes in which field are
          given names, and fields which use special characters and field
          order. The latter tend to be more compact schemes.



                    protocol: aftp host: xxx.yyy.edu path:
                    /pub/doc/README

                    PR=aftp; H=xx.yy.edu; PA=/pub/doc/README;

                    PR:aftp/xx.yy.edu/pub/doc/README

                    /aftp/xx.yy.edu/pub/doc/README



            Fig 1. Some alternative tagged and untagged representations

            The choice of special symbols for punctuation tends to be a
          matter of taste. It is easier to read  addresses whose symbols
          correspond to those of one's favourite operating system.  A
          variety of symbols is needed so that when a name is
          abbreviated it is possible to tell which parts have been
          omitted. The  recommendation below uses special characters in
          order to achieve a compact name, and uses where possible
          punctuation symbols established in the internet or unix
          community.



Internet Draft               7              March 1993Uniform Resource Locator
s                        Berners-Lee


            The choice of escape character for introducing
          representations of non-allowed characters also tends to be a
          matter of taste. An ANSI standard exists in the C language,
          using the back-slash character "\". The use of this character
          on unix command lines, however, can be a problem as it is
          interpreted by many shell programs, and would have itself to
          be escaped.
            The use of white space characters has been avoided  in URLs:
          spaces are not legal characters.   This was done because of
          the frequent introduction of extraneous white space when lines
          are wrapped by systems such as mail, or sheer necessity of
          narrow column width, and because of the  inter-conversion of
          various forms of white space which occurs during character
          code conversion and the transfer of text between applications.


     Recommendations

            This section describes the syntax for "Uniform Resource
          Locators" (URLs):  that is, basically physical addresses of
          objects which are retrievable using protocols already deployed
          on the net.  The generic syntax provides a framework for new
          schemes for names to be resolved using as yet undefined
          protocols.
            The syntax is described in two parts. Firstly, we give the
          syntax rules of a completely specified name;  secondly, we
          give the rules under which parts of the name may be omitted in
          a well-defined context.

        Full form

            A complete URL consists of a naming scheme specifier
          followed by a string whose format is a function of the naming
          scheme. For locators of information on the internet, a common
          syntax is used for the  IP address part. A BNF description of
          the URL syntax is given in an a later section.  The components
          are as follows.

        Fragment-id

            This represents a part of, fragment of, or a sub-function
          within, an object or object. Its syntax and semantics are
          defined by the application responsible for the object, or the
          specification of the content type of the object. The only
          definition here is of the allowed characters by which it may
          be represented in a URL.
            The fragment-id follows the URL of the whole object from
          which it is separated by a hash sign (#).  If the fragment-id
          is void, the hash sign may be omitted: A void fragment-id with
          or without the hash sign means that the URL refers to the
          whole object.



Internet Draft               8              March 1993Uniform Resource Locator
s                        Berners-Lee


            While this hook is allowed for identification of fragments,
          the question of addressing of parts of objects, or of the
          grouping of objects and relationship between contined and
          containing objects, is not addressed by this object.
            This object does not address the question of objects which
          are different versions of a "living" object, nor of expressing
          the relationships between different versions and the living
          object.

        Scheme


            Within the URL of a object, the first element is the name of
          the scheme, separated from the rest of the object by a colon.
          The rest of the URL follows the colon in a format depending on
          the scheme.


        Internet protocol parts


            Those schemes which refer to internet protocols have a
          common syntax for the rest of the object name. This starts
          with a double slash "//" to indicate its presence, and
          continues until the following slash "/".  Within that section
          are

                  . An optional user name, if this must be quoted to the
                    server, followed by  a commercial at sign "@".  (Use
                    of this field is discouraged. Provision  of encoding
                    a password after the user name, delimited by a
                    colon, could  be made but obviously is only useful
                    when the password is public, in  which case it
                    should not be necessary, so that is also
                    discouraged.)
                  . The internet domain name  of the host in RFC1037
                    format (or, optionally  and less advisably, the IP
                    address as a set of four decimal digits)
                  . The port number, if it is not the default number for
                    the protocol, is given in decimal notation after a
                    colon.


        Path

            The rest of the locator is known as the "path". It may
          define details of how the client should communicate with the
          server, including information to be passed transparently to
          the server without any processing by the client.
            The path is interpreted in a manner dependent on the
          protocol being used. However, when it contains slashes, these
          must imply a hierarchical structure.


Internet Draft               9              March 1993Uniform Resource Locator
s                        Berners-Lee



        Partial form

            In a certain limited set of cases, generally within a
          certain application, it may be useful to pass only a section
          of the URL. Within a object whose URL is well defined, the URL
          of another object may be given in abbreviated form, where
          parts of the two URLs are the same. This allows objects within
          a group to refer to each other without requiring the space for
          a complete reference, and it incidentally allows the group of
          objects  to be moved without changing any references.  This is
          not discussed in detail here, it is only mentioned so that the
          characters required by the technique be reserved for that
          purpose.  It must be emphasised that when a reference is
          passed in anything other than a well controlled context, the
          full form must always be used.
            The partial form relies on a property of the URL syntax that
          certain characters ("/") and certain path elements ("..", ".")
          have a significance reserved for representing a hierarchical
          space, and must be recognised as such by both clients and
          servers.
            A partial form can be distinguished from a full form in that
          a full form must have a colon and that colon must occur before
          any slash characters.
            The rules for the use of a partial name are:
                  . If the scheme parts  are different, the whole
                    absolute locator must be given. Otherwise, the
                    scheme is omitted, and:
                  . If the host and/or port parts are the different, the
                    host, port name and all the rest of the locator must
                    be given.
                  . If the access and host parts are the same, then the
                    path may be given in absolute (fully qualified) or
                    relative form. Within the path:
                  . If a leading slash is present, the path is absolute.
                    Otherwise, a relative path is interpreted as
                    follows:
                  . The last part of the path of the context locator
                    (anything following the rightmost slash) is removed,
                    and the given partial URL appended in its place.
                  . Within the result,  all occurrences of "/xxx/.."  or
                    "/." are recursively removed, where xxx, ".." and
                    "."  are complete path elements.


        Encoding prohibited characters

            When a system uses a local addressing scheme, it is useful
          to provide a mapping from local addresses into URLs so that
          references to objects within the addressing scheme may be
          referred to globally, and possibly accessed through gateway
          servers.


Internet Draft               10             March 1993Uniform Resource Locator
s                        Berners-Lee


            Any mapping scheme may be defined provided it is
          unambiguous, reversible, and provides valid URLs. It is
          recommended that where hierarchical aspects to the local
          naming scheme exist, they be mapped onto the hierarchical URL
          path syntax in order to allow the partial form to be used.
            The following encoding method is used for mapping WAIS, FTP,
          Prospero and Gopher addresses onto URLs.  Where the local
          naming scheme uses ASCII characters which are not allowed in
          the URL,  these may be represented in the URL by a percent
          sign "%" followed by two hexadecimal digits (0-9, A-F) giving
          the ISO Latin 1 code for that character.  Character codes
          other than those allowed by the syntax shall not be used in a
          URL.
            The same encoding method may be used for encoding characters
          whose use, although technically allowed in a URL, would be
          unwise due to problems of corruption by imperfect gateways or
          misrepresentation due to the use of variant character sets, or
          which would simply be awkward in a given environment.  As a %
          sign always indicates an encoded character, a URL may be made
          safer simply by encoding any characters considered unsafe,
          while leaving already encoded characters still encoded.
            (Note: If a new naming scheme is introduced which encodes
          binary data as opposed to text, then a more compact encoding
          such as pure hex or base 64 would be more appropriate.)
            The same considerations apply to mapping local fragment
          identifiers onto the fragmentid part of a URL.

     Specific Naming Schemes

            The mapping for some existing standard and experimental
          protocols is outlined in the BNF syntax definition.  Notes on
          particular protocols follow.

        HTTP

            The HTTP protocol specifies that the path is handled
          transparently by those who handle URLs, except for the servers
          which dereference them.   The path is passed by the client to
          the server with any request, but is not otherwise understood
          by the client.  The fragmentid part is not sent with the
          request.  The search part, if present, is sent.

        FTP

            The ftp: prefix indicates a file which is to be picked up
          from the file system of the given host. The FTP protocol is
          used. The port number if given gives the port of the FTP
          server if not the FTP default. (A client may in practice use
          local file access to retrieve objects which are available
          though more efficient means such as local file open or NFS
          mounting, where this is available and equivalent)



Internet Draft               11             March 1993Uniform Resource Locator
s                        Berners-Lee


            The syntax allows for the inclusion of a user name and even
          a password for those systems which do not use the anonymous
          FTP convention. The default, however, if no user or password
          is supplied, will be to use that convention, viz. that the
          user name is "anonymous" and the password the user's mail
          address.
            The adoption of a unix-style syntax involves the conversion
          into non-unix local forms by either the client or server. Some
          non-unix servers do this, but clients wishing to access sites
          which do not have unix-style naming will need certain
          algorithms to enable  other file systems to be identified and
          treated.  Client software may also have to be flexible in
          terms of the sequence of FTP commands used with different
          varieties of server.  In view of a tendency for file systems
          to look increasingly similar, it was felt that the URL
          convention should not be weighed down by extra mechanisms for
          identifying these cases.
            The data format of a file can only, in the general FTP case,
          be deduced from the name, normally the suffix of the name.
          This is not standardised. The transfer mode (binary or text)
          must in turn be deduced from the data format.  It is
          recommended that conventions for suffixes of public archives
          be established, but it outside the scope of this paper.

        News

            The news locators refer to either news group names or
          article message identifiers which must conform to the rules of
          RFC 850.  A message identifier may be distinguished from a
          news group name by the presence of the commercial at "@"
          character.   These rules imply that within an article, a
          reference to a news group or to another article will be a
          valid URL (in the partial form).
            Note: An outstanding problem is that the message identifier
          is insufficient to allow the retrieval of an expired article,
          as no algorithm exists for deriving an archive site and
          filename.  The addition of the date and news group set to the
          article's URL would allow this if a directory existed of
          archive sites by news group.  Suggested subject of study in
          conjunction with NNTP WG.  Further extension possible may be
          to allow the naming of subject threads as addressable objects.

        WAIS

            The current WAIS implementation public domain requires that
          a client know the "type" and length of a object prior to
          retrieval.  These values are returned along with the internal
          object identifier in the search response.  They have been
          encoded into the path part of the URL in order to make the URL
          sufficient for the retrieval of the object.  If  changes to
          WAIS specifications make the internal id something which is



Internet Draft               12             March 1993Uniform Resource Locator
s                        Berners-Lee


          sufficient for later retrieval then this will not be
          necessary.
            Within the WAIS world, names do not of course not need to be
          prefixed by "wais:"  (by the partial form rules).

        Prospero

            The Prospero (Neuman, 1991) directory srvice is used to
          resolve the URL yielding an access method for the object
          (which can then itself be represented as a URL if translated).
          The host part contains a host name or internet address.  The
          port part is optional.  The path part contains a host specific
          object name, an optional version number, and an optional list
          of attributes.  If these latter feilds are present thy are
          separated from the host specific object name and from each
          other by the characters "%00" (percent, zero, zero),  this
          being and escaped string terminator (null).  If the optional
          list of attributes is provided, the version number must be
          present, but may be the empty string (i.e. the first attribute
          would be seperaed from the host specific name by "%00%00").
          External Prospero links are represented directly as URLs of
          the underlying access method and are not represented as
          Prospero URLs.

        Gopher

            The first character of the URL path part (after the initial
          single slash) is a single-character "type" field which is that
          used by the Gopher protocol.  The rest of the path is the
          "selector string", with disallowed characters encoded. Note
          that some selector strings begin with a copy of the gopher
          type character, in which case that character will occur twice
          consecutively in the URL.  If the type character and selector
          are omitted, the type defaults to "1".
            Gopher links which refer to different protocols should be
          converted into URLs for those protocols.

        Telnet, rlogin, tn3270

            The use of URLs to represent interactive sessions is a
          convenient extension to their uses for objects.  This allows
          access to information systems which only provide an
          interactive service, and no information server.  As
          information within the service cannot be addressed
          individually or, in general, automatically retrieved, this is
          a less desirable, though currently common, solution.

        x500

            The mapping of x500 names onto URLs is not defined here. A
          decision is required as to whether "distinguished names" or
          "user friendly names" (ufn), or both, should be allowed. If


Internet Draft               13             March 1993Uniform Resource Locator
s                        Berners-Lee


          any punctuation conversions are needed from the adopted x500
          representation (such as the use of slashes between parts of a
          ufn) they must be defined. This is a subject for study.

        WHOIS

            This prefix describes the access using the "whois++" scheme
          in the process of definition. The hostname part is the same as
          for other IP based schemes. The path part can be either a
          whois handle for a whosi object, or it can be a valid whois
          query string. This is a subject for further study.

        Network Management Database

            This is a subject for study.


        Registration of naming schemes

            A new naming scheme may be introduced by defining a mapping
          onto a conforming URL syntax, using a new scheme identifier.
          Experimental scheme identifiers may be used by mutual
          agreement between parties, and must start with the characters
          "x-".  The scheme name "urn:" is reserved for the work in
          progress on a scheme for more persistent names.  Therefore
          URNs (Names) and URLs (Locators)  be distinguishable.  An
          object which is either a URL or a URN is known as a URI
          (Identifier).
            It is proposed that the Internet Assigned Numbers Authority
          (IANA) perform the function of registration of new schemes.
          Any submission of a new URL scheme must include a definition
          of an algorithm for the retrieval of any object within that
          scheme. The algorithm must take  the URL and produce either a
          set of URL(s) which will lead to the desired object, or the
          object itself, in a well-defined or determinable format. It is
          recommended that those proposing a new scheme demonstrate its
          utility and operability by the provision of a gateway which
          will provide images of objects in the new scheme for clients
          using an existing protocol.  If the new scheme is not a
          locator scheme, then the properties of names in the new space
          should be clearly defined.
            It is likewise recommended that, where a protocol allows for
          retrieval by URI, that the client software have provision for
          being configured to use specific gateway locators for indirect
          access through new naming schemes.

     BNF syntax

            This is a BNF-like description of the Uniform Resource
          Locator syntax. A vertical  line "|"  indicates alternatives,
          and [brackets]  indicate optional parts.  Spaces are
          representated by the word "space".   Single letters stand for


Internet Draft               14             March 1993Uniform Resource Locator
s                        Berners-Lee


          single letters. All words of more than one letter below are
          entities described somewhere in this description.

            fragmentaddress   url [ # fragmentid ]
            url            generic | httpaddress | fileaddress |
                           newsaddress | prosperoaddress | telnetaddress
                           | gopheraddress | waisaddress | afsaddress
            generic        scheme :  path [ ? search ]
            scheme         ialpha
            httpaddress    h t t p :   / / hostport  [  / path ] [ ?
                           search ]
            fileaddress    f t p : / / host / path
            afsaddress     a f s : / / cellname / path
            newsaddress    n e w s : groupart
            waisaddress    waisindex | waisdoc
            waisindex      w a i s : / / hostport / database [ ? search
                           ]
            waisdoc        w a i s : / / hostport / database / wtype /
                           digits / path
            groupart       * | group | article
            group          ialpha [ . group ]
            article        xalphas @ host
            database       xalphas
            wtype          xalphas
            prosperoaddress   prosperolink
            prosperolink   p r o s p e r o : / / hostport / hsoname [ %
                           0 0 version [ attributes ] ]
            hsoname        path
            version        digits
            attributes     attribute [ attributes ]
            attribute      alphanums
            telnetaddress  t e l n e t : / / [ user @ ] hostport
            gopheraddress  g o p h e r : / / hostport  [/ gtype  [
                           selector ] ] [ ? search ]
            hostport        host [ : port ]
            host           hostname | hostnumber
            cellname       hostname
            hostname       ialpha [  .  hostname ]
            hostnumber     digits . digits . digits . digits
            port           digits
            selector       path
            path           void |  xpalphas  [  / path ]
            search         xalphas [ + search ]
            user           xalphas
            fragmentid     xalphas
            gtype          xalpha
            xalpha         alpha | digit | safe | extra | escape
            xalphas        xalpha [ xalphas ]
            xpalpha        xalpha | +
            xpalphas       xpalpha [ xpalpha ]
            ialpha         alpha [ xalphas ]



Internet Draft               15             March 1993Uniform Resource Locator
s                        Berners-Lee


            alpha          a | b | c | d | e | f | g | h | i | j | k | l
                           | m | n | o  | p | q | r | s | t | u | v | w
                           | x | y | z | A | B | C  | D | E | F | G | H
                           | I | J | K | L | M | N  | O | P |  Q | R | S
                           | T | U | V | W | X | Y | Z
            digit          0 |1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
            safe           $ | - | _ | @ | . | &
            extra          ! | * | " |  ' | ( | ) | : | ; | , | space
            escape         % hex hex
            hex            digit | a | b | c | d | e | f | A | B | C | D
                           | E | F
            variant        { | } | |                                   |
                            | | [ | ] | \ | ^ | ~
            punctuation    < | >
            digits         digit [ digits ]
            alphanum       alpha | digit
            alphanums      alphanum [ alphanums ]
            void


     Security considerations

            The URL scheme does not in itself pose a security threat.
          Users should beware that there is no general guarantee that a
          URL which at one time points to a given object continues to do
          so, and does not even at some later time point to a different
          object due to the movement of objects on servers.

     Conclusion

            A need has been demonstrated, and a number of requirements
          have been stated for uniform resource locators (URLs). A
          scheme has been proposed which builds on existing conventions
          to define a syntax for URLs.  This scheme has been in serious
          use by World-Wide Web (W3) initiative since 1991.  Adoption of
          the scheme in correspondence, standards and software will ease
          the use of references to on-line information in a flexible way
          as the coming information age arrives.

     Acknowledgements

            This paper builds on the basic W3 design and much discussion
          of these issues by many people on the network.  The discussion
          was particularly stimulated by articles by Clifford Lynch
          (1991), Brewster Kahle (1991) and Wengyik Yeong (1991b).
          Contributions from John Curran (NEARnet), Clifford Neuman
          (ISI) Ed Vielmetti (MSEN) and later the IETF URL BOF and URI
          working group have been incorporated into this issue of this
          paper.
            The draft url4  (Innternet draft 00) was generated from url3
          following discussion and overall approval of the URL working
          group on 29 March 1993. The paper url3 had been generated from
          udi2 in the light of discussion at the UDI BOF meeting at the


Internet Draft               16             March 1993Uniform Resource Locator
s                        Berners-Lee


          Boston IETF in July 1992. Draft url4 was Internet Draft 00.
          Draft url5 incorporated changes suggested by Clifford Neuman,
          and draft url6 (ID 01) incorporated character group changes
          and a few other fixes defined by the IETF URI WG in submitting
          it as a proposed standard.

     References

        Alberti, R., et.al.  (1991) "Notes on the Internet Gopher
            Protocol" University of Minnesota, December 1991,
            URL=<ftp://boombox.micro.umn.edu/pub/gopher/gopher_protocol
            >. See also
            URL=<gopher://gopher.micro.umn.edu/00/Information%20About%2
            0Gopher/About%20Gopher>
        Berners-Lee, T., (1991) "Hypertext Transfer Protocol (HTTP)",
            CERN, December 1991,
            URL=<ftp://info.cern.ch./pub/www/doc/http-spec.txt>
        Davis, F, et  al., (1990) "WAIS Interface Protocol: Prototype
            Functional Specification", Thinking Machines Corporation,
            April 23, 1990
            URL=<ftp://quake.think.com/pub/wais/doc/protspec.txt>
        International Standards Organization, (1991) Information and
            Documentation - Search and Retrieve Application Protocol
            Specification for open Systems Interconnection, ISO-10163
        Huitema, C., (1991) "Naming: strategies and techniques",
            Computer Networks and ISDN Systems 23 (1991) 107-110.
        Kahle, Brewster, (1991) "Document Identifiers,  or
            International Standard Book Numbers for the Electronic
            Age", URL=<ftp://quake.think.com/pub/wais/doc/doc-ids.txt>
        Kantor, B., and Lapsley, P., (1986) "A proposed standard for
            the stream-based transmission of news", Internet RFC-977,
            February 1986. URL=<ftp://nnsc.nsf.net/rfc/rfc977.txt>
        Lynch, C., Coallition for Networked Information: (1991)
            "Workshop on ID and Reference Structures for Networked
            Information", November 1991. See
            URL=<wais://quake.think.com/wais-discussion-archives?lynch>
        Mockapetris, P., (1987) "Domain names + concepts and
            facilities", RFC-1034, USC-ISI, November 1987,
            URL=<ftp://nnsc.nsf.net/rfc/rfc1034.txt>
        Neuman, B. Clifford, (1992) "Prospero: A Tool for Organizing
            Internet Resources", Electronic Networking: Research,
            Applications and Policy, Vol 1 No 2, Meckler Westport CT
            USA.  See also
            URL=<ftp://prospero.isi.edu/pub/prospero/oir.ps>
        Postel, J. and Reynolds, J. (1985)  "File Transfer Protocol
            (FTP)", Internet RFC-959, October 1985.
            URL=<ftp://nnsc.nsf.net/rfc/rfc959.txt>
        Yeong, W., (1991a) "Towards Networked Information Retrieval",
            Technical report 91-06-25-01, June 1991, Performance
            Systems International, Inc.
            URL=<ftp://uu.psi.com/wp/nir.txt>



Internet Draft               17             March 1993Uniform Resource Locator
s                        Berners-Lee


        Yeong, W., (1991b), "Representing Public Archives in the
            Directory", Internet Draft, November 1991.  In
            <wais://nnsc.nsf.net/internet-drafts?yeong>. Work in
            progress.

        Author's address


               Tim Berners-Lee
               World-Wide Web project
               CERN, 1211 Geneva 23, Switzerland
               +41 (22)767 3755
               timbl@info.cern.ch









































Internet Draft               18             March 1993
.

