N-1-2-040.31.2  The Resource Discovery Problem by 
Peter Deutsch*, (peterd@cc.mcgill.ca)


It may be argued that as the Internet has grown from a collection of
hundreds of machines to one of hundreds of thousands of machines, a
fundamental shift in focus is occurring among its users.  Rather than
seeing themselves as primarily interacting with other individuals on
the net, users more and more have come to see themselves as
interacting with "the net" itself, with a vast pool of machines and
their associated resources that function as a virtual provider of
electronic goods and services.

As this perception of the Internet as a collection of services has
grown, a new problem has presented itself to would-be service
providers.  This problem, the so-called "Resource Discovery Problem",
must be adequately addressed if we are to move towards a true
Internet-wide model of resource access.

The basic problem can be broken down into four sub-problems - Class
Discovery, Instance Location, Instance Access and Information
Management.  Let us examine each of these in turn:

Class Discovery refers to seeking out a specific type of service
in a larger community of service providers, presumably without any a
priori knowledge of the existence or relevance of specific service
providers.  Thus, a user might seek to locate "collections of
information dealing with the genome project", or "collections of
freely available software", without knowing what specific services
exist to satisfy such queries.

In an idealized system, a "Class Discovery Service" (perhaps a better
term would be "Resource Information Service") could be asked such
general questions, and would reply with a set of appropriate service
providers.  A representative reply to such a query might be a list of
archive indexers, a set of relevant anonymous FTP archives and perhaps
the names and locations of relevant WAIS and Gopher servers.
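
As a rough illustration of the exchange described above, the following
Python sketch models a Class Discovery query against a small in-memory
catalog.  The record fields, the host names and the discover() function
are all hypothetical, invented here for illustration; no actual
service interface is implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceProvider:
    name: str         # hypothetical host name
    kind: str         # e.g. "archive-indexer", "anonymous-ftp", "wais", "gopher"
    description: str  # free-text description of what the provider offers


# Hypothetical catalog entries; a real service would have to gather these.
CATALOG = [
    ServiceProvider("indexer.example.org", "archive-indexer",
                    "index of freely available software archives"),
    ServiceProvider("ftp.example.edu", "anonymous-ftp",
                    "freely available software for Unix systems"),
    ServiceProvider("wais.example.net", "wais",
                    "genome project abstracts and reports"),
    ServiceProvider("gopher.example.edu", "gopher",
                    "campus information and genome project pointers"),
]


def discover(query, catalog=CATALOG):
    """Return providers whose description contains every word of the query."""
    words = query.lower().split()
    return [p for p in catalog
            if all(w in p.description.lower() for w in words)]


if __name__ == "__main__":
    for provider in discover("genome project"):
        print(provider.kind, provider.name)
```

The reply is a list of providers of mixed kinds, which is the point of
the step: the user learns which services exist before asking any one
of them for specific items.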

Once the existence of a specific collection of service providers
appropriate to the problem has been established, a user can proceed to
Instance Location.  For example, once the location of various anonymous
FTP archive sites and archive indexers has been established, a user
might then send queries to the relevant collection of archive
indexers, limiting the search to the specified sets of archives to
speed the search.  The replies in this case would be references to
specific files on the network that respond to the user's search
criteria.
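
The Instance Location step just described might be sketched as
follows.  This is a toy model under stated assumptions, not a
description of how any existing indexer works: the index contents, the
host names and the locate() function are hypothetical, and matching is
done with shell-style patterns purely for convenience.

```python
from fnmatch import fnmatchcase

# Hypothetical index entries: (archive host, path of a file held there).
INDEX = [
    ("ftp.example.edu", "/pub/tools/editor-1.0.tar.Z"),
    ("ftp.example.org", "/mirrors/tools/editor-1.0.tar.Z"),
    ("ftp.example.net", "/pub/text/formatter.tar.Z"),
]


def locate(pattern, index=INDEX, archives=None):
    """Return (host, path) references whose file name matches the pattern.

    If `archives` is given, only entries held at those hosts are
    examined, which models limiting the search to a known set of sites.
    """
    hits = []
    for host, path in index:
        if archives is not None and host not in archives:
            continue
        filename = path.rsplit("/", 1)[-1]
        if fnmatchcase(filename, pattern):
            hits.append((host, path))
    return hits


if __name__ == "__main__":
    print(locate("editor*"))
    print(locate("editor*", archives={"ftp.example.org"}))
```

Each reply is a reference (a host and a path), not the file itself;
fetching the file is left to the access step.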

The final step in information discovery is Instance Access.  Carrying
out this step would involve using the appropriate access method (such
protocols as FTP, WAIS/Z39.50 or Gopher).

A final problem remains - Information Management. Once a user has
discovered relevant references, he or she may not wish to store
specific copies of information, but rather may instead elect to build
up libraries of references, provided the underlying tools are capable
of resolving such references as needed.

Work is underway to standardize these references to allow the use of
"Universal Document Identifiers" (or UDIs) across multiple systems.
Such UDIs would allow the sharing of references to information across
multiple information systems across the Internet, so that queries to
archie could give back pointers to Gopher or WWW documents as well as
files available through Anonymous FTP.
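
To make the idea of a library of references concrete, here is a
minimal Python sketch.  It assumes, purely for illustration, that a
reference carries an access method, a host and a path; the actual form
of a UDI is not specified in this article, and the Reference class,
the stub handlers and the resolve() function are all hypothetical.
The handlers only report what they would do and perform no network
access.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    method: str  # access method, e.g. "ftp" or "gopher" (assumed field)
    host: str
    path: str


# Stub handlers stand in for real protocol clients.
def fetch_ftp(ref):
    return f"anonymous FTP: get {ref.path} from {ref.host}"


def fetch_gopher(ref):
    return f"Gopher: select {ref.path} on {ref.host}"


HANDLERS = {"ftp": fetch_ftp, "gopher": fetch_gopher}


def resolve(ref, handlers=HANDLERS):
    """Resolve a stored reference by dispatching on its access method."""
    try:
        handler = handlers[ref.method]
    except KeyError:
        raise ValueError(f"no access tool for method {ref.method!r}") from None
    return handler(ref)


# A personal library holds references, not copies of the documents.
LIBRARY = [
    Reference("ftp", "ftp.example.edu", "/pub/docs/readme.txt"),
    Reference("gopher", "gopher.example.edu", "/Libraries/Catalog"),
]

if __name__ == "__main__":
    for ref in LIBRARY:
        print(resolve(ref))
```

Because the access method travels with the reference, one library can
mix pointers into several information systems, and supporting a new
system means registering one more handler.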

Practical, useful services have been developed and deployed by a
number of researchers (currently using entirely volunteer resources)
that address most parts of the Resource Discovery problem.  The past 18
months have seen the creation of such services as archie, Prospero,
WAIS, WWW and Gopher, among others.  Each successfully attacks some
portion of the problem.

Thus, the user menus in the Gopher system may be seen as a library of
references to specific information and service providers on the
Internet, and the Gopher service itself (which allows the user to
resolve these references in a transparent manner) is one example of
the power of Instance Access tools.

Similarly, the archie system may be seen as a simple and fast
indexing service to perform Instance Location, returning references
that can then be resolved using appropriate Instance Access software
(be it the venerable "ftp" command, the Gopher-archie gateway or the
appropriate portion of one of the newer GUI-based archie access
tools).

WAIS provides Instance Location and Access, while Prospero and WWW
both provide means for organizing and accessing information, Prospero
at the granularity of individual files and WWW at the level of
hypertext documents.

Despite the success of these early experimental systems, much work
remains to be done.  The existing services need to be further refined
and expanded, then deployed as real, funded and supported services. In
addition, one major portion of the problem still remains to be
addressed.

There still exists a need for tools to aid in the Class Discovery
step.  This is becoming more pressing with the success of the systems
listed above, as there are now in fact a significant number of service
providers to be accessed. For example, over 200 locations provide
Gopher servers and a similar number provide WAIS sources, while there
are close to 15 archie servers now available on the Internet.  New
services continue to be deployed each week.

There are now plans to deploy an experimental Resource Information
Service to address this problem, using a modified version of the
current archie service.  The plan is to automatically gather and store
the descriptions and locations of such services in a new archie
database, then use archie's proactive data-gathering model to keep the
database up to date, periodically verifying that specific services are
still up and available.  Users will be able to query this new database,
seeking out services by their type or descriptions.
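
The gather, verify and query cycle outlined above could be modelled
along the following lines.  This is only a sketch under assumptions
and says nothing about how archie or the planned service is actually
built: the ServiceRecord fields, the Registry class and the probe
callback are hypothetical names, and the probe is supplied by the
caller so that the example performs no network access.

```python
from dataclasses import dataclass


@dataclass
class ServiceRecord:
    host: str
    kind: str                  # e.g. "gopher", "wais", "archie"
    description: str
    available: bool = True
    last_checked: float = 0.0  # time of the most recent verification


class Registry:
    def __init__(self):
        self._records = {}

    def register(self, record):
        """Store (or replace) the record for a given host and service type."""
        self._records[(record.host, record.kind)] = record

    def verify_all(self, probe, now):
        """Re-check every record; probe(record) returns True if it answers."""
        for record in self._records.values():
            record.available = bool(probe(record))
            record.last_checked = now

    def query(self, kind=None, text=None):
        """Return available services, filtered by type and description."""
        results = []
        for record in self._records.values():
            if not record.available:
                continue
            if kind is not None and record.kind != kind:
                continue
            if text is not None and text.lower() not in record.description.lower():
                continue
            results.append(record)
        return results


if __name__ == "__main__":
    reg = Registry()
    reg.register(ServiceRecord("gopher.example.edu", "gopher", "campus information"))
    reg.register(ServiceRecord("wais.example.net", "wais", "genome project abstracts"))
    # Pretend every service answered the periodic check.
    reg.verify_all(lambda record: True, now=0.0)
    print([r.host for r in reg.query(kind="gopher")])
```

Records that fail verification are kept but hidden from queries in
this sketch, so a service that comes back up reappears at the next
check; dropping such records instead would be an equally plausible
design.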

Current plans call for the deployment of this experimental RIS server
by July, 1992.  Watch the net for more details in the coming months.


*President, Bunyip Information Systems