N-1-2-040.31.2

The Resource Discovery Problem

by Peter Deutsch* (peterd@cc.mcgill.ca)

It may be argued that as the Internet has grown from a collection of hundreds of machines to one of hundreds of thousands of machines, a fundamental shift in focus is occurring among its users. Rather than seeing themselves as primarily interacting with other individuals on the net, users have increasingly come to see themselves as interacting with "the net" itself, with a vast pool of machines and their associated resources that function as a virtual provider of electronic goods and services.

As this perception of the Internet as a collection of services has grown, a new problem has presented itself to would-be service providers. This problem, the so-called "Resource Discovery Problem", must be adequately addressed if we are to move towards a true Internet-wide model of resource access.

The basic problem can be broken down into four sub-problems - Class Discovery, Instance Location, Instance Access and Information Management. Let us examine each of these in turn:

Class Discovery refers to seeking out a specific type of service in a larger community of service providers, presumably without any a priori knowledge of the existence or relevance of specific service providers. Thus, a user might seek to locate "collections of information dealing with the genome project", or "collections of freely available software", without knowing what specific services exist to satisfy such queries.

In an idealized system, a "Class Discovery Service" (perhaps a better term would be "Resource Information Service") could be asked such general questions, and would reply with a set of appropriate service providers. A representative reply to such a query might be a list of archive indexers, a set of relevant anonymous FTP archives and perhaps the names and locations of relevant WAIS and Gopher servers.

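To make the Class Discovery step concrete, here is a minimal sketch in Python of what such a query might look like. It is purely illustrative and assumes a tiny in-memory registry; the record fields, the `discover` function, the keyword-matching rule and the `example.org` host names are all hypothetical and are not taken from any deployed service.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ServiceRecord:
    """One entry in a hypothetical registry of service providers."""
    name: str         # illustrative service name
    kind: str         # e.g. "archive-indexer", "anonymous-ftp", "wais", "gopher"
    location: str     # illustrative host name
    description: str  # free-text description that queries are matched against


def discover(registry: List[ServiceRecord], query: str) -> List[ServiceRecord]:
    """Return every service whose description contains all words of the query."""
    words = query.lower().split()
    return [
        record for record in registry
        if all(word in record.description.lower() for word in words)
    ]


# Hypothetical registry contents, invented for this example.
REGISTRY = [
    ServiceRecord("example-indexer", "archive-indexer", "indexer.example.org",
                  "index of freely available software on anonymous FTP archives"),
    ServiceRecord("example-archive", "anonymous-ftp", "ftp.example.org",
                  "archive of freely available software"),
    ServiceRecord("example-genome", "wais", "wais.example.org",
                  "collection of information dealing with the genome project"),
    ServiceRecord("example-campus", "gopher", "gopher.example.org",
                  "campus information and phone directory"),
]

# The user asks a general question and gets back service providers, not files.
for record in discover(REGISTRY, "freely available software"):
    print(record.kind, record.location)
```

The point of the sketch is that the reply names service providers of several kinds rather than individual files; a real service would need a far richer description and matching scheme than simple keyword containment.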
Once the existence of a specific collection of service providers appropriate to the problem has been established, a user can proceed to Instance Location. For example, once the location of various anonymous FTP archive sites and archive indexers has been established, a user might then send queries to the relevant collection of archive indexers, limiting the search to the specified sets of archives to speed the search. The replies in this case would be references to specific files on the network that match the user's search criteria.

The final step in information discovery is Instance Access. Carrying out this step would involve using the appropriate access method (such protocols as FTP, WAIS/Z39.50 or Gopher, as appropriate).

A final problem remains - Information Management. Once a user has discovered relevant references, he or she may not wish to store specific copies of information, but may instead elect to build up libraries of references, provided the underlying tools are capable of resolving such references as needed.

Work is underway to standardize these references to allow the use of "Universal Document Identifiers" (or UDIs) across multiple systems. Such UDIs would allow references to information to be shared among multiple information systems across the Internet, so that queries to archie could give back pointers to Gopher or WWW documents as well as files available through anonymous FTP.

Practical, useful services have been developed and deployed by a number of researchers (currently using entirely volunteer resources) that address most parts of the Resource Discovery problem. The past 18 months have seen the creation of such services as archie, Prospero, WAIS, WWW and Gopher, among others. Each successfully attacks some portion of the problem.

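The idea of keeping a library of references and resolving each one only when needed, through whichever access method it names, can be sketched in Python as follows. This is a rough illustration under stated assumptions: the `Reference` fields, the `ReferenceLibrary` class and the host names are hypothetical, the handlers are stubs that perform no network access, and nothing here reflects the actual UDI format, which was still being standardized.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Reference:
    """A hypothetical stored reference: where something is and how to fetch it."""
    method: str  # access method name: "ftp", "wais" or "gopher"
    host: str
    path: str


# Stub access methods. A real tool would speak the named protocol here.
def _via_ftp(ref: Reference) -> str:
    return f"[ftp] would retrieve {ref.path} from {ref.host}"


def _via_wais(ref: Reference) -> str:
    return f"[wais] would query source {ref.path} on {ref.host}"


def _via_gopher(ref: Reference) -> str:
    return f"[gopher] would select {ref.path} on {ref.host}"


ACCESS_METHODS: Dict[str, Callable[[Reference], str]] = {
    "ftp": _via_ftp,
    "wais": _via_wais,
    "gopher": _via_gopher,
}


def resolve(ref: Reference) -> str:
    """Instance Access: dispatch a reference to the access method it names."""
    try:
        handler = ACCESS_METHODS[ref.method]
    except KeyError:
        raise ValueError(f"no access method for {ref.method!r}") from None
    return handler(ref)


class ReferenceLibrary:
    """Information Management: keep references, not copies of the data."""

    def __init__(self) -> None:
        self._by_topic: Dict[str, List[Reference]] = {}

    def add(self, topic: str, ref: Reference) -> None:
        self._by_topic.setdefault(topic, []).append(ref)

    def resolve_topic(self, topic: str) -> List[str]:
        """Resolve every stored reference for a topic, only when asked."""
        return [resolve(ref) for ref in self._by_topic.get(topic, [])]


library = ReferenceLibrary()
library.add("software", Reference("ftp", "ftp.example.org", "/pub/tools/README"))
library.add("software", Reference("gopher", "gopher.example.org", "/software"))
for line in library.resolve_topic("software"):
    print(line)
```

Because the library stores only the method, host and path, the same entry can be shared between tools, and adding a new access method means registering one more handler rather than changing the stored references.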
Thus, the user menus in the Gopher system may be seen as a library of references to specific information and service providers on the Internet, and the Gopher service itself (which allows the user to resolve these references in a transparent manner) is one example of the power of Instance Access tools.

Similarly, the archie system may be seen as a simple and fast indexing service to perform Instance Location, returning references that can then be resolved using appropriate Instance Access software (be it the venerable "ftp" command, the Gopher-archie gateway or the appropriate portion of one of the newer GUI-based archie access tools). WAIS provides both Instance Location and Instance Access, while Prospero and WWW both provide means for organizing and accessing information, Prospero at the granularity of individual files and WWW at the level of hypertext documents.

Despite the success of these early experimental systems, much work remains to be done. The existing services need to be further refined and expanded, then deployed as real, funded and supported services. In addition, one major portion of the problem still remains to be addressed.

There still exists a need for tools to aid in the Class Discovery step. This is becoming more pressing with the success of the systems listed above, as there are now in fact a significant number of service providers to be accessed. For example, over 200 locations provide Gopher servers and a similar number provide WAIS sources, while there are close to 15 archie servers now available on the Internet. New services continue to be deployed each week.

There are now plans to deploy an experimental Resource Information Service to address this problem, using a modified version of the current archie service.
The plan is to automatically gather and store the descriptions and locations of such services in a new archie database, then use archie's proactive data-gathering model to keep the database up to date, periodically verifying that specific services are still up and available. Users will be able to query this new database, seeking out services by their type or descriptions.

Current plans call for the deployment of this experimental RIS server by July, 1992. Watch the net for more details in the coming months.

*President, Bunyip Information Systems