TITLE: Contribution on Z39.50 PICS Proforma
SUBMITTED TO: Z39.50 Implementors Workshop, St. Louis, June 4-5, 1990
SOURCE: D. MacKinnon and J. Zeeman, Software Kinetics Ltd.

1. COMMENTS ON Z39.50 PROFILE PICS PROFORMA

6.1 Version 2? Is it correct to presume that version 2 is meant to be the ASN.1 encoded version?

6.6 The profile identifies only the USMARC database record syntax. It is not clear whether this syntax is appropriate for both full and brief records. Also, one of the first uses of SR, in Canada at least, will be for verifying bibliographic data and determining locations for inter-library loan requests. Since the ILL protocol does not use MARC in its ItemId, extensive reformatting of records received via SR may be necessary. It would be useful if Z.I.G. would agree to specify a standard ILL record transfer syntax, which could become an optional part of the SR profile.

7.3.3.2 It seems unnecessary to disallow all use of result sets in queries except as the first operand, since some IR systems may support these capabilities. The diagnostic record appears to have sufficient messages available to signal the case where a target does not support result sets as operands, so there appears to be no need for the profile to disallow their use explicitly.

8.1.1 There appears to be a logical problem with these maxima. The origin has to propose a maximum message size of at least 10,000, but here it says "at most 10,000". It seems sufficient to state that the minimum maximum-message size is 10,000, rather than "at most 10,000".

8.3.3.2 Where does the standard specify a maximum of 1 for the number of concurrent result sets?

8.3.4 Does the profile need to specify what happens when a valid search retrieves 0 records, i.e. clarify in what circumstances the codes in searchStatus are used? For instance, it could be that searchStatus FALSE is allowed only when a diagnostic record of type 1 or type 2 is returned.
TITLE: Contribution on Agenda Items
SUBMITTED TO: Z39.50 Implementors Workshop, St. Louis, June 4-5, 1990
SOURCE: D. MacKinnon and J. Zeeman, Software Kinetics Ltd.

1. COMMENTS ON Z39.50 IMPLEMENTOR'S WORKSHOP AGENDA ITEMS

1.1 Diacritics and Character Sets

It seems that this can be handled quite readily by existing encoding schemes, if the octet-string of the database record is taken by default as a general string. Of course, some of the implementors will have systems that can't handle that kind of encoding, but that does not appear to be a standards problem. There will need to be some discussion as to whether ISO registered bibliographic sets are to be used or more general commercial sets. Most of the bibliographic databases will support the bibliographic sets, but the full text databases probably won't.

1.2 Images, Graphics and Fonts

There seems to be a continuing tension between those who want to use Z39.50 for retrieval of bibliographic records and those who want to broaden it to support the retrieval of complete documents. It is not clear yet whether this tension can be adequately resolved.

There are two ways of retrieving full documents. The first is to retrieve a description of the document (in other words, a bibliographic record) and then explicitly ask for the full document. The second is to search the full documents themselves, so that the search retrieves the full document directly.

The first type is fairly easily dealt with, either by using another standard (likely to involve something like ODA) or by adding some additional APDUs to Z39.50. The second, however, is more difficult, because the presentation of documents is not really included in the service model of the protocol. All the assumptions are that searching and presentation involve "records", i.e. relatively short objects with some degree of structure. The possibility that a record might be kilobytes or even megabytes long is simply not allowed for in the service definition.
The kind of data transfer involved here seems much more appropriate for something like FTAM. So maybe the solution is to investigate the possibility of combining two protocols: SR for searching, which will retrieve short segments of a full text (e.g. a paragraph) to allow a relevance decision to be made, and FTAM to handle the transfer of full documents, fonts, images, graphics, sound and all. This concept of combining multiple protocols within a single session is well accepted in the standards community and should be investigated further.

TITLE: Contribution on Stop Words
SUBMITTED TO: Z39.50 Implementors Workshop, St. Louis, June 4-5, 1990
SOURCE: D. MacKinnon and J. Zeeman, Software Kinetics Ltd.

1. CONTRIBUTION ON STOP WORDS

There is a diagnostic message "Terms only exclusion (stop) words". However, there is no reference in the protocol as to what is to be done about stop words.

The treatment of stop words used as search terms varies from system to system. The British Library's system, for instance, has no way of filtering them out, so a stop word used as a search term will retrieve 0 records, and if it is ANDed with other terms, a record which may well be present will not be found. (AND, OR and NOT, which are also commonly stop words, present separate problems--the RPN query can support them as search terms, but the vast majority of systems cannot.) Other systems (but not apparently National Library of Canada's DOBIS) can filter stop words out of the query. The diagnostic message clearly refers to the latter situation.

Is there a need to make any assumptions about stop words explicit in the profile? The whole thing is complicated by the fact that the set of stop words used varies from database to database, from field to field within the record, and even, in the case of DOBIS, according to the language code of the record.
Perhaps there needs to be an additional diagnostic code: "One or more of the terms is a stop word", although this may not be of any assistance to the origin system and could mean a lot of work in implementing a target system. But it seems a shame to let records be missed because of stop word problems.

TITLE: Contribution on Z39.50 Profile
SUBMITTED TO: Z39.50 Implementors Workshop, St. Louis, June 4-5, 1990
SOURCE: D. MacKinnon and J. Zeeman, Software Kinetics Ltd.

1. CONTRIBUTION ON Z39.50 PROFILE

1.1 End of session information

The Z39.50 protocol provides no mechanism for transmitting end of session information, such as cost or statistics on which billing will be based. It may be appropriate to include, as part of the Z39.50 profile, a statement to the effect that end of session information, such as billing information, may be conveyed as user information of the Association Control A-RELEASE service. The registration of the EXTERNAL type definitions for such information could be handled by the Z39.50 maintenance agency.

TITLE: Contribution on Attributes
SUBMITTED TO: Z39.50 Implementors Workshop, St. Louis, June 4-5, 1990
SOURCE: D. MacKinnon and J. Zeeman, Software Kinetics Ltd.

1. CONTRIBUTION ON ATTRIBUTES

1.1 Attribute Set

There seems to be a problem with the term types in general--they are too closely modelled on MARC tagging, and not on how the individual fields tend to be conflated for indexing purposes. For instance, MARC fields 100, 110, 111, 700, 710, 711 (and sometimes 6XX) are by and large indexed together as "Name". Slightly different indexing rules will be applied to the different name types, but most online databases (National Library of Canada's DOBIS is a good example) have a single index term for name. Similarly, they tend to have a single index term "Title", which indexes the title proper, uniform title and series title fields.
The attribute set, however, has separate types for personal, corporate and conference names, but not a type for Name in general. This means that the attribute set enforces a distinction not generally present in the databases to which the attribute set must map. An origin looking for, say, a personal name in a target which only supported Names could not be sure that what was returned was what was asked for (e.g. an origin looking for the person Albert Einstein might well retrieve records for the Albert Einstein Institute of Advanced Physics). Conversely, an origin looking for a general name would need to OR together three searches for the same search term with different types (Albert Einstein-personal name OR Albert Einstein-corporate name OR Albert Einstein-conference name), no matter what the target supported.

Neither is there any means of searching Name-as-subject separately from Name-as-author. A search for Shakespeare is disheartening enough when it returns 25,000 records for works by Shakespeare. If it also returns the 100,000-odd records for works about Shakespeare, the whole thing becomes impossible. This distinction, however, may not be supported by many IR systems (although it is by the British Library's BLAISE).

It is proposed that EITHER:

1. The following attributes be added to the Bib-1 set:
   - 41 Name or GenericName
   - 42 UniqueNumber or GenericUniqueNumber
   - 43 Classification or GenericClassification
   - 44 SubjectHeading or GenericSubjectHeading

OR

2. A separate attribute set be registered as Bib-2-Generic, including the above attributes together with appropriate others from the Bib-1 set.

1.2 More additions to the attribute set

Analysis of some Canadian requirements has identified a few deficiencies in the attribute set, some of which are specific to Canadian use, while others seem to be generally useful, particularly when qualifying an existing result set.
Among the latter are:
- place of publication
- coded information--language
- country of publication
- contents indicator
- government publication indicator

These would all be type 1 attributes. There is also a missing type 5 attribute: embedded or internal truncation.

1.3 Passing Information on Attributes Supported

The question of how the origin system knows what attributes are supported by the target is not currently addressed by the standard. It is therefore left to implementors to negotiate such matters externally, probably by bilateral agreement. This would seem contrary to the whole purpose of OSI. While accounting and billing requirements mean that for now any use of SR will require prior bilateral agreement, the whole tenor of applications like the Directory is away from bilateral arrangements towards negotiation at connect time, which could surely be used to handle financial arrangements as well. It is therefore important that SR should at least be capable of allowing an origin to meaningfully address queries to a target that it knows nothing about beforehand.

This requires that there should be some way for the origin to find out what search attributes the target supports, and in which combinations. The draft application-context definition gives a minimum subset, but restricting all remote queries to this bare minimum of functionality would defeat much of the point of remote access (dial up with a manual in hand would accomplish more). There would thus seem to be a requirement to be able to indicate support for more than the minimum subset. It is, however, certain that very few, if any, IR systems will support all the attribute types in all the possible combinations. There is therefore a requirement for the target system to let the origin system know that it supports more than the minimum but less than the full attribute set. How can this be achieved?
One possible way might be for the target system to let the origin system know, as part of the InitResponse APDU, which combinations of attributes are acceptable. This need not be a substantial amount of data; the two systems will have already agreed on an attribute set (either by means of the default in the application context or by negotiation), so all the target system has to do is send a structured set of integers referring to the various combinations of attribute codes in the agreed attribute set which it can support. For instance, the target system might indicate that (using the ISO 10163 Bib-1 set) it supports (among others) the combination (1,1)+(2,3)+(3,3)+(4,2)+(5,1)+(6,3), meaning: terms from a title (1,1) which equate (2,3) to words (4,2) taken from any position (3,3) in the complete field(s) (6,3) with right truncation (5,1).

The target system could easily keep such a constant, and the origin could check each query it formulates against the set of legal attribute combinations and take the appropriate action, such as asking the operator to reformulate the query or reformulating automatically on the basis of some kind of table (or the origin could simply ignore the information and risk getting lots of "Unsupported search" diagnostic records).

The formal definition of this data could be implemented in ASN.1 by the following productions:

    SupportedSearches ::= SET OF LegalAttributeCombinations

    LegalAttributeCombinations ::= SET OF AttributeElement
        --each element in the set must be of a different type
        --AttributeElement is defined as part of Query in ISO 10163

There is, however, an additional complication: the protocol supports multiple database searching, and the set of LegalAttributeCombinations may well vary from database to database. Therefore a separate set will need to be supplied for different groups of databases available for searching. This could be a substantial amount of data, which it would be wasteful to send each time an origin logged on to a heavily used target.
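The origin-side check described above can be sketched in a few lines. This is only an illustration under stated assumptions, not part of the proposal: the names make_combination, is_legal and SUPPORTED_SEARCHES, and the database name EXAMPLE-CATALOGUE, are hypothetical, and the single table entry uses the attribute pairs named in the example above (title, equate, any position, words, right truncation, complete field). Each combination is held as a set of (type, value) pairs, one table per database.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# An attribute element is a (type, value) pair, e.g. (5, 1) for right truncation.
AttributeElement = Tuple[int, int]
Combination = FrozenSet[AttributeElement]


def make_combination(elements: Iterable[AttributeElement]) -> Combination:
    """Build one legal combination; each attribute type may appear only once."""
    elements = list(elements)
    types = [attr_type for attr_type, _ in elements]
    if len(types) != len(set(types)):
        raise ValueError("each attribute type may appear only once")
    return frozenset(elements)


# Hypothetical table an origin might build from the target's response,
# keyed by database name (the name is an assumption for illustration).
SUPPORTED_SEARCHES: Dict[str, Set[Combination]] = {
    "EXAMPLE-CATALOGUE": {
        make_combination([(1, 1), (2, 3), (3, 3), (4, 2), (5, 1), (6, 3)]),
    },
}


def is_legal(
    databases: Iterable[str],
    elements: Iterable[AttributeElement],
    supported: Dict[str, Set[Combination]] = SUPPORTED_SEARCHES,
) -> bool:
    """True if every named database lists this exact attribute combination."""
    combination = make_combination(elements)
    return all(combination in supported.get(db, set()) for db in databases)
```

Because the combinations are sets, the order in which the origin assembles the attribute pairs does not matter. A query that fails the check could be reformulated or referred back to the operator before it is ever sent.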
It therefore seems sensible to require this information to be sent only in response to a request for it. The INIT service is the appropriate place both to make the request and to receive the information. If it is desirable to avoid making any changes to the standard, the UserInformation field, defined as an EXTERNAL type, is available and can be used to transfer both the request and the response:

    UserInformationField ::= CHOICE {
        supplySearchInfo BOOLEAN DEFAULT FALSE,
            --this choice only to occur in InitializeRequest APDU
        SearchInfo
            --this choice only to occur in InitializeResponse APDU
        }

    SearchInfo ::= SET OF DatabaseInfo

    DatabaseInfo ::= SEQUENCE {
        SET OF DatabaseName,    --defined in ISO 10163
        SupportedSearches
        }

All that would be required to implement this in the existing standard is to register an ASN.1 module containing these productions with the appropriate body and include the module as a legitimate external object identifier for the User Information field in the Application-Context definition.

1.4 End of session information

The protocol provides no mechanism for transmitting end of session information, such as cost or statistics on which billing will be based. There is therefore no way for the origin system to check its bills, or, if individual sessions are billed to different users, to bill them. Is this important? Can anything be done about it?

1.5 Use with the ILL protocol

One of the first uses of SR, in Canada at least, will be for verifying bibliographic data and determining locations for routing inter-library loan requests. Since the ILL protocol does not use MARC in its ItemId, extensive reformatting of records received via SR may be necessary. It would be useful if Z.I.G. would agree to specify a standard ILL record transfer syntax, which could become an optional part of the SR profile.