TITLE:        Contribution on Z39.50 PICS Proforma
 
 SUBMITTED TO: Z39.50 Implementors Workshop, St. Louis,
               June 4-5, 1990
 
 SOURCE:       D. MacKinnon and J. Zeeman,
               Software Kinetics Ltd.
 
 
 
 1.  COMMENTS ON Z39.50 PROFILE PICS PROFORMA
 
 6.1 Version 2?  Is it correct to presume that version 2 is
     meant to be the ASN.1 encoded version?
 
 6.6 The profile identifies only the USMARC database record
     syntax.  It is not clear whether this syntax is
     appropriate for both full and brief records.
 
     Also, one of the first uses of SR, in Canada at least,
     will be for verifying bibliographic data and determining
     locations for inter-library loan requests.  Since the ILL
     protocol does not use MARC in its ItemId, extensive
     reformatting of records received via SR may be necessary. 
     It would be useful if Z.I.G. would agree to specify a
     standard ILL record transfer syntax, which could become an
     optional part of the SR profile.
 
 7.3.3.2 It seems unnecessary to disallow all use of result
     sets in queries except as the first operand, since some IR
     systems may support these capabilities.  Since the
     diagnostic record would seem to have sufficient messages
     available to signal the case where a target did not
     support result sets as operands, there appears to be no
     need to explicitly disallow the possibility of its use in
     the profile.
 
 8.1.1 There appears to be a logical problem with these maxima. 
     The origin has to propose a maximum message size of at
     least 10,000, but here it says "at most 10,000".  It seems
     sufficient to state that the minimum maximum-message size
     should simply be 10,000 and not "at most 10,000".
 
 8.3.3.2 Where does the standard specify that the maximum
     number of concurrent result sets is 1?
 
 8.3.4 Does the profile need to specify what happens when a
     valid search retrieves 0 records?--i.e. clarify in what
     circumstances the codes in searchStatus are used?  For
     instance, it could be that searchStatus FALSE is allowed
     only when a diagnostic record of type 1 or type 2 is
     returned.
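
     The rule suggested above can be sketched as a simple
     consistency check.  This is only an illustration:  the
     SearchResponse class and its field names are hypothetical
     simplifications, not the APDU defined in the standard.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SearchResponse:
    """Hypothetical, simplified stand-in for a search response."""
    search_status: bool                     # searchStatus
    records_found: int                      # size of the result set
    diagnostic_types: List[int] = field(default_factory=list)


def status_is_consistent(resp: SearchResponse) -> bool:
    """Apply the suggested rule: FALSE needs a type 1 or type 2 diagnostic."""
    if resp.search_status:
        return True
    return any(t in (1, 2) for t in resp.diagnostic_types)


# A valid search that finds nothing: TRUE with zero records.
no_hits = SearchResponse(search_status=True, records_found=0)
# A failed search reported with a diagnostic record.
failed = SearchResponse(search_status=False, records_found=0,
                        diagnostic_types=[1])
# FALSE with no diagnostic: the case the rule would exclude.
unexplained = SearchResponse(search_status=False, records_found=0)
```

     Under this reading, a search that is valid but retrieves 0
     records would be reported with searchStatus TRUE.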
 
 
 
 
 TITLE:        Contribution on Agenda Items
 
 SUBMITTED TO: Z39.50 Implementors Workshop, St. Louis,
               June 4-5, 1990
 
 SOURCE:       D. MacKinnon and J. Zeeman,
               Software Kinetics Ltd.
 
 
 1.  COMMENTS ON Z39.50 IMPLEMENTOR'S WORKSHOP AGENDA ITEMS
 
     1.1 Diacritics and Character Sets
 
     It seems that this can be handled quite readily by
     existing encoding schemes, if the octet-string of the
     database record is taken by default as a general string. 
     Of course, some of the implementors will have systems that
     can't handle that kind of encoding, but that does not
     appear to be a standards problem.  There will need to be
     some discussion as to whether ISO registered bibliographic
     sets are to be used or more general commercial sets.  Most
     of the bibliographic databases will support the
     bibliographic sets, but the full text databases probably
     won't.
 
     1.2 Images, Graphics and Fonts
 
     There seems to be a continuing tension between those who
     want to use Z39.50 for retrieval of bibliographic records
     and those who want to broaden it to support the retrieval
     of complete documents.  It is not clear yet whether this
     tension can be adequately resolved.  There are two ways of
     retrieving full documents:  the first is to retrieve a
     description of the document (in other words, a
     bibliographic record) and then explicitly ask for the full
     document; the second is to search the full documents
     themselves, which will retrieve the full document.  The
     first type is fairly easily dealt with by using either
     another standard (likely to involve something like ODA) or
     adding some additional APDUs to Z39.50.  The second,
     however, is more difficult, because the presentation of
     documents is not really included in the service model of
     the protocol.  All the assumptions are that searching and
     presentation involves "records", i.e. relatively short
     objects with some degree of structure.  The possibility
     that a record might be kilobytes or even megabytes long is
     simply not allowed for in the service definition.  The
     kind of data transfer involved here seems much more
     appropriate for something like FTAM.
 
     So maybe the solution is to investigate the possibility of
     combining two protocols, SR for searching, which will
     retrieve short segments of a full text (e.g. a paragraph)
     to allow a relevance decision to be made, and FTAM to
     handle the transfer of full documents, fonts, images,
     graphics, sound and all.  This concept of combining
     multiple protocols within a single session is well
     accepted in the standards community and should be
     investigated further.
 
 
 
 TITLE:        Contribution on Stop Words
 
 SUBMITTED TO: Z39.50 Implementors Workshop, St. Louis,
               June 4-5, 1990
 
 SOURCE:       D. MacKinnon and J. Zeeman,
               Software Kinetics Ltd.
 
 
 1.  CONTRIBUTION ON STOP WORDS
 
     There is a diagnostic message "Terms only exclusion (stop)
     words".  However, there is no reference in the protocol as
     to what is to be done about stop words.  The treatment of
     stop words used as search terms varies from system to
     system; the British Library's system, for instance, has no
     way of filtering them out, so a stop word used as a search
     term will retrieve 0 records and if ANDed with other terms
     will mean that a record which may well be present will not
     be found (AND, OR and NOT, which are also commonly stop
     words, present separate problems--the RPN query can
     support them as search terms, but the vast majority of
     systems cannot).  Other systems (but not apparently
     National Library of Canada's DOBIS) can filter stop words
     out of the query.  The diagnostic message clearly refers
     to the latter situation.  Is there a need to make any
     assumptions about stop words explicit in the profile?
 
     The whole thing is complicated by the fact that the set of
     stop words used varies from database to database, from
     field to field within the record, and even, in the case of
     DOBIS, according to the language code of the record.
 
     Perhaps there needs to be an additional diagnostic code: 
     "One or more of the terms is a stop word", although this
     may not be of any assistance to the origin system and
     could mean a lot of work in implementing a target system.
 
     But it seems a shame to let records be missed because of
     stop word problems.
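
     The two behaviours described above can be sketched with a
     toy index.  Everything here is an assumption made for
     illustration:  the stop word list, the index contents and
     the function name are hypothetical, and no real system is
     being modelled.

```python
# Illustrative stop word list; real lists vary by database, field and language.
STOP_WORDS = {"the", "of", "and"}

# Toy inverted index: term -> record numbers. Stop words are never indexed.
INDEX = {"origin": {1, 2}, "species": {1, 3}}


def search_and(terms, filter_stop_words):
    """AND the terms together; return (matching records, stop words seen)."""
    terms = [t.lower() for t in terms]
    stop_terms = [t for t in terms if t in STOP_WORDS]
    if filter_stop_words:
        # A target that filters simply drops the stop words from the query.
        terms = [t for t in terms if t not in STOP_WORDS]
    result = None
    for term in terms:
        postings = set(INDEX.get(term, ()))
        result = postings if result is None else result & postings
    return (result if result is not None else set()), stop_terms
```

     With these data, "origin AND of AND species" finds nothing
     on a target that does not filter, and record 1 on one that
     does.  The second value returned is the list a target would
     need in order to issue the suggested "One or more of the
     terms is a stop word" diagnostic.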
 
 
 TITLE:        Contribution on Z39.50 Profile
 
 SUBMITTED TO: Z39.50 Implementors Workshop, St. Louis,
               June 4-5, 1990
 
 SOURCE:       D. MacKinnon and J. Zeeman,
               Software Kinetics Ltd.
 
 
 1.  CONTRIBUTION ON Z39.50 PROFILE
 
 1.1 End of session information
 
 The Z39.50 protocol provides no mechanism for transmitting end
 of session information, such as cost or statistics on which
 billing will be based.  It may be appropriate to include, as
 part of the Z39.50 profile, a statement to the effect that end
 of session information, such as billing information, may be
 conveyed as user information of the Association Control
 A-RELEASE service.  The registration of the EXTERNAL type
 definitions for such information could be handled by the
 Z39.50 maintenance agency.
 
 
 TITLE:        Contribution on Attributes
 
 SUBMITTED TO: Z39.50 Implementors Workshop, St. Louis, 
               June 4-5, 1990
 
 SOURCE:       D. MacKinnon and J. Zeeman, 
               Software Kinetics Ltd.
 
 
 1.  CONTRIBUTION ON ATTRIBUTES
 
 1.1 Attribute Set
 
 There seems to me to be a problem with the term types in
 general--they are too closely modelled on MARC tagging, and
 not on how the individual fields tend to be conflated for
 indexing purposes.  For instance, MARC fields
 100,110,111,700,710,711 (and sometimes 6XX) are by and large
 indexed together as "Name".  Slightly different indexing rules
 will be applied to the different name types, but most online
 databases (National Library of Canada's DOBIS is a good
 example) have a single index term for name.  Similarly, they
 tend to have a single index term "Title", which indexes the
 title proper, uniform title and series title fields.
 
 The attribute set, however, has separate types for personal,
 corporate and conference names, but not a type for Name in
 general.  This means that the attribute set enforces a
 distinction not generally present in the databases to which
 the attribute set must map.
 
 An origin looking for, say, a personal name in a target which
 only supported Names could not be sure that what was returned
 was what was asked for (e.g. an origin looking for the person
 Albert Einstein might well retrieve records for the Albert
 Einstein Institute of Advanced Physics).  Conversely, an
 origin looking for a general name would need to OR together
 three searches for the same search term with different types
 (Albert Einstein-personal name OR Albert Einstein-corporate
 name OR Albert Einstein-conference name), no matter what the
 target supported.

 Neither is there any means of searching Name-as-subject
 separately from Name-as-author.  A search for Shakespeare is
 disheartening enough when it returns 25,000 records for works
 by Shakespeare.  If it also returns the 100,000-odd records
 for works about Shakespeare, the whole thing becomes
 impossible.  This distinction, however, may not be supported
 by many IR systems (it is by the British Library's BLAISE).
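
 The need to OR together three typed searches for a general
 name, described above, can be sketched as follows.  This is a
 minimal illustration only:  the name types are written as plain
 strings rather than attribute codes, the query is a simple list
 of tokens in reverse Polish order, and the function name is
 hypothetical.

```python
def name_search_rpn(term, name_types=("personal name",
                                      "corporate name",
                                      "conference name")):
    """Return RPN tokens searching `term` under each name type, ORed together."""
    tokens = []
    for position, name_type in enumerate(name_types):
        tokens.append((term, name_type))   # operand: the term with its type
        if position > 0:
            tokens.append("OR")            # operator follows its two operands
    return tokens


# What an origin must send today for a general name search.
three_way = name_search_rpn("Albert Einstein")

# What it could send if a single generic type (hypothetical here) existed.
generic = name_search_rpn("Albert Einstein", name_types=("name",))
```

 The first query has three operands and two OR operators; the
 second has a single operand.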
 
 It is proposed that EITHER:
     1.   The following attributes be added to the Bib-1 set:
          - 41 Name or GenericName
 
          - 42 UniqueNumber or GenericUniqueNumber
 
          - 43 Classification or GenericClassification
 
          - 44 SubjectHeading or GenericSubjectHeading
 
     OR
 
     2.   A separate attribute set be registered as
          Bib-2-Generic including the above attributes together
          with appropriate others from the Bib-1 set.
 
 
 1.2 More additions to the attribute set
 
 Analysis of some Canadian requirements has identified a few
 deficiencies in the attribute set, some of which are specific
 to Canadian use, while others seem to be generally useful,
 particularly when qualifying an existing result set.  Among
 the latter are:
 
     - place of publication
 
     - coded information--language
 
     - country of publication
 
     - contents indicator
 
     - government publication indicator 
 
 These would all be type 1 attributes.  There is also a missing
 type 5 attribute:  embedded or internal truncation.
 
 
 1.3 Passing Information on Attributes Supported
 
 The question of how the origin system knows what attributes
 are supported by the target is not currently addressed by the
 standard.  It is therefore left to implementors to negotiate
 such matters externally, probably by bilateral agreement. 
 This would seem contrary to the whole purpose of OSI.
 
 While accounting and billing requirements mean that for now
 any use of SR will require prior bilateral agreement, the
 whole tenor of applications like the Directory is away from
 bilateral arrangements towards negotiation at connect time,
 which could surely be used to handle financial arrangements as
 well.  It is therefore important that SR should at least be
 capable of allowing an origin to meaningfully address queries
 to a target that it knows nothing about beforehand.  This
 requires that there should be some way for the origin to find
 out what search attributes the target supports, and in which
 combination.
 
 The draft application-context definition gives a minimum
 subset, but restricting all remote queries to this bare
 minimum of functionality would defeat much of the point of
 remote access (dial up with a manual in hand would accomplish
 more).  There would thus seem to be a requirement to be able
 to indicate support for more than the minimum subset.  It is,
 however, certain that very few, if any, IR systems will
 support all the attribute types in all the possible
 combinations.  There is therefore a requirement for the target
 system to let the origin system know that it supports more
 than the minimum but less than the full attribute set.
 
 How can this be achieved?  One possible way might be for the
 target system to let the origin system know as part of the
 InitResponse APDU which combinations of attributes are
 acceptable.  This need not be a substantial amount of data;
 the two systems will have already agreed on an attribute set
 (either by means of the default in the application context or
 by negotiation), so all the target system has to do is send a
 structured set of integers referring to the various
 combinations of attribute codes in the agreed attribute set
 which it can support.  For instance, the target system might
 indicate that (using the ISO 10163 Bib-1 set) it supports
 (among others) the combination
 
     (1,1)+(2,3)+(3,3)+(4,2)+(5,1)+(6,3)
 
 meaning:  terms from a title (1,1) which equate (2,3) to words
 (4,2) taken from any position (3,3) in the complete field(s)
 (6,3) with right truncation (5,1).
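
 As a sketch of how such a combination might be read by an
 origin, the notation can be parsed into (type, value) pairs and
 each pair looked up in a table.  The value labels below are
 copied from the explanation above; the attribute type names are
 those commonly given for the Bib-1 set and should be treated as
 an assumption here, as should the function names.

```python
# Attribute type names (assumed; types 1 and 5 agree with the usage in this
# contribution, the others follow the usual Bib-1 naming).
TYPE_NAMES = {1: "use", 2: "relation", 3: "position",
              4: "structure", 5: "truncation", 6: "completeness"}

# Labels taken from the prose explanation of the example combination.
LABELS = {
    (1, 1): "terms from a title",
    (2, 3): "equal",
    (3, 3): "any position in field",
    (4, 2): "word",
    (5, 1): "right truncation",
    (6, 3): "complete field",
}


def parse_combination(text):
    """Parse '(1,1)+(2,3)+...' into a list of (type, value) integer pairs."""
    pairs = []
    for part in text.split("+"):
        attr_type, attr_value = part.strip().strip("()").split(",")
        pairs.append((int(attr_type), int(attr_value)))
    return pairs


def describe(pairs):
    """Return one readable line per attribute element."""
    return [f"{TYPE_NAMES[t]}: {LABELS.get((t, v), 'unknown')}"
            for t, v in pairs]


example = parse_combination("(1,1)+(2,3)+(3,3)+(4,2)+(5,1)+(6,3)")
```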
 
 The target system could easily hold such a set as a constant,
 and the origin could check each query it formulates against
 the set of legal attribute combinations and take the
 appropriate action, such as
 asking the operator to reformulate the query or reformulating
 automatically on the basis of some kind of table (or the
 origin could simply ignore the information and risk getting
 lots of "Unsupported search" diagnostic records).

 The formal definition of this data could be implemented in
 ASN.1 by the following productions:

     SupportedSearches ::= SET OF LegalAttributeCombinations

     LegalAttributeCombinations ::= SET OF AttributeElement

     -- each element in the set must be of a different type
     -- AttributeElement is defined as part of Query in
     -- ISO 10163
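
 The check an origin might make against this data can be
 sketched as follows.  This is an in-memory illustration only,
 not an encoding of the ASN.1:  each combination is held as a
 frozen set of (type, value) pairs, and the function names are
 hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple

AttributeElement = Tuple[int, int]             # (attribute type, value)
Combination = FrozenSet[AttributeElement]      # LegalAttributeCombinations
SupportedSearches = Set[Combination]


def make_combination(*elements: AttributeElement) -> Combination:
    """Build one combination, enforcing 'each element of a different type'."""
    types = [attr_type for attr_type, _ in elements]
    if len(types) != len(set(types)):
        raise ValueError("each element must be of a different attribute type")
    return frozenset(elements)


def is_supported(query_attributes: Iterable[AttributeElement],
                 supported: SupportedSearches) -> bool:
    """True if the query's attributes exactly match a supported combination."""
    return make_combination(*query_attributes) in supported


# The example combination given earlier, as the target might advertise it.
supported = {make_combination((1, 1), (2, 3), (3, 3), (4, 2), (5, 1), (6, 3))}
```

 Because the elements form a set, the order in which the origin
 lists the attributes of a query does not matter.  An origin
 that finds a query unsupported can then reformulate it before
 sending, rather than waiting for a diagnostic record.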
 
 There is, however, an additional complication:  the protocol
 supports multiple database searching, and the set of
 LegalAttributeCombinations may well vary from database to
 database.  Therefore a separate set will need to be supplied
 for different groups of databases available for searching. 
 This could be a substantial amount of data, which it would be
 wasteful to send each time an origin logged on to a heavily
 used target.  It therefore seems sensible to require this
 information to be sent only in response to a request for it. 
 The INIT service is the appropriate place to both make the
 request and receive the information.  If it is desirable to
 avoid making any changes to the standard, the UserInformation
 field, defined as an EXTERNAL type, is available and can be
 used to transfer both the request and the response:
 
   UserInformationField ::= CHOICE {
        supplySearchInfo BOOLEAN,
             -- this choice only to occur in InitializeRequest
             -- APDU; TRUE requests the search information
        searchInfo       SearchInfo
             -- this choice only to occur in InitializeResponse
             -- APDU
                      }

 SearchInfo ::= SET OF DatabaseInfo

 DatabaseInfo ::= SEQUENCE {
   databaseNames      SET OF DatabaseName,  -- defined in ISO 10163
   supportedSearches  SupportedSearches
   }
 
 All that would be required to implement this in the existing
 standard is to register an ASN.1 module containing these
 productions with the appropriate body and include the module
 as a legitimate external object identifier for the User
 Information field in the Application-Context definition.
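
 The exchange proposed above can be sketched as follows.  This
 models only the logic, not the encoding of the EXTERNAL type.
 The database names, the second attribute combination, the
 dictionary keys and the function names are all hypothetical.

```python
# Two combinations of (type, value) pairs; the second is invented for the example.
TITLE_WORDS = frozenset({(1, 1), (2, 3), (3, 3), (4, 2), (5, 1), (6, 3)})
TITLE_EXACT = frozenset({(1, 1), (2, 3), (4, 2), (6, 3)})

# Target-side table: groups of databases sharing the same supported searches.
CATALOGUE = [
    ({"BOOKS", "SERIALS"}, {TITLE_WORDS, TITLE_EXACT}),
    ({"FULLTEXT"}, {TITLE_WORDS}),
]


def init_response_user_information(supply_search_info):
    """Target side: return SearchInfo only when the origin asked for it."""
    if not supply_search_info:
        return None
    return [{"databaseNames": set(names), "supportedSearches": set(combos)}
            for names, combos in CATALOGUE]


def supported_searches_for(search_info, database_name):
    """Origin side: find the combinations advertised for one database."""
    for database_info in search_info or []:
        if database_name in database_info["databaseNames"]:
            return database_info["supportedSearches"]
    return set()


search_info = init_response_user_information(supply_search_info=True)
```

 An origin that does not ask receives nothing extra, so the cost
 of sending the data falls only on sessions that want it.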
 
 1.4 End of session information
 
 The protocol provides no mechanism for transmitting end of
 session information, such as cost or statistics on which
 billing will be based.  There is therefore no way for the
 origin system to check its bills, or, if individual sessions
 are billed to different users, to bill them.  Is this
 important?  Can anything be done about it?
 
 1.5 Use with the ILL protocol
 
 One of the first uses of SR, in Canada at least, will be for
 verifying bibliographic data and determining locations for
 routing inter-library loan requests.  Since the ILL protocol
 does not use MARC in its ItemId, extensive reformatting of
 records received via SR may be necessary.  It would be useful
 if Z.I.G. would agree to specify a standard ILL record
 transfer syntax, which could become an optional part of the SR
 profile.