
    UMLS::Similarity

  SYNOPSIS
      This package consists of Perl modules along with supporting Perl
      programs that implement the semantic relatedness measures described 
      by Leacock & Chodorow (1998) and Wu & Palmer (1994), as well as a
      simple path-based measure. We plan to add the measures of Jiang &
      Conrath (1997), Resnik (1995) and Lin (1998) in the near future.

      UMLS::Similarity requires the UMLS::Interface module to access 
      the Unified Medical Language System (UMLS) in order to determine 
      the similarity between two UMLS concepts.

      The Perl modules are designed as objects with methods that take as
      input two concepts from the UMLS. The semantic relatedness of these 
      concepts is returned by these methods. A quantitative measure of 
      the degree to which two concepts are related has wide-ranging 
      applications in areas such as word sense disambiguation and
      information retrieval. For example, to determine which sense of a
      given word is being used in a particular context, we can assume that
      the sense having the highest relatedness with the senses of its
      context words is the one most likely intended. Similarly, in
      information retrieval, retrieving documents that contain concepts
      highly related to those in the query is likely to improve precision
      and recall.
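
      The word sense disambiguation idea above can be sketched in a few
      lines of Perl. This is only an illustration: the relatedness scores
      are made-up numbers, and the sense labels and the best_sense()
      helper are hypothetical names that are not part of this package. In
      practice the scores would come from one of the relatedness measures.

```perl
use strict;
use warnings;

# Hypothetical relatedness scores between each candidate sense of an
# ambiguous word and the senses of its context words (made-up values).
my %relatedness = (
    'cold (temperature)' => { 'winter' => 0.50, 'fever' => 0.10, 'cough' => 0.10 },
    'cold (illness)'     => { 'winter' => 0.20, 'fever' => 0.50, 'cough' => 0.50 },
);

# Pick the candidate sense whose total relatedness to the context is highest.
# Missing scores count as 0; ties go to the alphabetically first sense.
sub best_sense {
    my ($scores, @context) = @_;
    my ($best, $best_total);
    for my $sense (sort keys %$scores) {
        my $total = 0;
        $total += $scores->{$sense}{$_} // 0 for @context;
        ($best, $best_total) = ($sense, $total)
            if !defined $best_total || $total > $best_total;
    }
    return $best;
}

print best_sense(\%relatedness, 'fever', 'cough'), "\n";
```

      With the context words 'fever' and 'cough', the illness sense has
      the larger total and is selected.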

      The following sections describe the organization of this software
      package and how to use it. A few typical examples are given to help
      clearly understand the usage of the modules and the supporting
      utilities.

  SEMANTIC RELATEDNESS
        We observe that humans find it extremely easy to say if two words are
        related and if one word is more related to a given word than another.
        For example, if we come across two words -- 'car' and 'bicycle', we know
        they are related as both are means of transport. Also, we easily observe
        that 'bicycle' is more related to 'car' than 'fork' is. But is there
        some way to assign a quantitative value to this relatedness? Some ideas
        have been put forth by researchers to quantify the concept of
        relatedness of words, with encouraging results.

        A number of different measures of relatedness have been implemented in
        this software package. These include a simple edge-counting
        approach. The measures rely on the UMLS-Interface package, which
        provides access to UMLS concepts and to some basic relationships
        between these concepts.
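
        To make the edge-counting idea concrete, the following sketch
        computes the three measures on a tiny hand-made is-a hierarchy.
        It is an illustration under stated assumptions, not the code in
        this package: the hierarchy is a toy, all subroutine names are
        hypothetical, and the formulas are the commonly cited forms
        (path = 1 / nodes on the shortest path, lch = -log(path length /
        (2 * maximum depth)), wup = 2 * depth(lcs) / (depth(c1) +
        depth(c2))), which may differ in detail from the modules here.

```perl
use strict;
use warnings;

# Hypothetical toy is-a hierarchy (child => parent); not real UMLS data.
my %parent = (
    'vehicle' => 'entity',
    'utensil' => 'entity',
    'car'     => 'vehicle',
    'bicycle' => 'vehicle',
    'fork'    => 'utensil',
);

# Chain of nodes from a concept up to the root, inclusive.
sub path_to_root {
    my ($c) = @_;
    my @chain = ($c);
    while (exists $parent{$c}) {
        $c = $parent{$c};
        push @chain, $c;
    }
    return @chain;
}

# Depth counted in nodes, so the root has depth 1.
sub depth {
    my @chain = path_to_root($_[0]);
    return scalar @chain;
}

# Least common subsumer: the deepest ancestor shared by both concepts.
sub lcs {
    my ($c1, $c2) = @_;
    my %seen = map { $_ => 1 } path_to_root($c1);
    for my $node (path_to_root($c2)) {
        return $node if $seen{$node};
    }
    return;
}

# Number of nodes on the shortest is-a path between two concepts.
sub path_length {
    my ($c1, $c2) = @_;
    return depth($c1) + depth($c2) - 2 * depth(lcs($c1, $c2)) + 1;
}

sub path_measure { return 1 / path_length(@_); }

sub lch_measure {
    my ($c1, $c2, $max_depth) = @_;
    return -log(path_length($c1, $c2) / (2 * $max_depth));
}

sub wup_measure {
    my ($c1, $c2) = @_;
    return 2 * depth(lcs($c1, $c2)) / (depth($c1) + depth($c2));
}

printf "path(car, bicycle) = %.3f\n", path_measure('car', 'bicycle');
printf "path(car, fork)    = %.3f\n", path_measure('car', 'fork');
printf "wup(car, bicycle)  = %.3f\n", wup_measure('car', 'bicycle');
printf "lch(car, bicycle)  = %.3f\n", lch_measure('car', 'bicycle', 3);
```

        In this toy hierarchy 'car' and 'bicycle' share the parent
        'vehicle', so every measure scores them higher than 'car' and
        'fork', which meet only at the root.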

  CONTENTS
        All the modules that will be installed in the Perl system directory are
        present in the lib/ directory tree of the package. These include the
        semantic relatedness modules:

          Semantic/Similarity/lch.pm
          Semantic/Similarity/path.pm
          Semantic/Similarity/wup.pm

        Once installed in the Perl system directory, all of these modules
        can be used directly by Perl programs.

        The package also contains a utils/ directory that holds Perl utility
        programs. These utilities use the modules or provide some supporting
        functionality.

          queryUMLS.pl -- returns the semantic similarity of two 
                          terms or UMLS CUIs given a specified 
                          measure (and view of the UMLS).
      
  INSTALL
        To install these modules run the following magic commands:

          perl Makefile.PL
          make
          make test
          make install

        This will install the modules in the standard locations. You will most
        likely need root privileges to install in standard system
        directories. To install in a non-standard directory, specify a prefix
        during the 'perl Makefile.PL' stage as:

          perl Makefile.PL PREFIX=/home/sid

        It is possible to modify other parameters during installation. The
        details of these can be found in the ExtUtils::MakeMaker
        documentation. However, we strongly recommend leaving the other
        parameters at their defaults unless you know what you're doing.
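
        After installing, and especially after installing under a
        non-standard prefix, it can be useful to confirm that Perl can
        actually find the modules. The sketch below is one way to check;
        the module_available() helper is a hypothetical name introduced
        here. If a module is reported as not found after a PREFIX
        install, the directory holding it probably needs to be added to
        the PERL5LIB environment variable or named in a 'use lib' line.
        The exact subdirectory under the prefix varies with the Perl
        version and configuration.

```perl
use strict;
use warnings;

# Report whether a module can be loaded from the current @INC.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module (qw(UMLS::Interface UMLS::Similarity)) {
    printf "%-18s %s\n", $module,
        module_available($module) ? 'found' : 'NOT found';
}
```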

  SOFTWARE COPYRIGHT AND LICENSE
        Copyright (C) 2004-2009 Bridget T McInnes,  Siddharth Patwardhan, 
        Serguei Pakhomov and Ted Pedersen

        This suite of programs is free software; you can redistribute it and/or
        modify it under the terms of the GNU General Public License as published
        by the Free Software Foundation; either version 2 of the License, or (at
        your option) any later version.

        This program is distributed in the hope that it will be useful, but
        WITHOUT ANY WARRANTY; without even the implied warranty of
        MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
        General Public License for more details.

        You should have received a copy of the GNU General Public License along
        with this program; if not, write to the Free Software Foundation, Inc.,
        59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.

        Note: The text of the GNU General Public License is provided in the file
        'GPL.txt' that you should have received with this distribution.

  REFERENCES
        1   Wu Z. and Palmer M. 1994. Verb Semantics and Lexical Selection. In
            Proceedings of the 32nd Annual Meeting of the Association for
            Computational Linguistics.  Las Cruces, New Mexico.

        2   Resnik P. 1995. Using information content to evaluate semantic
            similarity. In Proceedings of the 14th International Joint
            Conference on Artificial Intelligence, pages 448-453, Montreal.

        3   Jiang J. and Conrath D. 1997. Semantic similarity based on corpus
            statistics and lexical taxonomy. In Proceedings of International
            Conference on Research in Computational Linguistics, Taiwan.

        4   Fellbaum C., editor. WordNet: An electronic lexical database. MIT
            Press, 1998.

        5   Leacock C. and Chodorow M. 1998. Combining local context and WordNet
            similarity for word sense identification. In Fellbaum 1998, pp.
            265-283.

        6   Lin D. 1998. An information-theoretic definition of similarity. In
            Proceedings of the 15th International Conference on Machine
            Learning, Madison, WI.

        7   Hirst G. and St-Onge D. 1998. Lexical Chains as representations of
            context for the detection and correction of malapropisms. In
            Fellbaum 1998, pp. 305-332.

        8   Schütze H. 1998. Automatic Word Sense Discrimination. Computational
            Linguistics, 24(1):97-123.

        9   Resnik P. 1999. Semantic Similarity in a Taxonomy: An Information-
            Based Measure and its Applications to Problems of Ambiguity in
            Natural Language. Journal of Artificial Intelligence Research, 11,
            95-130.

        10  Budanitsky A. and Hirst G. 2001. Semantic distance in WordNet: An
            experimental, application-oriented evaluation of five measures. In
            Workshop on WordNet and Other Lexical Resources, Second meeting of
            the North American Chapter of the Association for Computational
            Linguistics. Pittsburgh, PA.

        11  Banerjee S. and Pedersen T. 2002. An Adapted Lesk Algorithm for Word
            Sense Disambiguation Using WordNet. In Proceedings of the Fourth
            International Conference on Computational Linguistics and
            Intelligent Text Processing (CICLING-02). Mexico City.

        12  Patwardhan S., Banerjee S. and Pedersen T. 2002. Using Semantic
            Relatedness for Word Sense Disambiguation. In Proceedings of the
            Fourth International Conference on Intelligent Text Processing and
            Computational Linguistics, Mexico City.

        13  Banerjee S. Adapting the Lesk algorithm for word sense
            disambiguation to WordNet. Master's Thesis, University of
            Minnesota, Duluth, 2002.

        14  Patwardhan S. Incorporating dictionary and corpus information into a
            vector measure of semantic relatedness. Master's Thesis, University
            of Minnesota, Duluth, 2003.

  SEE ALSO
        <http://search.cpan.org/dist/UMLS-Interface>,
        <http://search.cpan.org/dist/UMLS-Similarity>

  CONTACT US
      If you have any trouble installing or using UMLS-Similarity, please 
      contact us via the users mailing list:

        umls-similarity@yahoogroups.com

      You can join this group by going to:

        http://tech.groups.yahoo.com/group/umls-similarity/

      You may also contact us directly if you prefer :

        Bridget T. McInnes: bthomson at cs.umn.edu
        Ted Pedersen      : tpederse at d.umn.edu

  AUTHORS
         Bridget T McInnes, University of Minnesota Twin Cities
         bthomson at cs.umn.edu

         Siddharth Patwardhan, University of Utah
         sidd at cs.utah.edu

         Serguei Pakhomov, University of Minnesota Twin Cities
         pakh002 at umn.edu

         Ted Pedersen, University of Minnesota Duluth
         tpederse at d.umn.edu

  DOCUMENTATION COPYRIGHT AND LICENSE
        Copyright (C) 2003-2009 Bridget T. McInnes, Siddharth Patwardhan, 
        Serguei Pakhomov and Ted Pedersen.

        Permission is granted to copy, distribute and/or modify this document
        under the terms of the GNU Free Documentation License, Version 1.2 or
        any later version published by the Free Software Foundation; with no
        Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.

        Note: a copy of the GNU Free Documentation License is available on the
        web at <http://www.gnu.org/copyleft/fdl.html> and is included in this
        distribution as FDL.txt.

