README(1)             User Contributed Perl Documentation            README(1)




       UMLS::Similarity

   SSYYNNOOPPSSIISS
         This package consists of Perl modules and supporting Perl programs
         that implement the semantic relatedness measures described by
         Leacock & Chodorow (1998), Wu & Palmer (1994), Nguyen & Al-Mubaid
         (2006), Rada et al. (1989) and Patwardhan (2003), along with a simple
         path-based measure. We plan to add the measures of Jiang & Conrath
         (1997), Resnik (1995) and Lin (1998) in the near future.

         UMLS::Similarity requires the UMLS::Interface module to access
         the Unified Medical Language System (UMLS) in order to determine
         the similarity between two UMLS concepts.

         The Perl modules are designed as objects with methods that take two
         UMLS concepts as input and return their semantic relatedness. A
         quantitative measure of the degree to which two concepts are related
         has applications in many areas, such as word sense disambiguation and
         information retrieval. For example, to determine which sense of a
         word is being used in a particular context, the sense with the
         highest relatedness to the senses of its context words is the most
         likely candidate. Similarly, in information retrieval, retrieving
         documents that contain highly related concepts is more likely to
         yield higher precision and recall.

         The following sections describe the organization of this software
         package and how to use it. A few typical examples are given to help
         clearly understand the usage of the modules and the supporting
         utilities.

   SSEEMMAANNTTIICC RREELLAATTEEDDNNEESSSS
           We observe that humans find it extremely easy to say if two words are
           related and if one word is more related to a given word than another.
           For example, if we come across two words -- 'car' and 'bicycle', we know
           they are related as both are means of transport. Also, we easily observe
           that 'bicycle' is more related to 'car' than 'fork' is. But is there
           some way to assign a quantitative value to this relatedness? Some ideas
           have been put forth by researchers to quantify the concept of
           relatedness of words, with encouraging results.

           A number of different measures of relatedness have been implemented in
           this software package, including a simple edge-counting approach. The
           measures require the UMLS-Interface package, which provides the UMLS
           concepts and some basic relationships between them.
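
           As a rough illustration of edge counting, the sketch below computes
           path, Wu & Palmer and Leacock & Chodorow style scores over a small
           hand-made is-a tree. The taxonomy, the concept names and every
           function name are assumptions made for this example; they are not
           UMLS data and not the API of the UMLS::Similarity modules, whose
           counting conventions may differ. The formulas follow the commonly
           quoted statements of each measure.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical toy is-a tree (child => parent); not real UMLS content.
my %parent = (
    vehicle => 'entity',
    utensil => 'entity',
    car     => 'vehicle',
    bicycle => 'vehicle',
    fork    => 'utensil',
);

# Concepts from the given concept up to the root, inclusive.
sub path_to_root {
    my ($concept) = @_;
    my @path = ($concept);
    while (defined $parent{$concept}) {
        $concept = $parent{$concept};
        push @path, $concept;
    }
    return @path;
}

# Depth counted in nodes, so the root has depth 1.
sub depth {
    my ($concept) = @_;
    my @path = path_to_root($concept);
    return scalar @path;
}

# Least common subsumer: the deepest concept that subsumes both inputs.
sub lcs {
    my ($c1, $c2) = @_;
    my %is_ancestor = map { $_ => 1 } path_to_root($c1);
    for my $candidate (path_to_root($c2)) {
        return $candidate if $is_ancestor{$candidate};
    }
    return;
}

# Nodes on the shortest path between two concepts, both ends included.
sub path_length {
    my ($c1, $c2) = @_;
    return depth($c1) + depth($c2) - 2 * depth(lcs($c1, $c2)) + 1;
}

# Simple path measure: reciprocal of the path length.
sub path_sim { return 1 / path_length(@_); }

# Wu & Palmer: depth of the LCS scaled by the depths of the two concepts.
sub wup_sim {
    my ($c1, $c2) = @_;
    return 2 * depth(lcs($c1, $c2)) / (depth($c1) + depth($c2));
}

# Leacock & Chodorow: log(2D / len), i.e. -log(len / 2D), where D is the
# maximum depth of the tree and len is the path length in nodes.
my $max_depth = max(map { depth($_) } keys %parent);
sub lch_sim { return log(2 * $max_depth / path_length(@_)); }

for my $pair (['car', 'bicycle'], ['car', 'fork']) {
    my ($c1, $c2) = @$pair;
    printf "%-4s %-8s path=%.3f wup=%.3f lch=%.3f\n",
        $c1, $c2, path_sim($c1, $c2), wup_sim($c1, $c2), lch_sim($c1, $c2);
}
```

           With this toy tree, 'bicycle' scores higher than 'fork' against
           'car' under all three measures, which matches the intuition
           described above.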

   CCOONNTTEENNTTSS
           All the modules that will be installed in the Perl system directory are
           in the lib/ directory tree of the package. These include the semantic
           relatedness modules:

             UMLS/Similarity/lch.pm
             UMLS/Similarity/path.pm
             UMLS/Similarity/wup.pm
             UMLS/Similarity/nam.pm
             UMLS/Similarity/cdist.pm
             UMLS/Similarity/vector.pm (beta)

           Once installed in the Perl system directory, these modules can be
           used directly by Perl programs.

           The package contains a utils/ directory that contains Perl utility
           programs. These utilities use the modules or provide supporting
           functionality.

             umls-similarity.pl -- returns the semantic similarity of two
                                   terms or UMLS CUIs given a specified
                                   measure (and view of the UMLS).

   IINNSSTTAALLLL
           To install these modules run:

             perl Makefile.PL
             make
             make test
             make install

           This will install the modules in the standard locations. You will
           most likely need root privileges to install in standard system
           directories. To install in a non-standard directory, specify a prefix
           during the 'perl Makefile.PL' stage:

             perl Makefile.PL PREFIX=/home

           Other parameters can also be modified during installation; the
           details are in the ExtUtils::MakeMaker documentation. However, we
           strongly recommend leaving them at their defaults unless you know
           what you are doing.

   SSOOFFTTWWAARREE CCOOPPYYRRIIGGHHTT AANNDD LLIICCEENNSSEE
           Copyright (C) 2004-2009 Bridget T McInnes, Siddharth Patwardhan,
           Serguei Pakhomov and Ted Pedersen

           This suite of programs is free software; you can redistribute it and/or
           modify it under the terms of the GNU General Public License as published
           by the Free Software Foundation; either version 2 of the License, or (at
           your option) any later version.

           This program is distributed in the hope that it will be useful, but
           WITHOUT ANY WARRANTY; without even the implied warranty of
           MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
           General Public License for more details.

           You should have received a copy of the GNU General Public License along
           with this program; if not, write to the Free Software Foundation, Inc.,
           59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.

           Note: The text of the GNU General Public License is provided in the file
           'GPL.txt' that you should have received with this distribution.

   RREEFFEERREENNCCEESS
           1   Wu Z. and Palmer M. 1994. Verb Semantics and Lexical Selection. In
               Proceedings of the 32nd Annual Meeting of the Association for
               Computational Linguistics.  Las Cruces, New Mexico.

           2   Resnik P. 1995. Using information content to evaluate semantic
               similarity. In Proceedings of the 14th International Joint
               Conference on Artificial Intelligence, pages 448-453, Montreal.

           3   Jiang J. and Conrath D. 1997. Semantic similarity based on corpus
               statistics and lexical taxonomy. In Proceedings of International
               Conference on Research in Computational Linguistics, Taiwan.

           4   Fellbaum C., editor. 1998. WordNet: An electronic lexical database.
               MIT Press.

           5   Leacock C. and Chodorow M. 1998. Combining local context and WordNet
               similarity for word sense identification. In Fellbaum 1998, pp.
               265-283.

           6   Lin D. 1998. An information-theoretic definition of similarity. In
               Proceedings of the 15th International Conference on Machine
               Learning, Madison, WI.

           7   Hirst G. and St-Onge D. 1998. Lexical Chains as representations of
               context for the detection and correction of malapropisms. In
               Fellbaum 1998, pp. 305-332.

           8   Schuetze H. 1998. Automatic Word Sense Discrimination. Computational
               Linguistics, 24(1):97-123.

           9   Resnik P. 1999. Semantic Similarity in a Taxonomy: An Information-
               Based Measure and its Applications to Problems of Ambiguity in
               Natural Language. Journal of Artificial Intelligence Research, 11,
               95-130.

           10  Budanitsky A. and Hirst G. 2001. Semantic distance in WordNet: An
               experimental, application-oriented evaluation of five measures. In
               Workshop on WordNet and Other Lexical Resources, Second meeting of
               the North American Chapter of the Association for Computational
               Linguistics. Pittsburgh, PA.

           11  Banerjee S. and Pedersen T. 2002. An Adapted Lesk Algorithm for Word
               Sense Disambiguation Using WordNet. In Proceedings of the Fourth
               International Conference on Computational Linguistics and
               Intelligent Text Processing (CICLING-02). Mexico City.

           12  Patwardhan S., Banerjee S. and Pedersen T. 2002. Using Semantic
               Relatedness for Word Sense Disambiguation. In Proceedings of the
               Fourth International Conference on Intelligent Text Processing and
               Computational Linguistics, Mexico City.

           13  Banerjee S. 2002. Adapting the Lesk algorithm for word sense
               disambiguation to WordNet. Master's thesis, University of
               Minnesota, Duluth.

           14  Patwardhan S. 2003. Incorporating dictionary and corpus information
               into a vector measure of semantic relatedness. Master's thesis,
               University of Minnesota, Duluth.

           15  Rada R., Mili H., Bicknell E. and Blettner M. 1989. Development and
               application of a metric on semantic nets. IEEE Transactions on
               Systems, Man, and Cybernetics, 19(1):17-30.

           16  Nguyen H.A. and Al-Mubaid H. 2006. New ontology-based semantic
               similarity measure for the biomedical domain. In Proceedings of
               the IEEE International Conference on Granular Computing, pages
               623-628.

   SSEEEE AALLSSOO
       <http://search.cpan.org/dist/UMLS-Interface>

       <http://search.cpan.org/dist/UMLS-Similarity>

   CCOONNTTAACCTT UUSS
       If you have any trouble installing or using UMLS-Similarity, please
       contact us via the users mailing list:

       umls-similarity@yahoogroups.com

       You can join this group by going to:

       <http://tech.groups.yahoo.com/group/umls-similarity/>

       You may also contact us directly if you prefer:

         Bridget T. McInnes: bthomson at cs.umn.edu
         Ted Pedersen      : tpederse at d.umn.edu

   AAUUTTHHOORRSS
        Bridget T McInnes, University of Minnesota Twin Cities
        bthomson at cs.umn.edu

        Siddharth Patwardhan, University of Utah
        sidd at cs.utah.edu

        Serguei Pakhomov, University of Minnesota Twin Cities
        pakh002 at umn.edu

        Ted Pedersen, University of Minnesota Duluth
        tpederse at d.umn.edu

   DDOOCCUUMMEENNTTAATTIIOONN CCOOPPYYRRIIGGHHTT AANNDD LLIICCEENNSSEE
       Copyright (C) 2003-2009 Bridget T. McInnes, Siddharth Patwardhan,
       Serguei Pakhomov and Ted Pedersen.

       Permission is granted to copy, distribute and/or modify this document
       under the terms of the GNU Free Documentation License, Version 1.2 or
       any later version published by the Free Software Foundation; with no
       Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.

       Note: a copy of the GNU Free Documentation License is available on the
       web at:

       <http://www.gnu.org/copyleft/fdl.html>

       and is included in this distribution as FDL.txt.



perl v5.10.0                      2009-10-30                         README(1)
