
FILE: italian.words
VERSION: DEC-SRC-92-Apr-05

EDITOR

    Jorge Stolfi <stolfi@src.dec.com>
    DEC Systems Research Center
  
AUTHOR OF ORIGINAL WORDLIST

    David Vincenzetti <vince@ghost.unimi.it>

DESCRIPTION

    The file italian.words is a list of over 60,000 Italian words.
    
    The file has one word per line, and is sorted with sort(1)
    in plain ASCII collating sequence.
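    The order is plain ASCII (byte) order, not a locale-aware dictionary
    order.  Code that relies on it, for example to binary-search the list,
    should therefore compare raw bytes.  In ASCII the apostrophe (0x27)
    and the back-quote (0x60) both sort before the letters, so "l'"
    precedes "la".  The Python sketch below checks whether a list of
    lines is in this order.  The function names are hypothetical and are
    not part of this package.

```python
def is_ascii_sorted(lines):
    """Return True if `lines` is in non-decreasing byte order, i.e. the
    order sort(1) produces in the C/POSIX locale."""
    # Raises UnicodeEncodeError if a line is not pure ASCII.
    keys = [line.encode("ascii") for line in lines]
    return all(a <= b for a, b in zip(keys, keys[1:]))


def load_words(path):
    """Read a wordlist with one word per line (hypothetical helper)."""
    with open(path, encoding="ascii") as f:
        return f.read().splitlines()
```

    With a modern sort(1), the same order is normally obtained by running
    `LC_ALL=C sort`.  Other locales may collate the lines differently.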

    The file includes many verb forms and inflected forms of nouns
    and adjectives.  However, the list is still highly incomplete
    and inconsistent.  In particular, there are no verb forms with
    enclitic pronouns ("arrivederci"), and practically no
    augmentatives ("omone", "omaccio"), diminutives ("casina",
    "casetta", "casella"), superlatives ("carissimo"), adverbial
    forms ("lentamente"), etc.

    The accent mark is encoded as a back-quote "`" after the letter,
    as in "cioe`" (but beware that many accents are missing).  The
    list also contains the elided variants of articles and
    demonstratives ("dell'", "quest'", "l'"), with the final
    apostrophe.  The file uses only the letters [a-z], back-quote,
    apostrophe, and newline.
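    This notation is easy to convert to ordinary accented letters.  The
    Python sketch below, whose function names are hypothetical, checks
    that an entry uses only the characters listed above.  It also
    rewrites each back-quote as a combining accent on the preceding
    letter.  The sketch assumes that every back-quote stands for a grave
    accent.  This file does not say which accent is meant, so a word that
    properly takes an acute accent, such as "perché", would come out
    with a grave accent instead.

```python
import re
import unicodedata

# Alphabet stated in this README: lowercase a-z, back-quote, apostrophe.
VALID_ENTRY = re.compile(r"[a-z`']+")

COMBINING_GRAVE = "\u0300"  # assumption: "`" always means a grave accent


def is_valid_entry(word):
    """True if `word` (without its newline) uses only the permitted characters."""
    return VALID_ENTRY.fullmatch(word) is not None


def decode_accents(word):
    """Attach a grave accent to the letter before each back-quote, so
    "cioe`" becomes "cio" followed by U+00E8.  A leading back-quote,
    with no letter before it, is kept as-is."""
    out = []
    for ch in word:
        if ch == "`" and out:
            # NFC composes letter + combining mark into one code point when one exists.
            out[-1] = unicodedata.normalize("NFC", out[-1] + COMBINING_GRAVE)
        else:
            out.append(ch)
    return "".join(out)
```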

    The file also contains many errors, apparently due to incorrect
    handling of accented characters at some point in the past.  Some
    of the errors were fixed manually, but many still remain.

AUXILIARY LISTS

    In the same directory as italian.words you will also find:

    italian.trash

        A list of incorrect and foreign words removed from the original 
        wordlist.

ORIGINAL LISTS 

    The original wordlist from which these files were compiled is listed
    below.  It was obtained by anonymous FTP on 92-Feb-10.

    [1] from: relay.cs.toronto.edu: /pub/doc/Dictionaries
        file: words.italian.Z
        size: 217241 bytes (561981 bytes uncompressed)
        author: David Vincenzetti <vince@ghost.unimi.it>

    COMMENTS: The file words.italian [1] appears to be broken: all
    accented characters (not just the accents!) are missing or
    replaced by junk characters. For instance, "volera`"
    is replaced by "voler".  My guess is that the initial copy
    had accents encoded as ISO 8-bit characters, and it was at some
    point piped through some piece of software that did not support
    them.
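    The guess above can be reproduced in a few lines.  The Python sketch
    below is hypothetical and does not represent the software actually
    involved.  It assumes the 8-bit encoding was ISO 8859-1 (Latin-1),
    encodes a word that way, and then decodes the bytes as 7-bit ASCII.
    If the undecodable byte is dropped, the whole accented letter
    disappears, which matches the "volera`" to "voler" damage.  If the
    byte is replaced instead, a junk character appears.

```python
def simulate_8bit_loss(word, errors="ignore"):
    """Encode `word` as ISO 8859-1 (Latin-1), then decode the bytes as
    7-bit ASCII.  errors="ignore" silently drops each accented letter;
    errors="replace" puts U+FFFD in its place."""
    raw = word.encode("latin-1")  # e.g. "voler\u00e0" -> b"voler\xe0"
    return raw.decode("ascii", errors=errors)
```

    For example, simulate_8bit_loss("voler\u00e0") returns "voler".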

COMPILATION PROCESS    

    The file italian.words is a slightly cleaned-up version of the
    list words.italian.Z [1].  I fixed some of the most obvious
    missing-accent bugs by hand, but there are still many left (and
    quite a few of my "corrections" are themselves incorrect).

(NON-)COPYRIGHT STATUS

  To the best of my knowledge, all the files I used to build these
  wordlists were available for public distribution and use, at least
  for non-commercial purposes.  I have confirmed this assumption with
  the authors of the lists, whenever they were known.
  
  Therefore, it is safe to assume that the wordlists in this package
  can also be freely copied, distributed, modified, and used for
  personal, educational, and research purposes.  (Use of these files in
  commercial products may require written permission from DEC and/or
  the authors of the original lists.)
  
  Whenever you distribute any of these wordlists, please also
  distribute the accompanying README file.  If you distribute a modified
  copy of one of these wordlists, please include the original README
  file with a note explaining your modifications.  Your users will
  surely appreciate that.

(NO-)WARRANTY DISCLAIMER

  These files, like the original wordlists on which they are based,
  are still very incomplete, uneven, and inconsistent, and probably
  contain many errors.  They are offered "as is" without any warranty
  of correctness or fitness for any particular purpose.  Neither I nor
  my employer can be held responsible for any losses or damages that
  may result from their use.

