NAME
    cyrillic - Library for fast and easy Cyrillic text manipulation

SYNOPSIS
     use cyrillic qw/866 win2dos convert locase upcase detect/;

     print convert( 866, 1251, $str );
     print convert( 'dos','win', \$str );
     print win2dos $str;

DESCRIPTION
    This module provides functions for converting Cyrillic strings from one
    charset to another and for changing them to upper or lower case without
    switching the locale. It also includes a detection routine for
    single-byte charsets. New code pages are easy to add: all that is needed
    is the string describing the code page.

    Supported charsets: ibm866, koi8-r, cp855, windows-1251, MacWindows,
    iso_8859-5, unicode, utf8.

    If the first item in the import list is a code page number, the locale
    is switched to that code page.

FUNCTIONS
    * cset_factory - generator of charset conversion functions
    * case_factory - generator of case conversion functions
    * convert - convert between charsets
    * upcase - convert to upper case
    * locase - convert to lower case
    * upfirst - convert first char to upper case
    * lofirst - convert first char to lower case
    * detect - detect codepage number
    * charset - returns charset name for codepage number

    Named convertors may also be listed in the import list. For example:

     use cyrillic qw/dos2win win2koi mac2dos ibm2dos/;

    NOTE! Specialisations (like win2dos, utf2win) are faster to call than
    convert.

    NOTE! Only the convert function and its specialisations work with
    Unicode and UTF-8 strings. All other functions work only with
    single-byte charsets.

    Names to use in named charset convertors:

     Name Charset      Codepage
     dos  ibm866       866
     koi  koi8-r       20866
     ibm  cp855        855
     win  windows-1251 1251
     mac  ms-cyrillic  10007
     iso  iso-8859-5   28585
     uni  Unicode
     utf  UTF-8

    The following rules apply to the converting functions:

     VAR may be a SCALAR or a REF to a SCALAR.
     If VAR is a REF to a SCALAR then the SCALAR is converted in place.
     If VAR is omitted then $_ is operated on.
     If the function is called in void context and VAR is not a REF
     then the result is placed in $_.
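    The rules above can be illustrated with a small stand-alone sketch. It
    does not use this module: the function name shout is hypothetical, and
    plain ASCII uppercasing stands in for a real charset conversion. How the
    module handles an omitted VAR in non-void context is not stated here, so
    the sketch simply returns the result in that case.

```perl
use strict;
use warnings;

# Hypothetical function that follows the VAR rules described above.
# uc() is only a placeholder for a real conversion.
sub shout {
    my $arg    = @_ ? shift : $_;                  # omitted VAR: use $_
    my $result = uc( ref($arg) ? $$arg : $arg );

    if    ( ref($arg) )          { $$arg = $result }   # REF: convert in place
    elsif ( !defined wantarray ) { $_    = $result }   # void context: store in $_

    return $result;
}

my $str = 'abc';
shout( \$str );             # $str is converted in place
shout( 'def' );             # void context, not a REF: result goes to $_
my $copy = shout( 'ghi' );  # scalar context: result is only returned
```

    The void-context check relies on wantarray returning undef when the
    caller ignores the return value.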

CONVERSION METHODS
    cset_factory SRC_CP, DST_CP
    Generates a convertor function from codepage SRC_CP to codepage DST_CP
    and returns a reference to it.

    Converting Unicode or UTF-8 data requires the Unicode::String and
    Unicode::Map modules to be installed.
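    To show the general idea of a generated single-byte convertor, here is a
    minimal sketch. It is not the module's implementation: make_convertor
    and the two table variables are hypothetical names, and the tables cover
    only the 66 Russian letters of cp866 and windows-1251.

```perl
use strict;
use warnings;

# Hypothetical factory: $src and $dst list the same letters, in the same
# order, as bytes of two different single-byte charsets.
sub make_convertor {
    my ( $src, $dst ) = @_;
    die "tables differ in length" unless length($src) == length($dst);

    my %map;
    @map{ split //, $src } = split //, $dst;

    # The returned closure replaces every byte that has a mapping
    # and leaves all other bytes untouched.
    return sub {
        my ($str) = @_;
        $str =~ s/(.)/exists $map{$1} ? $map{$1} : $1/gse;
        return $str;
    };
}

# Upper case A..YA, lower case a..ya, then YO and yo, in each charset.
my $cp866  = pack 'C*', 0x80 .. 0xAF, 0xE0 .. 0xEF, 0xF0, 0xF1;
my $cp1251 = pack 'C*', 0xC0 .. 0xFF, 0xA8, 0xB8;

my $dos2win = make_convertor( $cp866, $cp1251 );
print unpack( 'H*', $dos2win->("\x8F\xE0\xA8\xA2\xA5\xE2") ), "\n";
```

    Building the table once and returning a closure is what makes a
    generated convertor cheap to call repeatedly.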

    case_factory CODEPAGE, [TO_UP], [ONLY_FIRST_LETTER]
    Generates a case convertor function for the single-byte CODEPAGE and
    returns a reference to it.
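    A case convertor that needs no locale can be built from fixed byte
    ranges. The sketch below is an illustration for cp866 only and is not
    the module's code: make_case_866 is a hypothetical name, its two flags
    mirror TO_UP and ONLY_FIRST_LETTER, and it leaves ASCII letters alone.

```perl
use strict;
use warnings;

# cp866 layout used here: upper case 0x80-0x9F and 0xF0,
# lower case 0xA0-0xAF, 0xE0-0xEF and 0xF1.
sub make_case_866 {
    my ( $to_up, $only_first ) = @_;

    return sub {
        my ($str) = @_;
        return $str unless length $str;

        # Convert either the first character or the whole string.
        my $len  = $only_first ? 1 : length $str;
        my $head = substr $str, 0, $len;
        my $tail = substr $str, $len;

        if ($to_up) { $head =~ tr/\xA0-\xAF\xE0-\xEF\xF1/\x80-\x9F\xF0/ }
        else        { $head =~ tr/\x80-\x9F\xF0/\xA0-\xAF\xE0-\xEF\xF1/ }

        return $head . $tail;
    };
}

my $upcase_866  = make_case_866(1);
my $lofirst_866 = make_case_866( 0, 1 );

my $word = "\x8F\xE0\xA8\xA2\xA5\xE2";    # a capitalised word in cp866
print unpack( 'H*', $upcase_866->($word) ),  "\n";
print unpack( 'H*', $lofirst_866->($word) ), "\n";
```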

    convert SRC_CP, DST_CP, [VAR]
    Converts VAR from the SRC_CP codepage to the DST_CP codepage and returns
    the converted string. Internally calls cset_factory.

    upcase CODEPAGE, [VAR]
    Converts VAR to uppercase using the CODEPAGE table and returns the
    converted string. Internally calls case_factory.

    locase CODEPAGE, [VAR]
    Converts VAR to lowercase using the CODEPAGE table and returns the
    converted string. Internally calls case_factory.

    upfirst CODEPAGE, [VAR]
    Converts the first char of VAR to uppercase using the CODEPAGE table and
    returns the converted string. Internally calls case_factory.

    lofirst CODEPAGE, [VAR]
    Converts the first char of VAR to lowercase using the CODEPAGE table and
    returns the converted string. Internally calls case_factory.

MAINTENANCE METHODS
    charset CODEPAGE
    Returns charset name for CODEPAGE.

    detect ARRAY
    Detects the single-byte codepage of the data in ARRAY and returns the
    codepage number. If the first element of ARRAY is a REF to an array of
    codepage numbers, detection is limited to those codepages; otherwise all
    single-byte codepages are considered. If no codepage is detected, the
    undefined value is returned.
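    The detection algorithm itself is not described here. Purely as an
    illustration of the calling convention, the following sketch uses one
    possible heuristic (an assumption, not the module's method): ordinary
    text is mostly lower case, so the codepage whose lower-case letter range
    matches the most bytes wins. The names guess_codepage and %lower are
    hypothetical, and only three codepages are covered.

```perl
use strict;
use warnings;

# Byte ranges of lower-case Russian letters per codepage (assumed tables).
my %lower = (
    866   => qr/[\xA0-\xAF\xE0-\xEF]/,
    1251  => qr/[\xE0-\xFF]/,
    20866 => qr/[\xC0-\xDF]/,
);

sub guess_codepage {
    my @data = @_;

    # An optional leading array reference restricts the candidates.
    my @candidates = ref( $data[0] ) eq 'ARRAY'
        ? @{ shift @data }
        : sort { $a <=> $b } keys %lower;

    my $text = join '', @data;
    my ( $best, $best_score ) = ( undef, 0 );
    for my $cp (@candidates) {
        next unless $lower{$cp};
        my $score = () = $text =~ /$lower{$cp}/g;    # count matching bytes
        ( $best, $best_score ) = ( $cp, $score ) if $score > $best_score;
    }
    return $best;    # undef when nothing matched
}

my $all     = guess_codepage("\x8F\xE0\xA8\xA2\xA5\xE2");
my $limited = guess_codepage( [ 1251, 20866 ], "\xCF\xF0\xE8\xE2\xE5\xF2" );
```

    A heuristic this simple can be fooled by short or upper-case input, which
    is one reason to pass an explicit list of candidate codepages.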

EXAMPLES
     use cyrillic qw/convert locase upcase detect dos2win win2dos/;

     $_ = "\x8F\xE0\xA8\xA2\xA5\xE2 \xF0\xA6\x88\xAA\x88!";

     printf "    dos: '%s'\n", $_;
     upcase 866;
     printf " upcase: '%s'\n", $_;
     dos2win;
     printf "dos2win: '%s'\n", $_;
     win2dos;
     printf "win2dos: '%s'\n", $_;
     locase 866;
     printf " locase: '%s'\n", $_;
     printf " detect: '%s'\n", detect $_;

     # detect between 866 and 20866 codepages
     printf " detect: '%s'\n", detect [866, 20866], $_;

     # CONVERTING TEST:

     use cyrillic qw/utf2dos mac2utf dos2mac win2dos utf2win/;

     $_ = "Хелло Ворльд!\n";

     print "UTF-8: $_";
     print "  DOS: ", utf2dos mac2utf dos2mac win2dos utf2win $_;

     # EQUIVALENT CALLS:

     dos2win( $str );        # called in void context -> result placed in $_
     $_ = dos2win( $str );

     dos2win( \$str );       # called with REF to string -> converted in place
     $str = dos2win( $str );

     dos2win();              # called with omitted param -> $_ converted
     dos2win( \$_ );
     $_ = dos2win( $_ );

     my $convert = cset_factory 866, 1251;
     &$convert( $str );            # faster: calls the convertor function via its reference
       convert( 866, 1251, $str ); # slower: calls the generic convert function

     # EASY SWITCHING OF THE LOCALE CODEPAGE

     use cyrillic qw/866/;   # locale switched to Russian_Russia.866

     use locale;
     print $str =~ /(\w+)/;

     no locale;
     print $str =~ /(\w+)/;

FAQ
     * Q: Why does the module say: Can't create Unicode::Map for 'koi8-r' charset!
       A: Your Unicode::Map module can't find the map file for the 'koi8-r'
          charset. Copy the file koi8-r.map to site/lib/Unicode/Map and add
          the following three lines to the file site/lib/Unicode/Map/registry:

          name:    KOI8-R
          map:     $UnicodeMappings/koi8-r.map
          alias:   csKOI8R

     * Q: Why does perl say: "Undefined subroutine koi2win called"?
       A: The function koi2win is a specialization of the function convert.
          It is created only when its name is included in the import list.
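     The second answer can be made concrete with a stand-alone sketch of an
     import routine that creates functions on request. It is an assumption
     about the general technique, not this module's code: the package
     My::Conv and the placeholder return value are hypothetical.

```perl
use strict;
use warnings;

package My::Conv;    # hypothetical package, not the cyrillic module

# Install a function into the caller's namespace for every requested
# name of the form xxx2yyy; other names are ignored in this sketch.
sub import {
    my ( $class, @names ) = @_;
    my $caller = caller;
    for my $name (@names) {
        my ( $src, $dst ) = $name =~ /^([a-z]{3})2([a-z]{3})$/ or next;
        no strict 'refs';
        *{"${caller}::$name"} = sub { "converted $src -> $dst: $_[0]" };
    }
}

package main;

# Equivalent to "use My::Conv qw/koi2win/" if the package lived in a file.
BEGIN { My::Conv->import('koi2win') }

print koi2win('text'), "\n";
```

     Because only koi2win was requested, a call to any other name such as
     win2koi would fail with an "Undefined subroutine" error.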

AUTHOR
    Albert MICHEEV <Albert@f80.n5049.z2.fidonet.org>

COPYRIGHT
    Copyright (C) 2000, Albert MICHEEV

    This module is free software; you can redistribute it or modify it under
    the same terms as Perl itself.

AVAILABILITY
    The latest version of this library is likely to be available from:

    http://www.perl.com/CPAN

SEE ALSO
    Unicode::String, Unicode::Map.

