class: romanization
%%explain.en:

These functions convert Japanese letters to and from romanized forms.

%%

name: kana2romaji
%%eg:

$romaji = kana2romaji ("うれしいこども");

%%
out: $romaji
expect: uresîkodomo
%%desc.en:

Convert kana to a romanized form.

An optional second argument, a hash reference, controls the style of
conversion.

    use utf8;
    $romaji = kana2romaji ("しんぶん", {style => "hepburn"});
    # $romaji = "shimbun"

The possible options are

=over

=item style

The style of romanization. The default form of romanization is
"Nippon-shiki". See
L<http://www.sljfaq.org/afaq/nippon-shiki.html>. The user can set the
conversion style to "hepburn" or "passport" or "kunrei". See
L<http://www.sljfaq.org/afaq/kana-roman.html>.

=item use_m

If this is set to any "true" value, a syllabic I<n> (ん) which comes
before a "b" or "p" sound, such as the first "n" in "shinbun" (しんぶん,
newspaper), is converted into "m" rather than "n".

=item ve_type

C<ve_type> controls how long vowels are written. The default is to use
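circumflexes to represent long vowels. If you set
C<< "ve_type" => "macron" >>, then it uses macrons (the Hepburn
system). If you set C<< "ve_type" => "passport" >>, then it uses "oh"
to write long "o" vowels. If you set C<< "ve_type" => "none" >>, then
long vowels are written as plain short vowels, with neither a
diacritic nor an "h". If you set C<< "ve_type" => "wapuro" >>, then a
chouon is written as a hyphen ("a-", "o-") and long vowels spelled
with kana are kept as spelled, so おう becomes "ou", as in romaji
keyboard input.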

=back
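
The effect of C<use_m> can be pictured with a small stand-alone
sketch. The helper C<sketch_use_m> is hypothetical and is not the
module's own code. It assumes the input is romaji in which every ん
has already been written as "n".

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the module: rewrite "n" as "m"
# wherever it directly precedes "b" or "p".
sub sketch_use_m
{
    my ($romaji) = @_;
    $romaji =~ s/n(?=[bp])/m/g;
    return $romaji;
}

print sketch_use_m ("shinbun"), "\n";   # prints "shimbun"
print sketch_use_m ("kana"), "\n";      # prints "kana"
```

Only an "n" followed by "b" or "p" is changed, so the final "n" of
"shinbun" is left alone.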

%%
%%desc.ja:

仮名をローマ字に変換。

オプションは関数の２番目の引数にハッシュリファレンスで渡します。

    use utf8;
    $romaji = kana2romaji ("しんぶん", {style => "hepburn"});
    # $romaji = "shimbun"

可能なオプションは

=over

=item style

ローマ字の種類。

=over

=item undef

デフォルトは日本式（「つづり」が「tuduri」, 「少女」が「syôzyo」）。

=item passport

パスポート式(「伊藤」が「itoh」)

=item kunrei

訓令式（小学校４年で習うローマ字）

=item hepburn

ヘボン式（「つづり」が「tsuzuri」, 「少女」が「shōjo」）。

=back

=item use_m

真なら「しんぶん」が「shimbun」

=item ve_type

長い母音をどのように表現するか。

=over

=item undef

曲折アクセントを使う。

=item macron

マクロンを使う。

=item passport

「アー」、「イー」、「ウー」、「エー」が「a」, 「i」, 「u」, 「e」になり、「オー」が「oh」になる。

=item none

「アー」、「イー」、「ウー」、「エー」、「オー」が「a」, 「i」, 「u」, 「e」, 「o」になる。

=item wapuro

「アー」、「イー」、「ウー」、「エー」、「オー」が「a-」, 「i-」, 「u-」,
「e-」, 「o-」になる。「おう」が「ou」になるなど、ローマ字入力と同じよう
に、仮名で書かれた長音はそのままつづる。

=back

=item wapuro

ワープロローマ字。長音符は使わない。「少女」が「shoujo」など。

=back

%%

name:  romaji2kana
%%eg:


$kana = romaji2kana ('yamaguti');

%%
out: $kana
expect: ヤマグチ
%%desc.en:


Convert romanized Japanese to kana. The romanization is highly liberal
and will attempt to convert any romanization it sees into kana.  To
convert romanized Japanese into hiragana, use L</romaji2hiragana>.

The second argument to the function contains options in the
form of a hash reference,

     $kana = romaji2kana ($romaji, {wapuro => 1});

Use an option C<< wapuro => 1 >> to convert long vowels into the
equivalent kana rather than I<chouon>.

%%

name:  romaji2hiragana
%%eg:

$hiragana = romaji2hiragana ('babubo');

%%
out: $hiragana
expect: ばぶぼ
%%desc.en:

Convert romanized Japanese into hiragana. This takes the same options
as L</romaji2kana>. It also switches on the "wapuro" option, so that
long vowels are written with kana rather than with a chouon (long
vowel marker).

%%

name:  romaji_styles
%%eg:


my @styles = romaji_styles ();
# Returns a true value
romaji_styles ("hepburn");
# Returns the undefined value
romaji_styles ("frogs");

%%
%%desc.en:

Given an argument, return whether it is a legitimate style of romanization.

Without an argument, return a list of the possible styles. Each
element of the list is a hash reference containing "abbrev", a short
name, and "full_name", the full name of the style.

%%

name:  is_voiced
%%eg:


if (is_voiced ('が')) {
     print "が is voiced.\n";
}

%%
%%desc.en:

Given a kana or romaji input, C<is_voiced> returns a true value if the
sound is a voiced sound like I<a>, I<za>, I<ga>, etc. and the
undefined value if not.

%%
%%desc.ja:

仮名かローマ字は濁音、半濁音がついていれば、真、ついていなければ偽です。

%%

name:  is_romaji
%%eg:


# The following line returns "undef"
is_romaji ("abcdefg");
# The following line returns a defined value
is_romaji ("atarimae");

%%
%%desc.en:

Detect whether a string of alphabetical characters, which may also
include characters with macrons or circumflexes, "looks like"
romanized Japanese. If the test is successful, returns the romaji in a
canonical form.

This works by converting the string to kana and seeing if it
converts cleanly or not.

%%
%%desc.ja:

アルファベットの列はローマ字に見えるなら真、見えないなら偽。

%%

name:  normalize_romaji
%%eg:

$normalized = normalize_romaji ('tsumuji');

%%
%%desc.en:

C<normalize_romaji> converts romanized Japanese to a canonical form,
which is based on the Nippon-shiki romanization, but without
representing long vowels using a circumflex. In the canonical form,
sokuon (っ) characters are converted into the string "xtu".

If there is kana in the input string, this will also be converted to
romaji.

C<normalize_romaji> is for comparing two Japanese words which may be
represented in different ways, for example in different romanization
systems, to see if they refer to the same word despite the difference
in writing. It does not provide a standardized or
officially-sanctioned form of romanization.

%%

class: kana

name:  hira2kata
%%eg:


$katakana = hira2kata ('ひらがな');

%%
out: $katakana
expect: ヒラガナ
%%desc.en:

C<hira2kata> converts hiragana into katakana. If the input is a list,
it converts each element of the list, and if required, returns a list
of the converted inputs, otherwise it returns a concatenation of the
strings.

    my @katakana = hira2kata (@hiragana);

This does not convert chouon signs.
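
The conversion can be pictured as a shift between two Unicode ranges.
The following is a minimal sketch, not the module's implementation.
The helper name C<sketch_hira2kata> is hypothetical, and the sketch
handles a single string only.

```perl
use strict;
use warnings;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: map hiragana U+3041..U+3096 onto the katakana
# at U+30A1..U+30F6. The chouon sign (U+30FC) lies outside the source
# range, so it passes through unchanged.
sub sketch_hira2kata
{
    my ($text) = @_;
    $text =~ tr/\x{3041}-\x{3096}/\x{30A1}-\x{30F6}/;
    return $text;
}

print sketch_hira2kata ('ひらがな'), "\n";   # prints ヒラガナ
```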

%%
%%desc.ja:

平仮名をかたかなに変換します。長音符は変換しません。

%%

name:  kata2hira
%%eg:


$hiragana = kata2hira ('カキクケコ');

%%
out: $hiragana
expect:  かきくけこ
%%desc.en:

C<kata2hira> converts full-width katakana into hiragana. If the input
is a list, it converts each element of the list, and if required,
returns a list of the converted inputs, otherwise it returns a
concatenation of the strings.

    my @hiragana = kata2hira (@katakana);

This function does not convert chouon signs into long vowels. It also
does not convert half-width katakana into hiragana.

%%
%%desc.ja:

かたかなを平仮名に変換します。長音符は変換しません。

%%

name:  InHankakuKatakana
%%eg:

use utf8;
if ('ｱ' =~ /\p{InHankakuKatakana}/) {
    print "ｱ is half-width katakana\n";
}

%%
%%desc.en:

C<InHankakuKatakana> is a character class for use in regular
expressions with C<\p> which matches halfwidth katakana.
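
Perl lets a program define its own C<\p> classes as subroutines whose
names start with "In" or "Is". The sketch below shows that mechanism
with a hypothetical property, C<InSketchHankakuKatakana>. The range
U+FF66 to U+FF9F is an assumption made for illustration and may differ
from the range the module uses.

```perl
use strict;
use warnings;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical user-defined property. The return value lists one
# range per line as two hexadecimal code points separated by a tab.
# The range here is assumed, not taken from the module.
sub InSketchHankakuKatakana
{
    return "FF66\tFF9F\n";
}

for my $char ('ｱ', 'ア') {
    my $verdict = $char =~ /\p{InSketchHankakuKatakana}/ ? 'is' : 'is not';
    print "$char $verdict half-width katakana\n";
}
```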

%%

name:  kana2hw
out: $half_width
expect: ｱｲｳｶｷｷﾞｮｳ｡
%%eg:


$half_width = kana2hw ('あいウカキぎょう。');

%%
%%desc.en:

C<kana2hw> converts hiragana, katakana, and fullwidth Japanese
punctuation to halfwidth katakana and halfwidth punctuation. Its
function is similar to the Emacs command C<japanese-hankaku-region>.
For the opposite function,
see L</hw2katakana>.

%%
%%desc.ja:

あらゆる仮名文字を半角カタカナに変換する。

%%

name:  hw2katakana
out: $full_width
expect: アイウカキギョウ。
%%eg:


$full_width = hw2katakana ('ｱｲｳｶｷｷﾞｮｳ｡');

%%
%%desc.en:

C<hw2katakana> converts halfwidth katakana and Japanese punctuation to
fullwidth katakana and punctuation. Its function is similar to the
Emacs command C<japanese-zenkaku-region>. For the opposite function,
see L</kana2hw>.

%%
%%desc.ja:

半角カタカナを全角カタカナに変換する。

%%

name:  is_kana
%%eg:



%%
%%desc.en:

This function returns a true value if its argument is a string of
kana, or an undefined value if not. The input cannot contain
punctuation or the long vowel symbol (chouonpu).
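
A test of this kind can be sketched with a single anchored regular
expression. The helper C<sketch_is_kana> is hypothetical, and the
character ranges (hiragana U+3041 to U+3096, katakana U+30A1 to
U+30FA) are assumptions. The module's own set of accepted characters
may differ.

```perl
use strict;
use warnings;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: true only if every character is hiragana or
# fullwidth katakana. The chouon (U+30FC) and punctuation fall
# outside the assumed ranges, so they make the test fail.
sub sketch_is_kana
{
    my ($text) = @_;
    return 1 if $text =~ /\A[\x{3041}-\x{3096}\x{30A1}-\x{30FA}]+\z/;
    return;
}

for my $text ('ひらがな', 'カタカナ', 'カー', 'かな。') {
    printf "%s: %s\n", $text, sketch_is_kana ($text) ? 'kana' : 'not kana';
}
```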

%%
%%desc.ja:

入力が仮名のみの場合、真、入力が仮名でない文字を含む場合、偽(undef)。

%%

name:  is_hiragana
%%eg:

%%
%%desc.en:

This function returns a true value if its argument is a string of
hiragana, and an undefined value if not. The entire string from
beginning to end must be hiragana for this to return true. The string
cannot include punctuation marks or the long vowel symbol (chouonpu).

%%
%%desc.ja:

入力が平仮名のみの場合、真、入力が平仮名でない文字を含む場合、偽(undef)。

%%


name:  kana2katakana
%%eg:

%%
%%desc.en:

Convert any of katakana, halfwidth katakana, circled katakana and
hiragana to full width katakana.

%%

class: wide
%%explain.en:

Many websites in Japan require users to input numbers and letters
using "half width" characters. These functions convert between the
wide and narrow forms, so a program which uses them can accept either.

%%
%%explain.ja:

日本のホームページなら、「半角英数字」にこだわります。下記の関数をお使
いの場合、そんな必要性はありません。

%%

name:  InWideAscii
%%eg:


use utf8;
if ('Ａ' =~ /\p{InWideAscii}/) {
    print "Ａ is wide ascii\n";
}

%%
%%desc.en:

This is a character class for use with C<\p> which matches a "wide
ASCII" character (全角英数字).

%%
%%desc.ja:

正規表現に使う全角英数字にマッチする。

%%

name:  wide2ascii
out: $ascii
expect: abCE019
%%eg:


$ascii = wide2ascii ('ａｂＣＥ０１９');

%%
%%desc.en:

Convert the "wide ASCII" used in Japan (fullwidth ASCII, 全角英数字)
into usual ASCII symbols (半角英数字).
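
The fullwidth forms sit at a fixed offset from ASCII in Unicode, so
the conversion can be sketched as a range-to-range transliteration.
The helpers C<sketch_wide2ascii> and C<sketch_ascii2wide> are
hypothetical and are not the module's code. The sketch covers only
U+FF01 to U+FF5E and ignores the wide space.

```perl
use strict;
use warnings;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: fullwidth U+FF01..U+FF5E to ASCII "!".."~".
sub sketch_wide2ascii
{
    my ($text) = @_;
    $text =~ tr/\x{FF01}-\x{FF5E}/\x{21}-\x{7E}/;
    return $text;
}

# Hypothetical helper: the reverse mapping.
sub sketch_ascii2wide
{
    my ($text) = @_;
    $text =~ tr/\x{21}-\x{7E}/\x{FF01}-\x{FF5E}/;
    return $text;
}

print sketch_wide2ascii ('ａｂＣＥ０１９'), "\n";   # prints abCE019
print sketch_ascii2wide ('abCE019'), "\n";          # prints ａｂＣＥ０１９
```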

%%
%%desc.ja:

全角英数字を半角英数字(ASCII)に変換する。

%%

name:  ascii2wide
out: $wide
expect: ａｂＣＥ０１９
%%eg:

$wide = ascii2wide ('abCE019');

%%
%%desc.en:

Convert usual ASCII symbols (半角英数字) into the "wide ASCII" used in
Japan (fullwidth ASCII, 全角英数字).


%%
%%desc.ja:

半角英数字(ASCII)を全角英数字に変換する。

%%


class: other

name:  kana2morse
%%eg:

$morse = kana2morse ('しょっちゅう');

%%
out: $morse
expect: --.-. -- .--. ..-. -..-- ..-
%%desc.en:

Convert Japanese kana into Morse code. Note that Japanese Morse code
does not have any way of representing small kana characters, so
converting to and then from Morse code will result in しょっちゅう
becoming シヨツチユウ.
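
The form which survives a round trip can be predicted without any
Morse table. The sketch below uses a hypothetical helper,
C<sketch_morse_round_trip>, which is not part of the module. It
assumes hiragana input, enlarges the small kana, and then shifts the
result to katakana.

```perl
use strict;
use warnings;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: predict what hiragana looks like after being
# sent through Morse code and back.
sub sketch_morse_round_trip
{
    my ($kana) = @_;
    # Small kana have no Morse form, so replace them with large ones.
    $kana =~ tr/ぁぃぅぇぉっゃゅょゎ/あいうえおつやゆよわ/;
    # Shift hiragana onto the corresponding katakana.
    $kana =~ tr/\x{3041}-\x{3096}/\x{30A1}-\x{30F6}/;
    return $kana;
}

print sketch_morse_round_trip ('しょっちゅう'), "\n";   # prints シヨツチユウ
```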

%%

name:  morse2kana
%%eg:

$kana = morse2kana ('--.-. -- .--. ..-. -..-- ..-');

%%
out: $kana
expect: シヨツチユウ
%%desc.en:

Convert Japanese Morse code into kana. Each Morse code element must be separated by whitespace from the next one. 
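
The role of the whitespace can be seen in a small decoder sketch. The
helper C<sketch_morse2kana> and its table are hypothetical. The table
holds only the six codes which appear in the example for this
function, so it is far from a complete Japanese Morse table.

```perl
use strict;
use warnings;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

# Partial table, limited to the codes in the documented example.
my %sketch_morse = (
    '--.-.' => 'シ',
    '--'    => 'ヨ',
    '.--.'  => 'ツ',
    '..-.'  => 'チ',
    '-..--' => 'ユ',
    '..-'   => 'ウ',
);

# Hypothetical helper: split the input on whitespace and look up
# each element. Elements missing from the table are kept as they are.
sub sketch_morse2kana
{
    my ($morse) = @_;
    my $kana = '';
    for my $element (split ' ', $morse) {
        $kana .= $sketch_morse{$element} // $element;
    }
    return $kana;
}

print sketch_morse2kana ('--.-. -- .--. ..-. -..-- ..-'), "\n";   # prints シヨツチユウ
```

Without the separating whitespace the input is ambiguous, since "--"
is itself a complete code and is also the start of "--.-.".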

%%
%%bugs.en:

This has not been extensively tested.

%%


name:  kana2braille
%%eg:

%%
%%desc.en:

Converts kana into the equivalent Japanese braille (I<tenji>) forms.

%%
%%bugs.en:

This has not been extensively tested. This is not an adequate Japanese
braille convertor. Creating Japanese braille requires breaking
Japanese sentences up into individual words, but this does not attempt
to do that. People who are interested in building a Perl braille
convertor could start here.

%%
%%bugs.ja:

きちんとしたテストがありません。日本語を点字に変換するには分かち書きが必要ですが、この関数はそれをしません。

%%

name:  braille2kana
%%eg:


%%
%%desc.en:

Converts Japanese braille (I<tenji>) into the equivalent katakana.

%%

name:  kana2circled
out: $circled
expect: ㋐㋑㋒㋓㋔
%%eg:


$circled = kana2circled ('あいうえお');

%%
%%desc.en:

This function converts kana into the "circled katakana" of Unicode,
which have code points from 32D0 to 32FE. See also L</circled2kana>.

Note that there is no circled form of the ン kana, so this is left
untouched.
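
The circled forms follow the gojuon order of the plain katakana, which
makes a transliteration sketch possible. The helper
C<sketch_kata2circled> is hypothetical and is not the module's code.
It accepts fullwidth katakana only, and it leaves voiced and small
kana unchanged, which may not match what the module does with them.

```perl
use strict;
use warnings;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: map the 47 katakana which have circled forms
# onto U+32D0..U+32FE. ン is absent from the list, so it is kept.
sub sketch_kata2circled
{
    my ($text) = @_;
    $text =~ tr/アイウエオカキクケコサシスセソタチツテトナニヌネノハヒフヘホマミムメモヤユヨラリルレロワヰヱヲ/\x{32D0}-\x{32FE}/;
    return $text;
}

print sketch_kata2circled ('アイウエオン'), "\n";   # prints ㋐㋑㋒㋓㋔ン
```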

%%
%%desc.ja:

仮名を丸付けかたかなに変換します。丸付け「ン」がないので、ンはそのままとなります。
丸付け片仮名はユニコード32D0〜32FEにあります。

%%


name:  circled2kana
out: $kana
expect: アイウエオ
%%eg:

$kana = circled2kana ('㋐㋑㋒㋓㋔');

%%
%%desc.en:

This function converts the "circled katakana" of Unicode into
full-width katakana. See also L</kana2circled>.

%%

class: kanji

name: new2old_kanji
%%eg:

$old = new2old_kanji ('三国 連太郎');


%%
out: $old
expect: 三國 連太郎
%%desc.en:

Convert new-style (post-1949) kanji (Chinese characters) into old-style (pre-1949) kanji.

%%
%%desc.ja:

新字体を旧字体に変換する

%%
%%bugs.en:

The list of characters in this convertor may not contain every pair of
old/new kanji.

It will not correctly convert 弁 since this has three different
equivalents in the old system.

%%

name: old2new_kanji
%%eg:

$new = old2new_kanji ('櫻井');


%%
out: $new
expect: 桜井
%%desc.en:

Convert old-style (pre-1949) kanji (Chinese characters) into new-style
(post-1949) kanji.

%%
%%desc.ja:

旧字体を新字体に変換する

%%

class: cyrillization
%%explain.en:

This is an experimental cyrillization of kana based on the information
in a Wikipedia article,
L<http://en.wikipedia.org/wiki/Cyrillization_of_Japanese>. The module
author does not know anything about cyrillization of kana, so any
assistance in correcting this is very welcome.

%%


name: kana2cyrillic
%%eg:

$cyril = kana2cyrillic ('シンブン');

%%
out: $cyril
expect: симбун

name: cyrillic2katakana
%%eg:

$kana = cyrillic2katakana ('симбун');

%%
out: $kana
expect: シンブン


class: hangul

name: kana2hangul
%%eg:

$hangul = kana2hangul ('すごわざ');

%%
out: $hangul
expect: 스고와자
%%explain.en:

Convert kana into hangul (Korean letters). See also
L<Lingua::KO::Munja>.

%%
%%explain.ja:

かなをハングルに変換する。L<Lingua::KO::Munja>も参照。

%%
%%bugs.en:

=over

=item Doesn't deal with ん

=item May be incorrect

This is based on a list found on the internet at
L<http://kajiritate-no-hangul.com/kana.html>. There is currently no
proof of correctness.

=back

%%
