package Lingua::PT::Abbrev;

use warnings;
use strict;

=head1 NAME

Lingua::PT::Abbrev - An abbreviations dictionary manager for NLP

=head1 VERSION

Version 0.01

=cut

our $VERSION = '0.01';

=head1 SYNOPSIS

This module manages a built-in abbreviations dictionary and an
optional user-customized one. It provides functions for expanding
abbreviations in natural language processing (NLP) tasks.

   use Lingua::PT::Abbrev;

   my $dic = Lingua::PT::Abbrev->new;

=head1 FUNCTIONS

=head2 new

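This is the Lingua::PT::Abbrev dictionary constructor. It takes no
parameters unless you want to maintain a personal dictionary. In that
case, pass the path to your personal dictionary file.

The dictionary file is a text file with one abbreviation per line:
the abbreviation (without its trailing period), whitespace, and the
expansion. Entries are lowercased when loaded, and only the first word
after the abbreviation is kept as the expansion. For example: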

  sr senhor
  sra senhora
  dr doutor
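
A minimal sketch of loading a personal dictionary. The file name
F<my-abbrevs.txt> and its single entry are hypothetical and serve
only as an illustration:

  use Lingua::PT::Abbrev;

  # my-abbrevs.txt (hypothetical) contains the single line:
  #   eng engenheiro
  my $dic = Lingua::PT::Abbrev->new('my-abbrevs.txt');

  # Personal entries are loaded after the built-in ones, so a personal
  # entry replaces a built-in entry for the same abbreviation.
  print $dic->expand('eng'), "\n";    # engenheiro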

=cut

sub new {
  my $class = shift;
  my $custom = shift || undef;
  my $self = bless { custom => $custom }, $class;

  $self->_load_dictionary;
  $self->_load_dictionary($custom) if ($custom);

  return $self;
}

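# Built-in entries, read from DATA once and shared by every object
# (the DATA handle is exhausted after the first read).
my %builtin;

sub _load_dictionary {
  my $self = shift;
  my $file = shift;

  if ($file) {
    # Personal dictionary: its entries override built-in ones.
    open my $fh, '<', $file
      or die "Cannot open dictionary '$file': $!";
    while (my $line = <$fh>) {
      my ($abbrev, $expansion) = split ' ', lc $line;
      next unless defined $expansion;   # skip blank or malformed lines
      $self->{dic}{$abbrev} = $expansion;
    }
    close $fh;
  } else {
    unless (%builtin) {
      while (my $line = <DATA>) {
        my ($abbrev, $expansion) = split ' ', lc $line;
        next unless defined $expansion;
        $builtin{$abbrev} = $expansion;
      }
    }
    $self->{dic}{$_} = $builtin{$_} for keys %builtin;
  }
}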

=head2 expand

Given an abbreviation, this method returns its expansion. The lookup
is case-insensitive and ignores a trailing period. To expand
abbreviations throughout a text, use C<text_expand>, which is much
faster.

Returns undef if the abbreviation is not known.
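
For instance, with only the built-in dictionary loaded (a small
sketch; the values in the comments follow from the built-in entries
shipped with this module):

  my $dic = Lingua::PT::Abbrev->new;

  print $dic->expand('Dr.'), "\n";    # doutor
  print $dic->expand('sra'), "\n";    # senhora

  # Unknown abbreviations yield undef, so test before using the result.
  my $full = $dic->expand('xyz');
  print "unknown\n" unless defined $full;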

=cut

sub expand {
  my $self = shift;
  my $abbrev = lc(shift);
  $abbrev =~ s!\.$!!;
  if (exists($self->{dic}{$abbrev})) {
    return $self->{dic}{$abbrev}
  } else {
    return undef
  }
}

=head2 text_expand

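Given a text, this method expands all known abbreviations and returns
the resulting text. An abbreviation is recognized as a word
immediately followed by a period. The period is replaced together
with the abbreviation, and words that are not in the dictionary are
left untouched.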
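
A short sketch using only the built-in dictionary. Note that
expansions come out in lowercase, whatever the capitalization of the
abbreviation in the input:

  my $dic = Lingua::PT::Abbrev->new;

  print $dic->text_expand('O Sr. Silva falou com a Dra. Costa.'), "\n";
  # O senhor Silva falou com a doutora Costa.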

=cut

sub text_expand {
  my $self = shift;
  my $text = shift;

  $text =~ s{((\w+)\.)}{
    exists($self->{dic}{lc($2)})?$self->{dic}{lc($2)}:$1
  }eg;

  return $text;
}

=head1 AUTHOR

Alberto SimE<otilde>es, C<< <ambs@cpan.org> >>

=head1 BUGS

Please report any bugs or feature requests to
C<bug-lingua-pt-abbrev@rt.cpan.org>, or through the web interface at
L<http://rt.cpan.org>.  I will be notified, and then you'll automatically
be notified of progress on your bug as I make changes.

=head1 ACKNOWLEDGEMENTS

=head1 COPYRIGHT & LICENSE

Copyright 2004 Alberto SimE<otilde>es, All Rights Reserved.

This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.

=cut

1; # End of Lingua::PT::Abbrev

__DATA__
dr doutor
dra doutora
drs doutores
dras doutoras
etc etc.
prof professor
profa professora
profs professores
profas professoras
séc século
av avenida
sr senhor
sra senhora
