NAME
    Text::SpeedyFx - tokenize/hash large numbers of strings efficiently

VERSION
    version 0.003

SYNOPSIS
        use Data::Dumper;
        use Text::SpeedyFx;

        my $sfx = Text::SpeedyFx->new;

        my $words_bag = $sfx->hash('To be or not to be?');
        print Dumper $words_bag;
        #$VAR1 = {
        #          '1422534433' => '1',
        #          '4120516737' => '2',
        #          '1439817409' => '2',
        #          '3087870273' => '1'
        #        };

        my $feature_vector = $sfx->hash_fv("thats the question", 5);
        print Dumper $feature_vector;
        #$VAR1 = [
        #          '0',
        #          '1',
        #          '0',
        #          '1',
        #          '0'
        #        ];

DESCRIPTION
    XS implementation of a very fast combined parser/hasher which works well
    on a variety of *bag-of-words* problems.

    The original implementation
    <http://www.hpl.hp.com/techreports/2008/HPL-2008-91R1.pdf> is in Java
    and was adapted for better Unicode compliance.

METHODS
  new([$seed])
    Initializes the parser/hasher, optionally using the specified $seed
    (default: 1).

  hash($string)
    Parses $string and returns a hash reference whose keys are hashed tokens
    and whose values are the respective token counts.
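
    The shape of that structure can be illustrated in pure Perl. This is
    only a sketch, not the module's XS code: toy_hash() is a hypothetical
    stand-in (a djb2-style hash reduced to 32 bits) and the tokenizer is a
    plain lowercase-and-split, so the keys differ from those in SYNOPSIS.

```perl
use strict;
use warnings;

# Hypothetical stand-in hash (djb2-style, reduced modulo 2**32).
# It is NOT the hash function used by Text::SpeedyFx.
sub toy_hash {
    my ($token) = @_;
    my $h = 5381;
    $h = ($h * 33 + $_) % 4294967296 for unpack 'C*', $token;
    return $h;
}

# Assumed tokenizer: lowercase, then split on runs of non-word characters.
sub bag_of_hashes {
    my ($string) = @_;
    my %bag;
    $bag{ toy_hash($_) }++ for grep { length } split /\W+/, lc $string;
    return \%bag;
}

my $bag = bag_of_hashes('To be or not to be?');

# One line per distinct token: hashed key, then its count.
printf "%10u => %d\n", $_, $bag->{$_} for sort { $a <=> $b } keys %$bag;
```

    As in SYNOPSIS, the six words collapse into four keys, because "to"
    and "be" each occur twice once case is ignored.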

  hash_fv($string, $n)
    Parses $string and returns a feature vector with $n elements (an array
    reference, as shown in SYNOPSIS).
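
    A fixed-size feature vector of this kind can be sketched in pure Perl.
    The sketch rests on assumptions that this document does not confirm:
    each token marks the slot at index "hash modulo $n", and toy_hash() is
    a hypothetical stand-in for the module's real hash function.

```perl
use strict;
use warnings;

# Hypothetical stand-in hash (djb2-style, reduced modulo 2**32).
# It is NOT the hash function used by Text::SpeedyFx.
sub toy_hash {
    my ($token) = @_;
    my $h = 5381;
    $h = ($h * 33 + $_) % 4294967296 for unpack 'C*', $token;
    return $h;
}

# Assumption: a token sets the slot whose index is its hash modulo $n.
sub feature_vector {
    my ($string, $n) = @_;
    my @fv = (0) x $n;
    $fv[ toy_hash($_) % $n ] = 1 for grep { length } split /\W+/, lc $string;
    return \@fv;
}

my $fv = feature_vector('thats the question', 5);
print join(' ', @$fv), "\n";
```

    In this sketch distinct tokens may land in the same slot, so the
    vector can have fewer set slots than the input has tokens.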

  hash_min($string)
    Parses $string and returns the lowest hash value among its tokens.
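
    Taking the minimum over token hashes can be sketched as follows. As
    before, toy_hash() and the tokenizer are hypothetical stand-ins rather
    than the module's implementation, so the actual values will differ.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical stand-in hash (djb2-style, reduced modulo 2**32).
# It is NOT the hash function used by Text::SpeedyFx.
sub toy_hash {
    my ($token) = @_;
    my $h = 5381;
    $h = ($h * 33 + $_) % 4294967296 for unpack 'C*', $token;
    return $h;
}

# Smallest hash over all tokens; undef when the string has no tokens.
sub min_hash {
    my ($string) = @_;
    return min map { toy_hash($_) } grep { length } split /\W+/, lc $string;
}

my $first  = min_hash('To be or not to be?');
my $second = min_hash('not to be, or to be');

# Both strings contain the same set of tokens, so the minima agree.
print $first == $second ? "same minimum\n" : "different minimum\n";
```

    The result depends only on the set of distinct tokens, not on their
    order or on how often each one occurs.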

REFERENCES
    *   Extremely Fast Text Feature Extraction for Classification and
        Indexing <http://www.hpl.hp.com/techreports/2008/HPL-2008-91R1.pdf>
        by George Forman <http://www.hpl.hp.com/personal/George_Forman/> and
        Evan Kirshenbaum <http://www.kirshenbaum.net/evan/index.htm>

AUTHOR
    Stanislaw Pusep <stas@sysd.org>

COPYRIGHT AND LICENSE
    This software is copyright (c) 2012 by Stanislaw Pusep.

    This is free software; you can redistribute it and/or modify it under
    the same terms as the Perl 5 programming language system itself.

