NAME
    Text::Match::FastAlternatives - efficient search for many strings

SYNOPSIS
        use Text::Match::FastAlternatives;

        my $expletives = Text::Match::FastAlternatives->new(@naughty);
        while (my $line = <>) {
            print "Do you email your mother with that keyboard?\n"
                if $expletives->match($line);
        }

DESCRIPTION
    This module allows you to search for any of a list of substrings
    ("keys") in a larger string. It is particularly efficient when the set
    of keys is large.

    This efficiency comes at the cost of some flexibility: the keys may not
    contain any control characters or non-ASCII characters; and it cannot do
    case-insensitive matching. If you want case-insensitivity, you have to
    fold case yourself:

        my $expletives = Text::Match::FastAlternatives->new(
            map { lc } @naughty);
        while (my $line = <>) {
            print "Do you email your mother with that keyboard?\n"
                if $expletives->match(lc $line);
        }

    This module is designed as a drop-in replacement for Perl code of the
    following form:

        my $expletives_regex = join '|', map { quotemeta } @naughty;
        $expletives_regex = qr/$expletives_regex/;
        while (my $line = <>) {
            print "Do you email your mother with that keyboard?\n"
                if $line =~ $expletives_regex;
        }

    Text::Match::FastAlternatives can easily perform this test a hundred
    times faster than the equivalent regex, if you have enough keys. The
    more keys it searches for, the faster it gets compared to the regex.

    Modules like Regexp::Trie can build an optimised version of such a
    regex, designed to take advantage of the niceties of perl's regex
    engine. With a large number of keys, this module will substantially
    outperform even an optimised regex like that. In one real-world
    situation with 339 keys, Regexp::Trie produced a regex that ran 857%
    faster than the naive regex (according to Benchmark), but using
    Text::Match::FastAlternatives ran 18275% faster than the naive regex, or
    twenty times faster than Regexp::Trie's optimised regex.

METHODS
    Text::Match::FastAlternatives->new(@keys)
        Constructs a matcher that can efficiently search for all of the
        @keys in parallel. Throws an exception if any of the keys are
        undefined, or if any of them contain any control characters or
        non-ASCII characters.

    $matcher->match($target)
        Returns a boolean value indicating whether the $target string
        contains any of the keys in $matcher.

    $matcher->match_at($target, $pos)
        Returns a boolean value indicating whether the $target string
        contains any of the keys in $matcher at position $pos. Returns false
        (without emitting any warning) if $pos is larger than the length of
        $string.

    $matcher->exact_match($target)
        Returns a boolean value indicating whether the $target string is
        exactly equal to any of the keys in $matcher.

CAVEATS
  Subclassing
    Text::Match::FastAlternatives has a "DESTROY" method implemented in XS.
    If you write a subclass with its own destructor, you will need to invoke
    the base destructor, or you will leak memory.

  Perl 5.10
    Perl 5.10 will contain many enhancements to the regex engine, including
    built-in optimisations for regexes with many branches that contain only
    literal strings. I suspect that, for the cases where
    Text::Match::FastAlternatives is currently very fast, it will also be
    faster than Perl 5.10's regex engine. But I may be wrong; if you're
    using Perl 5.9.4 or newer, you'd be well advised to compare the
    available options on data sets you're likely to use in practice.

IMPLEMENTATION
    Text::Match::FastAlternatives manages to be so fast by using a trie
    internally. The time to find a match at a given position in the string
    (or determine that there is no match) is independent of the number of
    keys being sought; worst-case match time is linear in the length of the
    longest key. Since a match must be attempted at each position in the
    target string, total worst-case search time is O(mn) where m is the
    length of the target string and n is the length of the longest key.

    The "match_at" and "exact_match" methods only need to find a match at
    one position, so they have worst-case running time of O(min(n, m)).

SEE ALSO
    <http://en.wikipedia.org/wiki/Trie>, Regexp::Trie, Regexp::Optimizer,
    Regexp::Assemble, perl594delta.

AUTHOR
    Aaron Crane <arc@cpan.org>

COPYRIGHT
    Copyright 2006, 2007 Aaron Crane.

    This library is free software; you can redistribute it and/or modify it
    under the terms of the Artistic License, or (at your option) under the
    terms of the GNU General Public License version 2.
