Tue Apr  3 10:56:58 EDT 2007

DB_File seems to be broken on *BSD systems; it's probably tied
to the fact that libdb is part of libc rather than a stand-alone
library, and probably frozen at an ancient version.  As a workaround,
try to use GDBM instead on these systems.  Keep using DB on Linux
and Solaris; I refuse to throw away my data because of a bug.

Tue Mar 20 12:35:47 EDT 2007

The implementation of duplicate elimination (a la formail -D) has
changed radically.  Formerly, I read the Message-Id header just like
formail does, and used it as an index into a DB_File.  Unfortunately,
there are many mail programs and systems (including unixoids) that
create b0rk3d Message-Id headers, so I can't depend on it alone.

Instead I string together From, Date and Message-Id if present and in
that order.  One would like to use other bits of the message to further
enhance uniqueness and guard against collisions, for instance the
Subject and Lines headers (if present).  This fails because mailing
lists modify messages in ways that change these bits.

<WARNING><WARNING><WARNING>

This release is slightly incompatible with past ones.
Although the API remains the same, the semantics of header matches
has subtly changed.  They are now performed on the _original_ header
of the incoming message, not on a sanitized (unfolded) copy of it.
This is because it was very hard to come up with a clean way of
allowing modifications to the original header (which was always
the one to use for delivery) while keeping a separate copy for
matching.

Here's an example where the result is different.  Let's say the email
header contain the following lines:

Content-Type: text/plain;
 charset="iso-8859-1"

and assume it got slurped into $sort.  With past versions,

$sort->header_match('content-type', '; charset')

returned true; now it returns false.  This, however, still returns true

$sort->header_match('content-type', ';\s+charset')

and so does this

$sort->header_match('content-type', ';\n charset')

Also, header_match and all methods based on it now return a list of
header _indices_ rather than the actual header text.  Most of the time
the returned list list will have exactly one element, but may have more
for repeated headers like Received.  The actual text of the matched header
lines can be recovered with the new method get_header().  For instance,

my @msg = (
"Received: from foo\n",
"To: myself\n",
"Received: from bar\n",
"\n",
"Boo!\n"
);

my $sort = Mail::Sort->new (@msg);
my @indices = $sort->header_match ('Received'); # now @indices is (0, 2)
my @lines = $sort->get_header (@indices); 
# now @lines is ("Received: from foo\n", "Received: from bar\n")

</WARNING></WARNING></WARNING>


Mail::Sort is yet another module intended to enable the writing of
mail filters in the style of procmail(1). It was written when I
realized that the previous entries (Mail::Audit(3) and
Mail::Procmail(3)), while trying to emphasize elegance and brevity,
sacrificed a good deal of procmail's flexibility and power.

First and foremost, both of these existing modules only allow for
matching a single header at a time, throwing away procmail's TO_
feature (or any equivalent way of using a non-trivial regexp to match
a header tag). This really hurts when you try to match mailing lists
and spam list headers. Second, while I sympathize with the choice of
procedural interface with global variables in Mail::Procmail (way too
much of Perl code out there uses object-oriented syntax for no good
reason whatsoever), in the present case it means that the message
cannot be modified and then fed into another instance of the filter,
again a limitation in comparison to procmail.

Mail::Sort is meant to end the history of this particular area by
matching procmail's features 1-1.

For distribution terms, read the file COPYING.
