Wed Apr  4 22:21:24 EDT 2007

A bug existed in deduping that could have resulted in
lost mail.  Namely: delivery process P1 probes, doesn't find message,
goes on to other stuff, TEMPFAILs before actually delivering.  Delivery
process P2, started later by MTA to deliver the same message, finds the
message recorded by P1's probe and so drops it.  That looks like success
to the MTA, so there'll be no further attempts and the message is lost.

The only solution is to separate probing the database and updating it,
such that the updating is only done after a successful delivery.
That also means tighter integration of deduping with the core module;
the probe must be done just before delivery, to avoid either keeping
the database locked for too long (in effect forcing total serialization
of deliveries and missing the opportunity for parallel deliveries to
different mailboxes), or OTOH opening a window for a race where
another delivery process sneaks in and delivers the mail after we've
probed but before we've delivered.
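
The probe/update split described above can be sketched roughly as follows.  This is only an illustration, not the module's actual code: deliver_once, the lock file argument and the plain hash standing in for the database are all hypothetical, and it assumes the lock is held from the probe through the update.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical helper: probe the dedup database, deliver, and record
# the key only if delivery succeeded.  $seen is a hash reference that
# stands in for the tied database; $deliver is a code reference that
# returns true on successful delivery.
sub deliver_once {
    my ($seen, $lockfile, $key, $deliver) = @_;

    open my $lock, '>>', $lockfile or die "cannot open $lockfile: $!";
    flock($lock, LOCK_EX)          or die "cannot lock $lockfile: $!";

    my $status;
    if (exists $seen->{$key}) {
        $status = 'duplicate';      # already delivered, safe to drop
    }
    elsif ($deliver->()) {
        $seen->{$key} = time;       # update only after a successful delivery
        $status = 'delivered';
    }
    else {
        $status = 'tempfail';       # nothing recorded, so a retry is not dropped
    }

    close $lock;                    # closing the handle releases the lock
    return $status;
}
```

Since a failed delivery leaves no trace in the database, a later attempt started by the MTA probes, finds nothing, and delivers normally.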

Note that formail -D has exactly the same problem.  I guess that's
an indication that nobody ever uses it.  If you do, please report
this to the procmail/formail maintainers, and consider switching to
Mail::Sort <grin>

Exponential back-off added to locking delays.
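
For readers unfamiliar with the technique, here is a minimal sketch of exponential back-off around a non-blocking flock.  The function name, the option names and the default values are assumptions made for illustration; they are not taken from Mail::Sort.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use Time::HiRes qw(sleep);

# Try to take an exclusive lock without blocking, doubling the delay
# after each failed attempt.  Returns 1 on success, 0 after giving up.
# initial_delay (seconds) and max_tries are hypothetical options.
sub lock_with_backoff {
    my ($fh, %opt) = @_;
    my $delay = $opt{initial_delay} // 0.1;
    my $tries = $opt{max_tries}     // 6;

    for my $attempt (1 .. $tries) {
        return 1 if flock($fh, LOCK_EX | LOCK_NB);
        last if $attempt == $tries;   # no point sleeping after the last try
        sleep $delay;
        $delay *= 2;                  # 0.1, 0.2, 0.4, ... seconds
    }
    return 0;
}
```

Doubling the delay keeps a briefly contended lock cheap to acquire while preventing many waiting delivery processes from hammering the lock file.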

Tue Apr  3 10:56:58 EDT 2007

DB_File seems to be broken on *BSD systems; it's probably tied
to the fact that libdb is part of libc rather than a stand-alone
library, and probably frozen at an ancient version.  As a workaround,
try to use GDBM instead on these systems.  Keep using DB on Linux
and Solaris; I refuse to throw away my data because of a bug.
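
A sketch of how such a per-platform choice might look.  The helper name and the pattern used to recognize the BSDs are assumptions based on the usual values of Perl's $^O, and the modules themselves are deliberately not loaded here.

```perl
use strict;
use warnings;

# Hypothetical helper: pick a DBM module name from the OS name.
# BSD-family systems get GDBM_File; everything else keeps DB_File.
sub dbm_class_for {
    my ($os) = @_;
    return $os =~ /bsd|dragonfly/i ? 'GDBM_File' : 'DB_File';
}

my $class = dbm_class_for($^O);
print "would use $class on $^O\n";
```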

Tue Mar 20 12:35:47 EDT 2007

The implementation of duplicate elimination (a la formail -D) has
changed radically.  Formerly, I read the Message-Id header just like
formail does, and used it as an index into a DB_File.  Unfortunately,
there are many mail programs and systems (including unixoids) that
create b0rk3d Message-Id headers, so I can't depend on it alone.

Instead I string together From, Date and Message-Id if present and in
that order.  One would like to use other bits of the message to further
enhance uniqueness and guard against collisions, for instance the
Subject and Lines headers (if present).  This fails because mailing
lists modify messages in ways that change these bits.
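
The key construction can be sketched like this.  It is an illustration only: dedup_key is a hypothetical name, the NUL separator is my assumption, and the input is assumed to be one header field per list element.

```perl
use strict;
use warnings;

# Build a dedup key from the From, Date and Message-Id header fields,
# in that order, skipping any that are absent.  Only the first
# occurrence of each field is used.
sub dedup_key {
    my @header = @_;
    my %first;
    for my $line (@header) {
        last if $line =~ /^$/;    # a blank line ends the header
        if ($line =~ /^(From|Date|Message-Id):\s*(.*?)\s*$/i) {
            $first{lc $1} //= $2;
        }
    }
    # "\0" is an assumed separator; it cannot occur in a header field.
    return join "\0", grep { defined } @first{qw(from date message-id)};
}
```

Because the fields are joined in a fixed order, two copies of a message yield the same key even if a relay has reordered the header.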

<WARNING><WARNING><WARNING>

This release is slightly incompatible with past ones.
Although the API remains the same, the semantics of header matches
have subtly changed.  They are now performed on the _original_ header
of the incoming message, not on a sanitized (unfolded) copy of it.
This is because it was very hard to come up with a clean way of
allowing modifications to the original header (which was always
the one to use for delivery) while keeping a separate copy for
matching.

Here's an example where the result is different.  Let's say the email
header contains the following lines:

Content-Type: text/plain;
 charset="iso-8859-1"

and assume it got slurped into $sort.  With past versions,

$sort->header_match('content-type', '; charset')

returned true; now it returns false.  This, however, still returns true

$sort->header_match('content-type', ';\s+charset')

and so does this

$sort->header_match('content-type', ';\n charset')
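
The difference comes down to how each pattern treats the newline and leading space of the folded line.  The following stand-alone sketch applies the three patterns to the raw text with plain Perl regexps.  It does not use Mail::Sort at all, and raw_match is a hypothetical helper, so it only mimics what matching against the original header implies.

```perl
use strict;
use warnings;

# The folded header field exactly as it appears in the message.
my $field = "Content-Type: text/plain;\n charset=\"iso-8859-1\"\n";

# Hypothetical helper: true if the pattern matches the raw text.
sub raw_match {
    my ($text, $pattern) = @_;
    return $text =~ /$pattern/ ? 1 : 0;
}

for my $pattern ('; charset', ';\s+charset', ';\n charset') {
    printf "%-14s %s\n", $pattern,
        raw_match($field, $pattern) ? 'matches' : 'does not match';
}
```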

Also, header_match and all methods based on it now return a list of
header _indices_ rather than the actual header text.  Most of the time
the returned list will have exactly one element, but it may have more
for repeated headers like Received.  The actual text of the matched header
lines can be recovered with the new method get_header().  For instance,

my @msg = (
"Received: from foo\n",
"To: myself\n",
"Received: from bar\n",
"\n",
"Boo!\n"
);

my $sort = Mail::Sort->new (@msg);
my @indices = $sort->header_match ('Received'); # now @indices is (0, 2)
my @lines = $sort->get_header (@indices); 
# now @lines is ("Received: from foo\n", "Received: from bar\n")

</WARNING></WARNING></WARNING>


Mail::Sort is yet another module intended to enable the writing of
mail filters in the style of procmail(1). It was written when I
realized that the previous entries (Mail::Audit(3) and
Mail::Procmail(3)), while trying to emphasize elegance and brevity,
sacrificed a good deal of procmail's flexibility and power.

First and foremost, both of these existing modules only allow for
matching a single header at a time, throwing away procmail's TO_
feature (or any equivalent way of using a non-trivial regexp to match
a header tag). This really hurts when you try to match mailing lists
and spam list headers. Second, while I sympathize with the choice of
procedural interface with global variables in Mail::Procmail (way too
much Perl code out there uses object-oriented syntax for no good
reason whatsoever), in the present case it means that the message
cannot be modified and then fed into another instance of the filter,
again a limitation in comparison to procmail.

Mail::Sort is meant to end the history of this particular area by
matching procmail's features 1-1.

For distribution terms, read the file COPYING.
