grepmail - search mailboxes for a particular email

Grepmail searches a normal, gzip'd, bzip'd, or tzip'd mailbox for a given
regular expression, and returns those emails that match it. Piped input is
allowed, and date and size restrictions are supported.

New in version 4.72:

- 20% speed improvement in the Perl mailbox parser. (By Terry Jones)
- Fixed a number of potential bugs in command line processing and date
  processing. (By Terry Jones)
- Cleaned up return values and use of quotes in the code. (By Terry Jones)
- Fixed a bug in -X signature processing. (By Terry Jones)
- Modified anonymize_mailbox to anonymize To: and Subject: in the header.
  (Thanks to Terry Jones for the idea.)
- Fixed a bug in FastReader where emails less than 255 characters in size
  would occasionally cause a core dump. (Thanks to Terry Jones
  for submitting a bug report and sample mailbox.)
- Made "big" test mailboxes 4 times bigger for more meaningful speed tests.

This release should really be called the "Terry Jones Release". Not only
did he fix a number of bugs but, to my surprise, he also made a one-line
change that resulted in a 20% speed improvement. As a result, the Perl
mailbox parser is now only about 5% slower than the C parser. (Who said
interpreted Perl can't be fast?)

NOTE: For emails without message ids, grepmail will use Digest::MD5 to
compute a hash based on the email header. If you don't have
Digest::MD5, grepmail will just use the header itself as the message
id. The Digest::MD5 checksum takes a little while to compute, but
saves a lot of space. Currently there is no easy way to make the opposite
trade and use more space to save time. Let me know if this is a problem.


MODULE DEPENDENCIES

- Date::Parse: required if you want to search based on date (-d)
- Date::Manip: required if you want to search using complex date
  specifications (-d)
- Digest::MD5: not required, but can help grepmail use less memory if
  you are checking for unique emails (-u) and your emails don't have a
  Message-Id header
- Inline >0.41: required if you want to use the mailbox parser written
  in C (approximately 5% faster than the default Perl parser).

The modules can be found here:

Date::Parse (in TimeDate): http://search.cpan.org/search?dist=TimeDate
Date::Manip:               http://search.cpan.org/search?dist=DateManip
Digest::MD5:               http://search.cpan.org/search?dist=Digest-MD5
Inline:                    http://search.cpan.org/search?dist=Inline

Alternatively, installation can be done automatically using the CPAN module:

  perl -MCPAN -e 'install Date::Parse'
  perl -MCPAN -e 'install Date::Manip'
  perl -MCPAN -e 'install Digest::MD5'
  perl -MCPAN -e 'install Inline'
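
Before installing anything, it may help to see which of the modules listed
above are already present. The following is a minimal sketch for a
Bourne-style shell; check_module is a hypothetical helper name, not something
shipped with grepmail, and the check only confirms that Perl can load each
module, not that its version is recent enough.

```shell
# check_module is a hypothetical helper, not part of grepmail.
# It reports whether Perl can load the named module.
check_module() {
  if perl -e "require $1" 2>/dev/null; then
    echo "$1: installed"
  else
    echo "$1: missing"
  fi
}

# The four optional modules named in MODULE DEPENDENCIES.
for module in Date::Parse Date::Manip Digest::MD5 Inline; do
  check_module "$module"
done
```

Any module reported as missing can then be installed with the matching
"perl -MCPAN" command above.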


INSTALLATION

=> On Non-Windows systems:

  % perl Makefile.PL
  % make
  % make test
  % make install

By default, "perl Makefile.PL" does an interactive installation. You can avoid
the question about installing Mail::Folder::FastReader by specifying either
"FASTREADER=0" or "FASTREADER=1". You can avoid the question about the
installation location by specifying either "PREFIX=/installation/path" (for
installation into a custom location), "INSTALLDIRS=site" (for installation
into site-specific Perl directories), or "INSTALLDIRS=perl" (for installation
into standard Perl directories).
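
As an illustration, a non-interactive build might combine these options as
sketched below. The variable name CONFIG_ARGS is made up for this example,
and it is assumed here that "FASTREADER=0" means "do not install
Mail::Folder::FastReader"; choose whichever values suit your system.

```shell
# CONFIG_ARGS is a hypothetical variable name used only in this sketch.
# Assumption: FASTREADER=0 declines Mail::Folder::FastReader;
# INSTALLDIRS=site installs into the site-specific Perl directories.
CONFIG_ARGS="FASTREADER=0 INSTALLDIRS=site"

# Only run when invoked from the top of the unpacked grepmail source tree.
if [ -f Makefile.PL ]; then
  # $CONFIG_ARGS is deliberately unquoted so that it splits into two arguments.
  perl Makefile.PL $CONFIG_ARGS && make && make test && make install
fi
```

Chaining the steps with && stops the build at the first failure, so a failing
"make test" prevents "make install" from running.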

If make test fails, please see the INSTALLATION PROBLEMS section below.

=> On Windows systems:

- Just copy "grepmail" to a place in your path. You may want to rename it
  "grepmail.pl" if you've associated .pl files with perl.exe.


CONFIGURATION

You may want to set your MAIL environment variable so that grepmail will know
the default location to search for mailboxes.
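
For example, in a Bourne-style shell the variable could be set in a startup
file such as ~/.profile. This is only a sketch: the directory $HOME/Mail is an
assumed location used for illustration, not a grepmail default.

```shell
# $HOME/Mail is a hypothetical mailbox directory; substitute your own.
MAIL="$HOME/Mail"
export MAIL
```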

If you are terribly concerned about performance, you may want to modify the
value of the variable READ_CHUNK_SIZE located in the code. This variable
controls how much text is read from the mailbox at a time. If the value is set
to 0, the entire file is read into memory. (There is no user-visible option
for setting this value.) You may also want to hack the code to not use
Digest::MD5, thereby trading space for time.

If you frequently use the same set of flags, you may wish to alias "grepmail"
to "grepmail -flags" within your command interpreter (shell). See the
documentation for your shell for details on how to do this.
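
As a sketch for Bourne-style shells that support aliases (bash, ksh, zsh), a
line like the following could be added to a shell startup file. The -u flag
(report unique emails only) is chosen purely as an example; use whichever
flags you actually repeat.

```shell
# Example only: always run grepmail with -u (unique emails).
alias grepmail='grepmail -u'
```

Csh-style shells use a different alias syntax, so check your shell's
documentation before copying this line.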


INSTALLATION PROBLEMS

If "make test" fails, run

  make testfunc

and see which test(s) are failing. For each failing test, please email the
test##.stderr and test##.stdout files, which are located in t/results, to the
address below. Also email the output of running the test with the -D flag,
e.g.:

  blib/script/grepmail library -D -d "before July 9 1998" t/mailarc-1.txt \
    > test##.debug

If you see errors about your timezone, and you are in an uncommon timezone,
Date::Manip may not support your timezone yet. Try this:

  perl -MDate::Manip -e 'print "TIMEZONE: ".&Date::Manip::Date_TimeZone."\n"'

If you get an error, contact the author of Date::Manip.

For other bugs, see the section REPORTING BUGS below.


DOCUMENTATION

Run "perldoc grepmail". After installation on Unix systems, you can also run
"man grepmail".


HOMEPAGE

Visit http://grepmail.sourceforge.net/ for the latest version, mailing lists,
discussion forums, CVS access, cool utilities, and more.


REPORTING BUGS

You can report bugs at http://sourceforge.net/bugs/?group_id=2207. Please
attach the output of running grepmail with the -D switch. If the bug is
related to processing of a particular mailbox, try to trim the mailbox to the
smallest set of emails that still exhibits the problem. Then use the
"anonymize_mailbox" program that comes with grepmail to remove any sensitive
information, and attach the mailbox to the bug report.


PRIMARY AUTHOR

Written by David Coppit (david@coppit.org, http://coppit.org/), with the
generous help of many kind people. See the file CHANGES for detailed
information.


LICENSE

This code is distributed under the GNU General Public License (GPL). See
http://www.opensource.org/gpl-license.html and http://www.opensource.org/.
