}
}
}
}
was a mail file around 6MB.  Note that one test used a CHUNK value of
wanted, and I didn't even know I had made them.
vie.
use vars qw(@ISA @EXPORT_OK @EXPORT $VERSION *sort1 *sort2 %fh);
use strict;
use IO::File;
use File::Basename;
use Exporter;
use Carp;
try decreasing it.  If you get a warning in C<_writeTemp>, from the call
to C<_getTemp>, you've probably hit your limit.
time, like sort(1); this might change).  The default for Y is 20,000; increase
that comes with most Unix boxes.  This was developed primarily because
than it did at first.
suite; fixed warning for redefined subs (sort1 and sort2).
sub sort_file {
sub sortFile {
sub _writeTemp {
sub _mergeFiles {
sub _getTemp {IO::File->new_tmpfile}
standard Windows port of perl5.004_02 (50).  Adjusted the default
sorts, and arbitrary sorts.
sort(1).
some perls (specifically, MacPerl) do not have access to potentially
software; you can redistribute it and/or modify it under the same terms as
same as when MacPerl has 100MB allocated, showing that the module is
perl5.004_02 for Win32 has a limit of 50 open files, so 40 is safe.  To
performance with large files on Unix than you will on Mac OS.  C'est la
package File::Sort;
of VM, while Mac OS systems cannot, unless you bump up the memory
no strict 'refs';
no longer supported.  Hopefully made the whole thing more robust and
newline).
name.
much RAM, it might need to be lowered.
many parts out into separate functions.
low-memory systems, or where (e.g.) the MacPerl binary is not allocated
lines to deal with at a time (as opposed to how much memory to deal with at a
line will be sorted on the nth FIELD (default FIELD is 0).  If sorting by
infinite amounts of memory (thus they cannot necessarily slurp in a text
in the fact that it exists.  But it seems much less subject to change now
improve performance increase the number, and if you are getting failures,
if it is available.  Still.
http://pudge.net/
for total number of temp files from 50 to 40 (leave room for other open
for better performance, decrease if you run out of memory.
files), changed docs.  (Mike Blazer, Gurusamy Sarathy)
file of several megabytes), nor does everyone have access to sort(1).
field, it is best if the last field in the line, if used for sorting, has
faster, while supporting more options for sorting, including delimited
equivalent:
doing its job properly.  :)
different default sort order than C<sort_file> does.
already sorted.  UNIQUE_ONLY, if true, only outputs unique lines, removing
allocation as done below.  So inevitably you will get much better
all others.
a native newline character will be added to it.
__END__
WARNING: This is probably going to be MUCH SLOWER than using sort(1)
WARNING Part Deux: This module is subject to change in every way, including
Vicki Brown E<lt>vlb@cfcl.comE<gt>,
Version 0.18 (31 January 1999)
Tom Phoenix E<lt>rootbeer@teleport.comE<gt>,
This will sort FILEIN to FILEOUT.  The FILEOUT can be the same as the
This time, FILEIN can be a filename or a reference to an array of filenames.
There are two primary syntaxes:
That all having been noted, there are plans to have this module use sort(1)
Tests 3 and 4 failed because we hit the open file limit in the
Some cleanup; made it not subject to system file limitations; separated 
SORT_THING will still need R and N for reverse and numeric sorts.
SORT_THING is so you can pass in any arbitrary sort thing you want, where
Rich Morin E<lt>rdm@cfcl.comE<gt>,
Perl itself.
One year between releases was too long.  I made changes Miko O'Sullivan
Note that tests 2 and 3 cannot be performed on the given dataset when
Note that if FILEIN does not have a linebreak terminating the last line,
None!  :)  I plan on making CHUNK and FILE_LIMIT more intelligent somehow.
NOTE: `sort` calls the MPW sort tool here, which has a slightly
More cleanup; fixed special case of no linebreak on last line; wrote test 
Miko O'Sullivan E<lt>miko@idocs.comE<gt>,
Mike Blazer E<lt>blazer@mail.nevalink.ruE<gt>,
Matthias Neeracher E<lt>neeri@iis.ee.ethz.chE<gt>,
Made CHUNK default a lot larger, which improves performance.  On
MacPerl has a small amount of memory allocated (like 8MB).  But when
MacPerl has 8MB allocated, the results for tests 1 and 4 are about the
If given a DELIMITER (which will be passed through C<quotemeta>), then each
If MERGE_ONLY is true, then C<File::Sort> will assume the files on input are
I did make the default for CHUNK larger, though.
Here are some benchmarks that might be of interest (PowerBook G3/292 with
Gurusamy Sarathy E<lt>gsar@activestate.comE<gt>.
Gene Hsu E<lt>gene@moreinfo.comE<gt>,
Fixed up docs and did some more tests and benchmarks.
Fixed bug in C<_mergeFiles> that tried to C<open> a passed
First release.
File::Sort - Sort a file or merge sort multiple files
FILE_LIMIT is the system's limit on how many files can be opened at once.
FILEIN, but it is required.  VERBOSE is off by default.  CHUNK is how many
Exports C<sort_file> on request.  C<sortFile> is no longer the function
DELIMITER at the end of the field (i.e., the field ends in DELIMITER, not
Copyright (c) 1998 Chris Nandor.  All rights reserved.  This program is free
Chris Nandor E<lt>pudge@pobox.comE<gt>
C<IO::File> object.
Brian L. Matthews E<lt>blm@halcyon.comE<gt>,
Andrew M. Langmead E<lt>aml@world.std.comE<gt>,
Also, I will have the module use sort(1) if it is available.
Also now use C<IO::File> to create temp files, so the TMPDIR option is
Added unique and merge-only options.
Added reverse and numeric sorting options.
A default value of 40 is given in the module.  The standard port of
@ISA = qw(Exporter);
@EXPORT_OK = qw(sort_file sortFile);
@EXPORT = ();
=over 4
=item v0.18 (31 January 1999)
=item v0.17 (30 December 1998)
=item v0.16 (24 December 1998)
=item v0.11 (04 January 1998)
=item v0.10 (03 January 1998)
=item v0.03 (23 December 1997)
=item v0.02 (19 December 1997)
=item v0.01 (18 December 1997)
=head1 VERSION
=head1 THANKS
=head1 SYNOPSIS
=head1 SEE ALSO
=head1 NAME
=head1 HISTORY
=head1 EXPORT
=head1 DESCRIPTION
=head1 BUGS
=head1 AUTHOR
=cut
=back
200,000 lines; Unix systems can get away with something like that because
160MB RAM, VM on, and 100MB allocated to the MacPerl app).  The file
$VERSION = '0.18';
$SORT is the token representing your $a and $b.  For instance, these are
  });
  });
  })
  use File::Sort qw(sort_file);
  use File::Sort qw(sort_file);
  use Benchmark;
  timethese(10,{
  sort_file({I=>'b', O=>'b.out'});
  sort_file({I=>'b', O=>'b.out', S=>'(split(/\\|/, $SORT))[1]'});
  sort_file({I=>'b', O=>'b.out', S=>'$SORT'});
  sort_file({I=>'b', O=>'b.out', D=>'|', F=>1});
  sort_file({
  sort_file({
  sort_file(FILEIN, FILEOUT [, VERBOSE, CHUNK]);
  sort_file('file1','file1_new',1,1000);
  Benchmark: timing 10 iterations of 1, 2, 3, 4...
  #!perl -w
  # {(split(/\|/, $a))[1] cmp (split(/\|/, $b))[1]}
  # {$a cmp $b}
    Y=>CHUNK, TF=>FILE_LIMIT, 
    V=>1, Y=>1000, TF=>50, M=>1, U=>1, R=>1, N=>1,
    S=>SORT_THING,
    R=>REVERSE, N=>NUMERIC,
    O=>'filex_new',
    M=>MERGE_ONLY, U=>UNIQUE_ONLY, 
    I=>[qw(file1_new file2_new)],
    I=>FILEIN, O=>FILEOUT, V=>VERBOSE, 
    D=>DELIMITER, F=>FIELD,
    4=>q+sort_file({I=>$ARGV[0],O=>"$ARGV[0].3"})+,
    3=>q+sort_file({I=>$ARGV[0],O=>"$ARGV[0].2",Y=>200000})+,
    2=>q+open(F,$ARGV[0]);open(F1,">$ARGV[0].4");@f=<F>;print F1 sort @f+,
    1=>q+`sort -o $ARGV[0].1 $ARGV[0]`+,
         4: 274 secs (274.58 usr  0.00 sys = 274.58 cpu)
         3: 195 secs (195.77 usr  0.00 sys = 195.77 cpu)
         2: 152 secs (152.43 usr  0.00 sys = 152.43 cpu)
         1: 185 secs (185.65 usr  0.00 sys = 185.65 cpu)

	} keys %oth;
	} elsif (!ref($_[0])) {
	} else {
	} else {
	}
	}
	}
	}
	}
	{
	while (keys %fh) {
	unless (ref($file)) {
	seek($temp,0,0);
	seek($file,0,0);
	return $temp;
	return $file;
	print $temp sort sort1 @{$lines};
	print "\nDone!\n\n" if $$opts{V};
	print "\nCreating sorted $file ...\n" if $$opts{V};
	print "  $temp\n" if $$opts{V};
	my($uniq, $first, $line, $o, %oth);
	my($opts, $fh, $file) = @_;
	my($basename, $count2, $lines, $opts) = @_;
	my(
	my $temp = _getTemp() or warn $!;
	local $\;
	if (!$_[0] || (!ref($_[0]) && !$_[1])) {
	if (!$$opts{M}) {
	die "Change sortFile to sort_file, please.  Thanks and sorry.  :)\n";
	close(_mergeFiles($opts, \@fh, $$opts{O}));
	);
	($a, $b, $count1, $count2) = (1, 1, 0, 0);
	%oth = map {($o++ => $_)} @$fh;
	%fh  = map {
	$$opts{Y}	||= 20000;
	$$opts{TF}	||= 40;
	$$opts{R}	= $$opts{R} ? 1 : 0;
	$$opts{N}	= $$opts{N} ? 1 : 0;
	$$lines[-1] .= "\n" if ($$lines[-1] !~ m|\n$|);
		} elsif ($$opts{S}) {
		} else {
		}
		}
		}
		}
		open($file, "+> $file\0") || croak("Can't open $file: $!");
		my($cmp, $aa, $bb, $fa, $fb) = ('cmp', '$a', '$b', '$fh{$a}', '$fh{$b}');
		my $fh = $oth{$first};
		my $fh = $oth{$_};
		local($^W) = 0;
		if ($$opts{U} && $uniq && $uniq ne $fh{$first}) {
		if ($$opts{D}) {
		foreach $filein (@{$$opts{I}}) {
		foreach $filein (@{$$opts{I}}) {
		defined($line=<$fh>) ? $fh{$first} = $line : delete $fh{$first};
		croak $@ if $@;
		croak $@ if $@;
		croak 'Usage: sort_file({I=>FILEIN, O=>FILEOUT, %otheroptions})'
		croak 'Usage: sort_file($filein, $fileout [, $verbose, $chunk])';
		*sort2 = eval("sub {$fa $cmp $fb}");
		*sort1 = eval("sub {$aa $cmp $bb}");
		($first) = (sort sort2 keys %fh);
		($filein, $$opts{O}, $$opts{V}, $$opts{Y}) = @_;
		($bb, $aa, $fb, $fa) = ($aa, $bb, $fa, $fb)	if ($$opts{R} == 1);
		($_ => scalar <$fh>);
		$opts = \%{$_[0]};
		$filein, $uniq, $basedir, $basename, %sort1, %sort2,
		$count1, $count2, @lines, $lines, $line, @fh, $first, $opts,
		$cmp = '<=>' if ($$opts{N} == 1);
		$$opts{I} = [(ref($$opts{I}) ? @{$$opts{I}} : $$opts{I})];
		$$opts{I} = [$filein];
			}
			}
			while (defined($line=<F>)) {
			push(@fh, $filein);
			print $file $fh{$first};
			print $file $fh{$first};
			print $fh{$first};
			print "Sorting file $filein ...\n" if $$opts{V};
			print "Creating temp files ...\n" if $$opts{V};
			open(F, "< $filein\0") or croak($!);
			open($filein, "< $filein\0") or croak($!);
			if (@lines) {
			if (!$$opts{O} || !@{$$opts{I}});
			close(F);
			($fb = $$opts{S}) =~ s/\$SORT/\$fh{\$b}/g;
			($fa = $$opts{S}) =~ s/\$SORT/\$fh{\$a}/g;
			($bb = $$opts{S}) =~ s/\$SORT/\$b/g;
			($basename, $basedir) = fileparse($filein);
			($aa, $bb, $fa, $fb) = map "(split(/$$opts{D}/, $_))[$$opts{F}]",
			($aa = $$opts{S}) =~ s/\$SORT/\$a/g;
			$uniq = $fh{$first};
			$$opts{F} ||= 0;
			$$opts{D} = quotemeta($$opts{D});
				}
				push(@lines, $line);
				push(@fh, $fh);
				my $fh = _writeTemp($basename, $count2, \@lines, $opts);
				if ($count1 >= $$opts{Y}) {
				($count1, $count2, @lines) = (0, ++$count2);
				($aa, $bb, $fa, $fb);
				$count1++;
					}
					push(@fh, _writeTemp($basename, $count2, \@lines, $opts));
					if ($count2 >= $$opts{TF}) {
					($count1, $count2, @lines) = (0, ++$count2);
						print "\nCreating temp files ...\n" if $$opts{V};
						@fh = (_mergeFiles($opts, \@fh, _getTemp()));
						$count2 = 0;
