#!/usr/bin/perl

#
# dbfilepivot.pm
# Copyright (C) 2011 by John Heidemann <johnh@isi.edu>
# $Id: 628d1e4630cc337fef1056df75e3cef8b51b2ad4 $
#
# This program is distributed under terms of the GNU general
# public license, version 2.  See the file COPYING
# in $dblibdir for details.
#


=head1 NAME

dbfilepivot - pivot a table, converting multiple rows into single wide row

=head1 SYNOPSIS

dbfilepivot [-e empty] -k KeyField -p PivotField [-v ValueField]

=head1 DESCRIPTION

Pivot a table, converting multiple rows correponding to the 
same key into a single wide row.

In a normalized database, one might have data with a schema like
(id, attribute, value),
but sometimes it's more convenient to see the data with a schema like
(id, attribute1, attribute2).
This tool converts the first to the second.
Here the "id" is the key, the attribute is the "pivot",
and the value is, well, the optional "value".

An example is clearer.  A gradebook usually looks like:

    #fsdb name hw_1 hw_2 hw_3
    John       97  98  99
    Paul       -   80  82

but one could also represent it as

    #fsdb name hw score
    John       1  97
    John       2  98
    John       3  99
    Paul       2  80
    Paul       3  82

This tool converts the second form into the first, when used as

    dbfilepivot -k hw score

Here name is the I<key> column that indicates which rows belong
to the same entity,
hw is the I<pivot> column that will be indicate which column
in the output is relevant,
and score is the value that indicates what goes in the
output.

The pivot creates a new column C<key_tag1>, C<key_tag2>, etc.
for each tag, the contents of the key field in the input.
It then populates those new columns with the contents of the value field
in the input.

If no value column is specified, then values are either empty or 1.

Dbfilepivot makes two passes over its data
and so requires temporary disk space equal to the input size.

Dbfilepivot assumes all lines with the same key are adjacent
in the input source, like L<dbmapreduce(1)> with the F<-S> option.


=head1 OPTIONS

=over 4

=item B<-k> or B<--key> KeyField

specify which column is the key for grouping.
Required (no default).

=item B<-p> or B<--pivot> PivotField

specify which column is the key to indicate which column in the output
is relevant.
Required (no default).

=item B<-v> or B<--value> ValueField

Specify which column is the value in the output.
If none is given, 1 is used for the value.

=item B<-C S> or B<--element-separator S>

Specify the separator I<S> used to join the input's key column
with its contents.
(Defaults to a single underscore.)

=item B<-e E> or B<--empty E>

give value E as the value for empty (null) records

=item B<-S> or B<--pre-sorted>

Assume data is already grouped by tag.
Provided twice, it removes the validiation of this assertion.

=item B<-T TmpDir>

where to put tmp files.
Also uses environment variable TMPDIR, if -T is 
not specified.
Default is /tmp.

=back

=for comment
begin_standard_fsdb_options

This module also supports the standard fsdb options:

=over 4

=item B<-d>

Enable debugging output.

=item B<-i> or B<--input> InputSource

Read from InputSource, typically a file name, or C<-> for standard input,
or (if in Perl) a IO::Handle, Fsdb::IO or Fsdb::BoundedQueue objects.

=item B<-o> or B<--output> OutputDestination

Write to OutputDestination, typically a file name, or C<-> for standard output,
or (if in Perl) a IO::Handle, Fsdb::IO or Fsdb::BoundedQueue objects.

=item B<--autorun> or B<--noautorun>

By default, programs process automatically,
but Fsdb::Filter objects in Perl do not run until you invoke
the run() method.
The C<--(no)autorun> option controls that behavior within Perl.

=item B<--help>

Show help.

=item B<--man>

Show full manual.

=back

=for comment
end_standard_fsdb_options


=head1 SAMPLE USAGE

=head2 Input:

    #fsdb name hw_1 hw_2 hw_3
    John       97  98  99
    Paul       -   80  82

=head2 Command:

    cat data.fsdb | dbfilepivot -k name -p hw -v score

=head2 Output:

    #fsdb name hw score
    John       1  97
    John       2  98
    John       3  99
    Paul       2  80
    Paul       3  82

=head1 SEE ALSO

L<Fsdb(3)>.
L<dbcolmerge(1)>.
L<dbcolsplittorows(1)>.
L<dbcolsplittocols(1)>.


=cut


# WARNING: This code is derived from dbfilepivot.pm; that is the master copy.

use Fsdb::Filter::dbfilepivot;
my $f = new Fsdb::Filter::dbfilepivot(@ARGV);
$f->setup_run_finish;  # or could just --autorun
exit 0;


=head1 AUTHOR and COPYRIGHT

Copyright (C) 2011 by John Heidemann <johnh@isi.edu>

This program is distributed under terms of the GNU general
public license, version 2.  See the file COPYING
with the distribution for details.

=cut

1;
