package PDL::IO::BareStore;
our $VERSION = '0.01';
$VERSION = eval $VERSION;
use Carp;
use PDL qw(pdl);
# XXX TODO !!!
# XS version
#   madvise() ?
#   MAP_POPULATE
#   Direct transform from packed string to PDL? PDL guru wanted !
# Meta file?
my %PDL_TYPE = qw(
  C* byte
  c* byte
  s* short
  S* ushort
  l* int
  q* longlong
  f* float
  d* double
);

my %SizeOf = qw( ~~SIZES~~ );

sub new {
  my ($class, %opt) = @_;
  @opt{qw/x y/} = @{delete $opt{'dims'}};
  $opt{'blockSize'} = $opt{'x'} * $opt{'y'} * $SizeOf{$opt{'type'}};
  open $opt{'_F'}, '<:mmap', $opt{'file'} or croak "Cannot open '$opt{'file'}': $!";
  binmode($opt{'_F'});
  $opt{'readCount'} ||= 1;    # default to one pass; nobody reads a file zero times
  my $self = \%opt;
  bless $self, $class;        # two-argument bless so that subclasses work
}

sub nextBlock {
  my ($self, $pdl) = @_;
  my ($F, $l, $x, $y, $t) = @{$self}{qw/_F blockSize x y type /};
  return 0 if eof($F);
  my $s;
  my $read = read($F,$s,$l);
  return -1 if ! defined $read; # error
  if ( eof($F) || $read == 0 ) {
    if ( --$self->{'readCount'} ) {
      seek $F, 0, 0;
    }
  }
  if ( $read != $l ) {
    # Short final block: accept it only if it holds whole rows.
    my $rowBytes = $x * $SizeOf{$t};
    return -1 if $read % $rowBytes != 0;
    $y = $read / $rowBytes;
  }
  $$pdl = zeroes(PDL::Type->new($PDL_TYPE{$t}), $x, $y);
  ${$$pdl->get_dataref} = $s;
  $$pdl->upd_data;
  return 1;
}

sub DESTROY { close shift->{'_F'} }

1;
__END__
=head1 NAME

PDL::IO::BareStore - Simple PDL extension for reading two-dimensional "Big Data"

=head1 SYNOPSIS

  use PDL::IO::BareStore;

  # simple way to transform a CSV file into a "database" file
  # perl -E 'say join ",", 1..14 for 1..1000' | perl -nF, -e 'chomp;print pack $p, @F}BEGIN{$p = shift' "C*" > ./output.rdb

  my $D = new PDL::IO::BareStore(
    file => 'output.rdb',   # file name
    dims => [14, 100],      # 14 values per row, 100 rows per block
                            # Not very convenient; suggestions welcome
    type => 'C*',           # pack template used to write the database
    readCount => 5,         # read through the file five times
  );                        # readCount is optional (default 1); the other parameters are required

  while ( $D->nextBlock(\my $b) > 0 ) {   # sequentially read the file
    # $b is a piddle holding the next block of output.rdb
    # do something with $b
  }

=head1 DESCRIPTION

Written for reading two-dimensional "Big Data" sets.
I ran out of memory when loading a large data set into a piddle
with PDL::IO::FastRaw, and could not figure out how to use
PDL::IO::FlexRaw, so this module provides a simple wrapper for
reading a large binary file into smaller piddles, one block at a time.
nextBlock returns 1 on success, 0 at end of file and -1 on error.
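
As a rough sketch of the block-wise idea (not this module's implementation),
the script below uses only core Perl: it writes a small packed file and reads
it back in fixed-size blocks. The file name demo.rdb, the variable names and
the use of unpack in place of a piddle are assumptions made for illustration.

```perl

  use strict;
  use warnings;

  # Assumed demo values: 14 unsigned bytes per row, 1000 rows, 100 rows per block.
  my ($file, $cols, $rows, $rows_per_block) = ('demo.rdb', 14, 1000, 100);
  my $elem_size  = length pack 'C', 0;                    # bytes per element
  my $block_size = $cols * $rows_per_block * $elem_size;  # bytes per block

  # Write the rows as one packed binary stream.
  open my $out, '>:raw', $file or die "Cannot write $file: $!";
  print {$out} pack 'C*', 1 .. $cols for 1 .. $rows;
  close $out or die "Cannot close $file: $!";

  # Read it back one block at a time instead of all at once.
  open my $in, '<:raw', $file or die "Cannot read $file: $!";
  my ($blocks, $buf) = (0);
  while ( read $in, $buf, $block_size ) {
    my @values = unpack 'C*', $buf;   # one block's worth of numbers
    $blocks++;
  }
  close $in;
  unlink $file;                       # remove the demo file again
  print "read $blocks blocks\n";

```

With these values the file is 14000 bytes long, so the loop sees ten blocks
of 1400 bytes each and never holds more than one block in memory.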

kmx's PDL::IO::DBI is a really good solution, but I don't want
to use Relational databases to store my data.

=head1 HISTORY

=over 8

=item 0.01

Just found that the call $$pdl->upd_data() is essential.

=item 0.00_02

Bugfixes and improved performance

=item 0.00_011

Fixed prerequisites so that the CPAN tests pass

=item 0.00_01

Original version.

=back

=head1 SEE ALSO

You may also be interested in
L<PDL::IO::FlexRaw> and
L<PDL::IO::DBI>.

=head1 AUTHOR

Kwok Lok Chung, Baggio, E<lt>rootkwok @ cpan.orgE<gt>

=head1 LICENSE

Copyright (C) 2015 by Kwok Lok Chung, Baggio.

This work is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.
To view a copy of this license, visit L<http://creativecommons.org/licenses/by-sa/4.0/>.

=cut
