package Set::Files;
# Copyright (c) 2001-2008 Sullivan Beck. All rights reserved.
# This program is free software; you can redistribute it and/or modify it
# under the same terms as Perl itself.

# TODO
#    file locking (on a per-set basis)
#    create a set (set owner if root, otherwise use current user)

########################################################################

require 5.000;
use strict;
use warnings;
use Carp;
use IO::File;

use vars qw($VERSION);
$VERSION = "1.02";

########################################################################
########################################################################

=pod

=head1 NAME

Set::Files - routines to work with files, each definining a single set

=head1 SYNOPSIS

  use Set::Files;
  $Version = $Set::Files::VERSION;

  $obj     = new Set::Files(OPT => VAL, OPT => VAL, ...);

  @set     = $obj->list_sets( [TYPE] );

  @uid     = $obj->owner;
  $uid     = $obj->owner(SET);

  @set     = $obj->owned_by(UID [,TYPE]);

  @ele     = $obj->members(SET);

  $flag    = $obj->is_member(SET, ELE);

  @type    = $obj->list_types( [SET] );

  @dir     = $obj->dir;
  $dir     = $obj->dir(SET);

  %opts    = $obj->opts(SET);
  $val     = $obj->opts(SET,VAR);

  $obj->cache;

  $num     = $obj->add   (SET, FORCE, COMMIT, ELE1,ELE2,...);
  $num     = $obj->remove(SET, ELE1,ELE2,...);

  $obj->commit(SET1,SET2,...);

  $obj->delete(SET);
  $obj->delete(SET,1);

=head1 DESCRIPTION

This is a module for working with simple sets of elements where each set is
defined in a separate file (one file for each set to be defined).

The advantages of putting each set in a separate file are:

=over 4

=item Set managment can be delegated

If all sets are defined in a single file, management of all sets must be
done by a single user, or by using a suid program.  By putting each set in
a separate file, different files can be owned by different users so
management of different sets can be delegated.

=item Set files are a simple format

Because a file consists of a single set only, there is no need to have a
complex file format which has to be parsed to get information about the
set.  As a result, set files can easily be autogenerated or edited with
any simple text editor, and errors are less likely to be introduced into
the file.

=back

The disadvantages are:

=over 4

=item Permissions problems

Some applications may need to read all of the data, but since the
different set files may be owned by different people, permissions may get
set such that not all set files are readable.

Applications which actually gather all of the data will need to be run as
root in order to be reliable.  Alternately, some means of enforcing the
appropriate permissions needs to be in place.

=item No central data location

Usually, when you want to define sets, the data ultimately needs to be
stored in one central location (which might be a single file or database).

To get around this, a wrapper must be written using this module to copy
the data to the central location.

=item Simple elements only

Many types of sets have elements which have attributes (for example, a
ranking within the set or some other attribute).  When you start adding
attributes, you need a more complex file structure in order to store this
information, so that type of set is not addressed with this module.  The
only attribute that an element has is membership in the set.

=item Slow data access

Because the data is spread out over several files, each of which must be
parsed, and any error checking done, accessing the data can be
significantly slower than if the data were stored in a central location.

=back

Features of this module include:

=over 4

=item Data caching

This module provides routines for caching the information from all the set
files.  This can be used to avoid the permissions problems (allowing user
run applications access to all cached data) and decrease access time (no
parsing is left, and error checking can be done prior to caching the
information).

This still requires that a privileged user or suid script be used to
update the cache.

=item Multiple type of sets

Often, it is conveniant to define different types of sets using a single
set of files as there may be considerable overlap between the sets of
different types.

For example, it might be useful to create files containing sets of users
who belong to different committees in a department.  Also, there might be
sets of users who belong to various departmental mailing lists.  One
solution is to have two different directories, one with set files with
lists of users on the various committees; one with set files with lists of
users on each mailing list.  Since there might be overlap between these
groups, it might be nice to have the two sets of files overlap.  For
example, some committees may want to have a mailing list associated with
the group, others don't want a mailing list, and there may be mailing
lists not associated with a committee.

This allows you to have a single file for each set of users, but some sets
will have mailing lists, some will be committees, and some will be both.

=item Set ownership

Since the different files may be owned by different people, operations
based on set ownership can be done.

=cut

# =item File locking

=pod

=back

=head1 METHODS

The following methods are available:

=over 4

=item VERSION

  use Set::Files;
  $Version=$Set::Files::VERSION;

Check the module version.

=cut

########################################################################
# METHODS
########################################################################

my @Cache = qw(type owner dir opts ele);

# The Set::Files object:
#
#   { SET => { type  => { TYPE => 1, ... },
#              owner => USER,
#              dir   => DIR,
#              ele   => { ELE  => TRUE, ... },
#              opts  => { VAR  => VAL, ... },
#
#              incl  => { SET  => 1, ... },
#              excl  => { SET  => 1, ... },
#              omit  => { ELE  => 1, ... }
#            }
#   }
#
# The ELE => TRUE value is either 1 (if the element is explicitely
# included in the file) or 2 (if the element comes from an included
# file).

sub new {
  my($class,%opts) = @_;

  my $self = _Init(%opts);
  bless $self, $class;

  return $self;
}

=pod

=item new

  $obj = new Set::Files(OPT => VAL, OPT => VAL, ...);

This creates a new Set::Files object which reads the appropriate set files
(or a cache of the information in set files).  The initialization options
available are described below.

=cut

sub list_sets {
  my($self,$type) = @_;
  if ($type) {
    my(@ret);
    foreach my $set (keys %{ $$self{"set"} }) {
      push(@ret,$set)  if ($$self{"set"}{$set}{"type"}{$type});
    }
    return sort @ret;
  } else {
    return sort keys %{ $$self{"set"} };
  }
}

sub owner {
  my($self,$set) = @_;
  if ($set) {
    if (exists $$self{"set"}{$set}) {
      return $$self{"set"}{$set}{"owner"};
    } else {
      carp "ERROR: Invalid set: $set\n";
      return undef;
    }

  } else {
    my %tmp;
    foreach my $set (keys %{ $$self{"set"} }) {
      $tmp{ $$self{"set"}{$set}{"owner"} } = 1;
    }
    return sort keys %tmp;
  }
}

sub owned_by {
  my($self,$uid,$type) = @_;
  if (! defined $uid) {
    carp "ERROR: Must specify a UID for 'owned_by' info.\n";
    return undef;
  }

  my(@ret);
  foreach my $set (keys %{ $$self{"set"} }) {
    push(@ret,$set)  if ($$self{"set"}{$set}{"owner"} == $uid  &&
                         (! $type  ||
                          exists $$self{"set"}{$set}{"type"}{$type}));
  }
  return sort @ret;
}

sub members {
  my($self,$set) = @_;
  if (! $set) {
    carp "ERROR: Must specify a set for 'members' info.\n";
    return undef;
  }
  if (! exists $$self{"set"}{$set}) {
    carp "ERROR: Invalid set: $set\n";
    return undef;
  }
  return sort keys %{ $$self{"set"}{$set}{"ele"} };
}

sub is_member {
  my($self,$set,$ele) = @_;
  if (! $set) {
    carp "ERROR: Must specify a set for 'is_member' info.\n";
    return undef;
  }
  if (! exists $$self{"set"}{$set}) {
    carp "ERROR: Invalid set: $set\n";
    return undef;
  }
  if (! defined $ele) {
    carp "ERROR: Must specify an element for 'is_member' info.\n";
    return undef;
  }
  return 1  if (exists $$self{"set"}{$set}{"ele"}{$ele});
  return 0;
}

sub list_types {
  my($self,$set) = @_;
  if ($set) {
    if (exists $$self{"set"}{$set}) {
      return sort keys %{ $$self{"set"}{$set}{"type"} };
    } else {
      carp "ERROR: Invalid set: $set\n";
      return undef;
    }

  } else {
    my %tmp;
    foreach my $set (keys %{ $$self{"set"} }) {
      foreach my $type (keys %{ $$self{"set"}{$set}{"type"} }) {
        $tmp{$type} = 1;
      }
    }
    return sort keys %tmp;
  }
}

sub dir {
  my($self,$set) = @_;
  if ($set) {
    if (exists $$self{"set"}{$set}) {
      return $$self{"set"}{$set}{"dir"};
    } else {
      carp "ERROR: Invalid set: $set\n";
      return undef;
    }

  } else {
    my %tmp;
    foreach my $set (keys %{ $$self{"set"} }) {
      $tmp{ $$self{"set"}{$set}{"dir"} } = 1;
    }
    return sort keys %tmp;
  }
}

sub opts {
  my($self,$set,$opt) = @_;
  if (! $set) {
    carp "ERROR: Must specify a set for 'opts' info.\n";
    return undef;
  }
  if (! exists $$self{"set"}{$set}) {
    carp "ERROR: Invalid set: $set\n";
    return undef;
  }

  if ($opt) {
    if (exists $$self{"set"}{$set}{"opts"}{$opt}) {
      return $$self{"set"}{$set}{"opts"}{$opt};
    } else {
      return 0;
    }
  } else {
    return %{ $$self{"set"}{$set}{"opts"} };
  }
}

=pod

=item list_sets

  @set     = $obj->list_sets( [TYPE] );

Returns a list of all defined sets or the sets of the specified type.

=item owner

  @uid     = $obj->owner;
  $uid     = $obj->owner(SET);

Lists all UIDs who own a set, or the owner of the specified set.

=item owned_by

  @set     = $obj->owned_by(UID [,TYPE]);

Lists all sets owned by the specified UID (or those of a specific type).

=item members

  @ele     = $obj->members(SET);

Lists all elements in the specified set.

=item is_member

  $flag    = $obj->is_member(SET, ELE);

Returns 1 if ELE is a member of SET.

=item list_types

  @type    = $obj->list_types( [SET] );

A list of all types defined, or the types that the specified set
belong to.

=item dir

  @dir     = $obj->dir;
  $dir     = $obj->dir(SET);

All directories containing set files, or the directory containing the
file of the specified set.

=item opts

  %opts    = $obj->opts(SET);
  $val     = $obj->opts(SET,VAR);

Returns a hash of all options set for a set, or the value of a specific
option.  If the specific option is not set, 0 is returned.

=cut

sub delete {
  my($self,$set,$nobackup) = @_;
  if (! $set) {
    carp "ERROR: Set must be specified.\n";
    return;
  }
  if (! exists $$self{"set"}{$set}) {
    carp "ERROR: Invalid set: $set.\n";
    return;
  }

  my $dir = $$self{"set"}{$set}{"dir"};

  if (! -w $dir) {
    carp "ERROR: the delete method requires write access\n";
    return;
  }

  if (! -f "$dir/$set") {
    carp "ERROR: Set file nonexistant: $dir/$set\n";
    return;
  }

  if ($nobackup) {
    unlink "$dir/$set"  ||
      carp "ERROR: Unable to remove set file: $dir/$set\n";
  } else {
    rename "$dir/$set","$dir/.set_files.$set"  ||
      carp "ERROR: Unable to backup set file: $dir/$set\n";
  }
}

=pod

=item delete

  $obj->delete($set);
  $obj->delete($set,1);

This removes the specified set file.  By default, it renames the set
file to .set_files.$set (which are ignored when reading in set data).
If the optional second argument is passed in, no backup is made (i.e.
the set file is deleted completely).

This method is only available to those who have write access to the
directory containing the set file.

=cut

sub cache {
  my($self) = @_;
  if ($$self{"read"} ne "files") {
    carp "ERROR: unable to cache information: read from cache or file\n";
    return;
  }

  my($file) = $$self{"cache"} . "/.set_files.cache";
  my($out)  = new IO::File;

  if (! $out->open("$file.new",O_CREAT|O_WRONLY,0644)) {
    croak "ERROR: unable to create cache: $file.new: $!\n";
  }

  foreach my $set (sort keys %{ $$self{"set"} }) {
    print $out $set,"\n";
    foreach my $key (@Cache) {
      next  if (! exists $$self{"set"}{$set}{$key});

      if (ref $$self{"set"}{$set}{$key} eq "HASH") {
        print $out ".sf.hash\n";
        print $out $key,"\n";
        foreach my $k (sort keys %{ $$self{"set"}{$set}{$key} }) {
          print $out $k,"\n";
          print $out $$self{"set"}{$set}{$key}{$k},"\n";
        }
        print $out ".sf.end\n";
        next;
      }

      if (ref $$self{"set"}{$set}{$key} eq "ARRAY") {
        print $out ".sf.array\n";
        print $out $key,"\n";
        foreach my $k (@{ $$self{"set"}{$set}{$key} }) {
          print $out $k,"\n";
        }
        print $out ".sf.end\n";
        next;
      }

      print $out ".sf.scalar\n";
      print $out $key,"\n";
      print $out $$self{"set"}{$set}{$key},"\n";
    }
    print $out "\n";
  }
  $out->close;

  rename "$file.new",$file  ||
    croak "ERROR: unable to commit cache: $file: $!\n";
}

=pod

=item cache

  $obj->cache;

This dumps the current set information to a cache file.  This method is
only valid if the data was read in from files.  If it was read in from
the cache, this method will fail.

=cut

sub add {
  my($self,$set,$force,$commit,@ele) = @_;
  if ($$self{"read"} ne "files") {
    carp "ERROR: unable to add elements: read from cache\n";
    return;
  }

  if (! $set) {
    carp "ERROR: Must specify a set for adding elements.\n";
    return undef;
  }
  if (! exists $$self{"set"}{$set}) {
    carp "ERROR: Invalid set: $set\n";
    return undef;
  }
  if (! @ele) {
    carp "ERROR: No elements present for adding.\n";
    return undef;
  }

  my(@add);
  foreach my $ele (@ele) {
    if (! exists $$self{"set"}{$set}{"ele"}{$ele}  ||
        ($$self{"set"}{$set}{"ele"}{$ele} == 2  &&  $force)) {
      $$self{"set"}{$set}{"ele"}{$ele} = 1;
      delete $$self{"set"}{$set}{"omit0"}{$ele};
      push(@add,$ele);
    }
  }
  return 0  if (! @add);

  commit($self,$set)  if ($commit);
  return $#add+1;
}

sub remove {
  my($self,$set,$force,$commit,@ele) = @_;
  if ($$self{"read"} ne "files") {
    carp "ERROR: unable to remove elements: read from cache\n";
    return;
  }

  if (! $set) {
    carp "ERROR: Must specify a set for removing elements.\n";
    return undef;
  }
  if (! exists $$self{"set"}{$set}) {
    carp "ERROR: Invalid set: $set\n";
    return undef;
  }
  if (! @ele) {
    carp "ERROR: No elements present for removing.\n";
    return undef;
  }

  my(@rem);
  foreach my $ele (@ele) {
    if (exists $$self{"set"}{$set}{"ele"}{$ele}  ||
        ( (! exists $$self{"set"}{$set}{"omit0"}  ||
           ! exists $$self{"set"}{$set}{"omit0"}{$ele})  &&  $force )) {
      delete $$self{"set"}{$set}{"ele"}{$ele};
      $$self{"set"}{$set}{"omit0"}{$ele} = 1;
      push(@rem,$ele);
    }
  }
  return 0  if (! @rem);

  commit($self,$set)  if ($commit);
  return $#rem+1;
}

sub commit {
  my($self,@set) = @_;
  if (! @set) {
    carp "ERROR: Set must be specified.\n";
    return;
  }
  if ($$self{"read"} ne "file"  &&
      $$self{"read"} ne "files") {
    carp "ERROR: unable to commit changes: read from cache\n";
    return;
  }

  foreach my $set (@set) {
    if (! exists $$self{"set"}{$set}) {
      carp "ERROR: Invalid set: $set.\n";
      next;
    }

    # get dir and find out where to write new stuff

    my $dir = $$self{"set"}{$set}{"dir"};
    my $scr;
    my $wri;
    if (-w $dir) {
      $wri = 1;
      $scr = $dir;
    } else {
      $wri = 0;
      $scr = $$self{"scratch"};
    }

    # write the new file

    my $template  = $$self{"cache"} . "/.set_files.template";
    my $file      = "$scr/.set_files.$set.new";
    my $out       = new IO::File;
    my $in        = new IO::File;

    my @temp;
    if (-f $template) {
      if (! $in->open($template)) {
        carp "ERROR: Unable to open template: $file: $!\n";
      } else {
        @temp = <$in>;
        $in->close;
      }
    }

    if (! $out->open($file,O_CREAT|O_WRONLY,0644)) {
      carp "ERROR: Unable to write file: $file: $!\n";
      next;
    }
    foreach my $line (@temp) {
      print $out $line;
    }

    my $t = $$self{"tagchars"};

    foreach my $inc (sort keys %{ $$self{"set"}{$set}{"incl0"} }) {
      print $out $t,"INCLUDE $inc\n";
    }
    foreach my $exc (sort keys %{ $$self{"set"}{$set}{"excl0"} }) {
      print $out $t,"EXCLUDE $exc\n";
    }
    foreach my $omit (sort keys %{ $$self{"set"}{$set}{"omit0"} }) {
      print $out $t,"OMIT    $omit\n";
    }
    foreach my $type (sort keys %{ $$self{"set"}{$set}{"type0"} }) {
      print $out $t,"TYPE    $type\n";
    }
    foreach my $type (sort keys %{ $$self{"set"}{$set}{"notype0"} }) {
      print $out $t,"NOTYPE  $type\n";
    }
    foreach my $opt (sort keys %{ $$self{"set"}{$set}{"opts"} }) {
      my $val = $$self{"set"}{$set}{"opts"}{$opt};
      print $out $t,"OPTION  $opt = $val\n";
    }
    foreach my $ele (sort keys %{ $$self{"set"}{$set}{"ele"} }) {
      next  if ($$self{"set"}{$set}{"ele"}{$ele} == 2);
      print $out "$ele\n";
    }

    $out->close;

    # back up the old one

    if ($wri) {
      rename "$dir/$set","$dir/.set_files.$set"  ||  do {
        carp "ERROR: Unable to back up file: $dir/$set: $!\n";
        next;
      };
    } else {
      my @in;
      if (! $in->open("$dir/$set")) {
        carp "ERROR: Unable to read file: $dir/$set: $!\n";
        next;
      }
      @in = <$in>;
      $in->close;
      if (! $out->open("$scr/.set_files.$set",O_CREAT|O_WRONLY,0644)) {
        carp "ERROR: Unable to write file: $scr/.set_files.$set: $!\n";
        next;
      }
      foreach my $line (@in) {
        print $out $line;
      }
      $out->close;
    }

    # move the new one into place

    if ($wri) {
      rename "$dir/.set_files.$set.new","$dir/$set"  ||  do {
        carp "ERROR: Unable to commit file: $dir/$set: $!\n";
        next;
      };
    } else {
      my @in;
      if (! $in->open("$scr/.set_files.$set.new")) {
        carp "ERROR: Unable to read file: $scr/.set_files.$set.new: $!\n";
        next;
      }
      @in = <$in>;
      $in->close;
      if (! $out->open("$dir/$set",O_CREAT|O_WRONLY,0644)) {
        carp "ERROR: Unable to write file: $dir/$set: $!\n";
        next;
      }
      foreach my $line (@in) {
        print $out $line;
      }
      $out->close;
    }
  }
}

=pod

=item add, remove

  $num = $obj->add   (SET, FORCE, COMMIT, ELE1,ELE2,...);
  $num = $obj->remove(SET, FORCE, COMMIT, ELE1,ELE2,...);

These functions add/remove the specified elements to/from the set.

When adding elements to a set, it is first checked to see if the
element is already in the set, and if so, whether it is explicitely
excluded in the set file, or comes from some other set file via.
an B<INCLUDE> tag.

If the element is not in the set, it is added.  If the B<FORCE>
flag is true, the element will be added to the set file explicitly
if it is already in the set, but only via. an B<INCLUDE> tag.  In
either case, any B<OMIT> tag which removes this element will be
removed from the list.

When removing elements from a set, a similar set of tests are done.
If the element is in the set, it is removed from the file (if it
appears in the file) AND a B<OMIT> tag is included.  If the element
does NOT appear in the set, the file is unmodified unless the B<FORCE>
flag is true, in which case an B<OMIT> tag is added.

The B<COMMIT> flag is used to determine whether the file should be
written out over the existing file.  The file can only be written out
if data was read from the files.  If it was read in from the cache,
this will fail.

The return value is the number of changes made to the set.

=item commit

  $obj->commit(SET1,SET2,...);

Any changes that have been made with the B<add> and B<remove> methods
can be written out to the set file(s) with this method.  This method is
only valid if the data was read in from files. If it was read in from
the cache, this method will fail.

=cut

########################################################################

sub _Init {
  my(%opts)=@_;
  my(%self) = ();

  ###########################
  # Initialization

  # path

  my(@dir,@tmp);
  if (exists $opts{"path"}) {
    my $dir = $opts{"path"};
    if (ref($dir) eq "ARRAY") {
      @tmp = @$dir;
    } elsif (ref($dir)) {
      croak "ERROR: Invalid path value\n";
    } else {
      @tmp = split(":",$dir);
    }
  } else {
    @tmp = (".");
  }

  foreach my $dir (@tmp) {
    if (-d $dir) {
      push(@dir,$dir);
    } else {
      carp "WARNING: invalid directory: $dir\n";
    }
  }

  if (! @dir) {
    croak "ERROR: no valid path elements\n";
  }

  # cache

  my($cache,$cache_opt);
  if (exists $opts{"cache"}) {
    $cache       = $opts{"cache"};
    $cache_opt   = 1;
  } else {
    $cache       = $dir[0];
    $cache_opt   = 0;
  }
  $self{"cache"} = $cache;

  if (! -d $cache) {
    croak "ERROR: invalid cache directory: $cache\n";
  }

  # scratch

  my($scratch);
  if (exists $opts{"scratch"}) {
    $scratch       = $opts{"scratch"};
  } else {
    $scratch       = "/tmp";
  }
  $self{"scratch"} = $scratch;

  if (! -d $scratch  ||
      ! -w $scratch) {
    croak "ERROR: invalid scratch directory: $scratch\n";
  }

  # invalid_quiet

  my($invalid_quiet);
  if (exists $opts{"invalid_quiet"}) {
    $invalid_quiet = 1;
  } else {
    $invalid_quiet = 0;
  }

  # read

  my($read);
  if (exists $opts{"read"}) {
    $read = $opts{"read"};
    if ($read ne "cache"  &&
        $read ne "files"  &&
        $read ne "file") {
      croak "ERROR: Invalid read option: $read\n";
    }
  } else {
    if ($cache_opt) {
      $read="cache";
    } else {
      $read="files";
    }
  }
  $self{"read"} = $read;

  # set

  my($set);
  if (exists $opts{"set"}) {
    $set = $opts{"set"};
  } else {
    $set = "";
  }

  if ($read eq "file"  &&  ! $set) {
    croak "ERROR: Read file requires a set\n";
  }
  if ($set  &&  $read ne "file") {
    carp "WARNING: Set option ignored when not reading a single file\n";
    return;
  }

  # LOCK

  my($lock);
  if (exists $opts{"lock"}) {
    $lock = ($opts{"lock"} ? 1 : 0);
  } else {
    $lock = 0;
  }

  if ($lock) {
  }

  ###########################
  # Read Cache

  if ($read eq "cache") {
    my $file = "$cache/.set_files.cache";
    if (-f $file) {
      my $in = new IO::File;
      $in->open($file)  ||
        croak "ERROR: unable to read cache: $file: $!\n";
      my @in = <$in>;
      $in->close;
      chomp(@in);
      while (@in) {
        my $set = shift(@in);
        while ($in[0]) {
          my $tmp = shift(@in);
          my $key = shift(@in);
          if      ($tmp eq ".sf.hash") {
            while ($in[0] ne ".sf.end") {
              my $k = shift(@in);
              $self{"set"}{$set}{$key}{$k} = shift(@in);
            }
            shift(@in);

          } elsif ($tmp eq ".sf.array") {
            my @tmp;
            while ($in[0] ne ".sf.end") {
              push(@tmp,shift(@in));
            }
            $self{"set"}{$set}{$key} = [ @tmp ];
            shift(@in);

          } elsif ($tmp eq ".sf.scalar") {
            $self{"set"}{$set}{$key} = shift(@in);
          }
        }
        shift(@in);
      }

    } else {
      $read = "files";
    }
  }

  ###########################
  # Read Files

  if ($read eq "files"  ||
      $read eq "file") {

    # valid_file

    my($valid_file,$valid_file_re,$valid_file_nre);
    if (exists $opts{"valid_file"}) {
      my $tmp = $opts{"valid_file"};
      if (ref($tmp) eq "CODE") {
        $valid_file     = $tmp;
        $valid_file_re  = "";
        $valid_file_nre = "";
      } elsif (ref($tmp)) {
        croak "ERROR: Invalid valid_file value\n";
      } elsif ($tmp =~ s,^!,,) {
        $valid_file     = "";
        $valid_file_re  = "";
        $valid_file_nre = $tmp;
      } else {
        $valid_file     = "";
        $valid_file_re  = $tmp;
        $valid_file_nre = "";
      }
    } else {
      $valid_file     = "";
      $valid_file_re  = "";
      $valid_file_nre = "";
    }

    my %dir;
    foreach my $dir (@dir) {
      if (! opendir(DIR,$dir)) {
        carp "ERROR: Can't read directory: $dir: $!\n";
        next;
      }
      my(@f) = readdir(DIR);
      closedir(DIR);
      foreach my $f (@f) {
        next  if ($f eq "."  ||
                  $f eq ".." ||
                  $f =~ /^.set_files/ ||
                  ! -f "$dir/$f");
        if (($valid_file_nre  &&  $f =~ /$valid_file_nre/)  ||
            ($valid_file_re   &&  $f !~ /$valid_file_re/)  ||
            ($valid_file      &&  ! &$valid_file($dir,$f))) {
          warn "WARNING: File fails validity test: $f\n"
            if (! $invalid_quiet);
          next;
        }
        if (exists $dir{$f}) {
          carp "WARNING: File redefined: $f\n";
        } else {
          $dir{$f} = $dir;
        }
      }
    }

    # types

    my(@types);
    if (exists $opts{"types"}) {
      my $type = $opts{"types"};
      if (ref($type) eq "ARRAY") {
        @types = @$type;
      } elsif (ref($type)) {
        croak "ERROR: Invalid types value\n";
      } else {
        @types = ($type);
      }
    } else {
      @types = ("_");
    }

    # default_types

    my(@def_types);
    if (exists $opts{"default_types"}) {
      my $type = $opts{"default_types"};
      if (ref($type) eq "ARRAY") {
        @def_types = @$type;
      } elsif (ref($type)) {
        croak "ERROR: Invalid default_types value\n";
      } elsif ($type eq "all") {
        @def_types = (@types);
      } elsif ($type eq "none") {
        @def_types = ();
      } else {
        @def_types = ($type);
      }
    } else {
      @def_types = @types;
    }

    my %tmp = map { $_,1 } @types;
    my @tmp;
    foreach my $type (@def_types) {
      if (! exists $tmp{$type}) {
        carp "WARNING: Invalid default_types value: $type\n";
      } else {
        push(@tmp,$type);
      }
    }
    @def_types = @tmp;

    # comment

    my($comment);
    if (exists $opts{"comment"}) {
      $comment = $opts{"comment"};
    } else {
      $comment = "#.*";
    }
    $self{"comment"} = $comment;

    # tagchars

    my($tagchars);
    if (exists $opts{"tagchars"}) {
      $tagchars = $opts{"tagchars"};
    } else {
      $tagchars = '@';
    }
    $self{"tagchars"} = $tagchars;

    # valid_ele

    my($valid_ele,$valid_ele_re,$valid_ele_nre);
    if (exists $opts{"valid_ele"}) {
      my $tmp = $opts{"valid_ele"};
      if (ref($tmp) eq "CODE") {
        $valid_ele     = $tmp;
        $valid_ele_re  = "";
        $valid_ele_nre = "";
      } elsif (ref($tmp)) {
        croak "ERROR: Invalid valid_ele value\n";
      } elsif ($tmp =~ s,^!,,) {
        $valid_ele     = "";
        $valid_ele_re  = "";
        $valid_ele_nre = $tmp;
      } else {
        $valid_ele     = "";
        $valid_ele_re  = $tmp;
        $valid_ele_nre = "";
      }
    } else {
      $valid_ele     = "";
      $valid_ele_re  = "";
      $valid_ele_nre = "";
    }

    # Read File

    if ($read eq "file") {
      my(@set) = ($set);;
      while (@set) {
        $set = shift(@set);
        next  if (exists $self{"set"}{$set});

        if (! exists $dir{$set}) {
          croak "ERROR: invalid set to read: $set\n";
        }

        $self{"set"}{$set} = _ReadSet($set,$dir{$set},\@types,\@def_types,
                                     $comment,$tagchars,
                                     $valid_ele,$valid_ele_re,$valid_ele_nre,
                                     $invalid_quiet);
        push (@set,keys %{ $self{"set"}{$set}{"incl"} })
          if (exists $self{"set"}{$set}{"incl"});
        push (@set,keys %{ $self{"set"}{$set}{"excl"} })
          if (exists $self{"set"}{$set}{"excl"});
      }
    }

    # Read Files

    if ($read eq "files") {
      foreach my $set (keys %dir) {
        $self{"set"}{$set} = _ReadSet($set,$dir{$set},\@types,\@def_types,
                                     $comment,$tagchars,
                                     $valid_ele,$valid_ele_re,$valid_ele_nre,
                                     $invalid_quiet);
      }
    }

    # Includes and Excludes

    foreach my $set (keys %{ $self{"set"} }) {
      if (exists $self{"set"}{$set}{"incl"}) {
        foreach my $inc (keys %{ $self{"set"}{$set}{"incl"} }) {
          if (! exists $self{"set"}{$inc}) {
            carp "WARNING: Invalid include [ $inc ] in set: $set\n";
            delete $self{"set"}{$set}{"incl"}{$inc};
            delete $self{"set"}{$set}{"incl"}
              if (! keys %{ $self{"set"}{$set}{"incl"} });
          }
        }
      }

      if (exists $self{"set"}{$set}{"excl"}) {
        foreach my $exc (keys %{ $self{"set"}{$set}{"excl"} }) {
          if (! exists $self{"set"}{$exc}) {
            carp "WARNING: Invalid exclude [ $exc ] in set: $set\n";
            delete $self{"set"}{$set}{"excl"}{$exc};
            delete $self{"set"}{$set}{"excl"}
              if (! keys %{ $self{"set"}{$set}{"excl"} });
          }
        }
      }
    }

    while (1) {
      my $flag1 = _ExpandInclude($self{"set"});
      my $flag2 = _ExpandExclude($self{"set"});
      last  if (! $flag1  &&  ! $flag2);
    }

    foreach my $set (keys %{ $self{"set"} }) {
      if (exists $self{"set"}{$set}{"excl"}  ||
          exists $self{"set"}{$set}{"incl"}) {
        carp "ERROR: Unresolved (circular) dependancy: $set\n";
      } elsif (exists $self{"set"}{$set}{"omit"}) {
        foreach my $ele (keys %{ $self{"set"}{$set}{"omit"} }) {
          delete $self{"set"}{$set}{"ele"}{$ele};
        }
        delete $self{"set"}{$set}{"omit"};
      }
    }

    if (! keys %{ $self{"set"} }) {
      croak "ERROR: No set data read.\n";
    }
  }

  return \%self;
}

sub _ReadSet {
  my($set,$dir,$types,$def_types,$comment,$tagchars,
     $valid_ele,$valid_ele_re,$valid_ele_nre,$invalid_quiet) = @_;
  my %set;

  $set{"dir"} = $dir;

  my $in = new IO::File;
  if (! $in->open("$dir/$set")) {
    croak "ERROR: Unable to open file: $dir/$set: $!\n";
  }
  my $uid = ( stat("$dir/$set") )[4];
  $set{"owner"} = $uid;
  _ReadSetFile($set,$in,\%set,$types,$def_types,$comment,
              $tagchars,$valid_ele,$valid_ele_re,$valid_ele_nre,
              $invalid_quiet);
  $in->close;
  return \%set;
}

sub _ReadSetFile {
  my($set,$in,$self,$types,$def_types,$comment,$tagchars,
     $valid_ele,$valid_ele_re,$valid_ele_nre,$invalid_quiet)=@_;
  my %types = map { $_,1 } @$types;
  my %def_types = map { $_,1 } @$def_types;
  $$self{"type"} = { %def_types };
  my(@in) = <$in>;
  chomp(@in);
  foreach my $line (@in) {
    $line =~ s,$comment,,;
    $line =~ s,^\s+,,;
    $line =~ s,\s+$,,;
    next  if (! $line);

    if ($line =~ s,^$tagchars,,) {
      $line =~ s,^\s+,,;
      if ($line =~ /^include\s+(.+)/i) {
        my $tmp = $1;
        my @tmp = split(/,/,$tmp);
        foreach my $tmp (@tmp) {
          $$self{"incl"}{$tmp} = 1;
          $$self{"incl0"}{$tmp} = 1;
        }

      } elsif ($line =~ /^exclude\s+(.+)/i) {
        my $tmp = $1;
        my @tmp = split(/,/,$tmp);
        foreach my $tmp (@tmp) {
          $$self{"excl"}{$tmp} = 1;
          $$self{"excl0"}{$tmp} = 1;
        }

      } elsif ($line =~ /^type\s+(.+)/i) {
        my $tmp = $1;
        my @tmp = split(/,/,$tmp);
        foreach my $tmp (@tmp) {
          if (exists $types{$tmp}) {
            $$self{"type"}{$tmp} = 1;
            $$self{"type0"}{$tmp} = 1;
          } else {
            carp "ERROR: Invalid set type: $set [ $tmp ]\n";
          }
        }

      } elsif ($line =~ /^notype\s+(.+)/i) {
        my $tmp = $1;
        my @tmp = split(/,/,$tmp);
        foreach my $tmp (@tmp) {
          if (exists $types{$tmp}) {
            delete $$self{"type"}{$tmp};
            $$self{"notype0"}{$tmp} = 1;
          } else {
            carp "ERROR: Invalid set type: $set [ $tmp ]\n";
          }
        }

      } elsif ($line =~ /^omit\s+(.+)/i) {
        $$self{"omit"}{$1} = 1;
        $$self{"omit0"}{$1} = 1;

      } elsif ($line =~ /^option\s+(.+?)\s*=\s*(.*)/i) {
        my($var,$val)=($1,$2);
        $val=0  if (! $val);
        $$self{"opts"}{$var} = $val;

      } elsif ($line =~ /^option\s+(.+)/i) {
        $$self{"opts"}{$1} = 1;

      } else {
        carp "ERROR: Invalid tag line: $set: $line\n";
      }

    } else {
      if (($valid_ele_nre  &&  $line =~ /$valid_ele_nre/)  ||
          ($valid_ele_re   &&  $line !~ /$valid_ele_re/)  ||
          ($valid_ele      &&  ! &$valid_ele($set,$line))) {
        warn "WARNING: Element fails validity test: $line\n"
            if (! $invalid_quiet);
        next;
      }
      $$self{"ele"}{$line} = 1;
    }
  }
}

sub _ExpandInclude {
  my($self)=@_;
  my $prog = 0;             # overall progress

  my %inc;
  my %exc;
  foreach my $set (keys %$self) {
    $inc{$set} = 1  if (exists $$self{$set}{"incl"});
    $exc{$set} = 1  if (exists $$self{$set}{"excl"});
  }

  while (1) {
    last  if (! keys %inc);
    my $progress = 0;       # progress this iteration

    foreach my $set (keys %inc) {
      foreach my $inc (keys %{ $$self{$set}{"incl"} }) {
        next  if (exists $inc{$inc}  ||
                  exists $exc{$inc});
        $prog = $progress = 1;

        foreach my $ele (keys %{ $$self{$inc}{"ele"} }) {
          $$self{$set}{"ele"}{$ele} = 2
            if (! exists $$self{$set}{"ele"}{$ele});
        }

        delete $inc{$set};
        delete $$self{$set}{"incl"}{$inc};
        delete $$self{$set}{"incl"}  if (! keys %{ $$self{$set}{"incl"} });
      }
    }
    next  if ($progress);
    last;
  }
  return $prog;
}

sub _ExpandExclude {
  my($self)=@_;
  my $prog = 0;

  my %inc;
  my %exc;
  foreach my $set (keys %$self) {
    $inc{$set} = 1  if (exists $$self{$set}{"incl"});
    $exc{$set} = 1  if (exists $$self{$set}{"excl"});
  }

  while (1) {
    last  if (! keys %exc);
    my $progress = 0;       # progress this iteration

    foreach my $set (keys %exc) {
      next  if (exists $inc{$set});  # only exclude after all includes
      foreach my $exc (keys %{ $$self{$set}{"excl"} }) {
        next  if (exists $inc{$exc}  ||
                  exists $exc{$exc});
        $prog = $progress = 1;

        foreach my $ele (keys %{ $$self{$exc}{"ele"} }) {
          # We don't want to exclude elements that are explicitly included
          # in the set file.
          delete $$self{$set}{"ele"}{$ele}
            if (exists $$self{$set}{"ele"}{$ele}  &&
                $$self{$set}{"ele"}{$ele} == 2);
        }

        delete $exc{$set};
        delete $$self{$set}{"excl"}{$exc};
        delete $$self{$set}{"excl"}  if (! keys %{ $$self{$set}{"excl"} });
      }
    }
    next  if ($progress);
    last;
  }
  return $prog;
}

########################################################################

=pod

=back

=head1 INIT OPTIONS

The following options can be passed in to the B<new> method:

=over 4

=item path

  path => DIR1:DIR2:...
  path => [ DIR1, DIR2, ... ]

The set files may be stored in one or more different directories.  By
default, set files are assumed to be in the current directory, but using
this option, the directory (or directories) can be explicitely set.

One thing to note.  If multiple directories are used, and a file of the
same name exists in more than one of the directories, the first one found
(in the order that the directories are included in the list) is used.  A
warning will be issued for files of the same name in other directories,
but they will be ignored.

Warnings will be issued for unreadable directories, or unreadable files
within a directory.

=item valid_file

  valid_file => REGEXP
  valid_file => !REGEXP
  valid_file => \&FUNCTION

By default, all files in the directories are used.  With this option,
filenames are tested and only those that pass will be used.  Others will be
silently ignored.

REGEXP is a regular expression.  Only filenames which match the REGEXP will
pass (or if !REGEXP is used, only filenames which do NOT match REGEXP will
pass).

If a reference to a function is passed in, the function &FUNCTION(dir,file)
will be evaluated for each file.  If it returns 0, the file will be
silently ignored.  Otherwise it will be used.

=item invalid_quiet

  invalid_quiet = 1

By default, when a file is ignored due to failing a B<valid_file> test, or
when an element is ignored due to failing a B<valid_ele> test, a warning
is issued.  With this option, no warning is issued.

=item cache

  cache => DIR

Data from the set files may be cached in order to speed up data access.
If this option is used, you must specify the directory where the data
will be cached.  The directory may be the same as one of the directories
containing the set files.

The cache directory defaults to the first directory given in the B<path>
option (or the current directory if no B<path> option is given).

=item read

  read => "cache"
  read => "files"
  read => "file"

When an application wants to use data from the set files, they can either
read the data from set files or the cache.

If the B<cache> option was used, the default is to read from the cache
if it exists, read from the files otherwise.  If no B<cache> option
was used, the default is to read from the files.  When data is read in
from the cache, the B<commit> and B<cache> methods are disabled.

If the B<file> option is used, it reads a single set from a single
file along with all dependancy sets (i.e. sets that are included or
excluded via. the appropriate tags).  This allows someone to make changes
to a single set file that they own even if permissions are set so that
they cannot read other set files.  The B<commit> method is available, but
the B<cache> method is disabled.  The B<file> option requires that the
B<set> option also be present.

With the B<files> option, all set files are read.  Both the B<commit>
and B<cache> methods are enabled.

=item set

  set => SET

This defines which set to read when the B<read => file> option is used.
This option is required when B<read => file> and ignored for any other
value for B<read>.

=cut

#  =item lock
#
#    lock => 1
#    lock => 0
#
#  Some applications may need to be the only application accessing the data,
#  while others can safely access it simultaneously with other applications.
#  Applications which will modify the data must access the data alone.
#  Applications which access the data read-only can share access.
#
#  The value of the B<lock> option says whether access can be shared, or
#  whether other applications must be locked out.  Choose 1 to lock the data
#  so other applications cannot access it.
#
#  The default is to allow shared access.
#
#  The lock file is stored in the same directory as the cache.

=pod

=item types

  types => TYPE
  types => [ TYPE1, TYPE2, ... ]

Sets can be of one or more types (or they can belong to no type and be
used solely in building other sets using the B<INCLUDE> or B<EXCLUDE>
tags described in the B<FILE FORMAT> section below).

This option can be used to specify the names of the different types
of sets defined by these files.

If this option is not given, then there is only one type and by default,
all sets belong to it.

=item default_types

  default_types => [ TYPEa, TYPEb, ... ]
  default_types => "all"
  default_types => "none"
  default_typew => TYPE

Some types of sets may be more common than others, and you may or may not
want to have to explicitely define which types a set belong to.

If a list of types are passed in, every type must be defined in the
B<types> option (warnings will be issued if they weren't).  If a value
of "all" is passed in, sets belong to all types by default.  If a value
of "none" is passed in, sets don't belong to any type by default.

By default, sets belong to all types available.

=item comment

  comment => REGEXP

This defines a regular expression used to recognize (and strip out) comments
from a set file.  The default expression is "#.*" which means that all
characters from a pound sign to the end of the line are removed.

If REGEXP is passed in as an empty string, there are no comments.  All
lines are either empty or contain an element.

=item tagchars

  tagchars => STRING

This defines a character (or a string) which marks a line of the set
file as containing a tag.  The default value is "@".

=item valid_ele

  valid_ele => REGEXP
  valid_ele => !REGEXP
  valid_ele => \&FUNCTION

By default, every non-blank line (after comments have been stripped out)
is treated as an element.  If this option is used, elements are tested,
and only those that pass the test are treated as valid.  Others are
invalid and produce a warning.

If a reference to a function is passed in, the function &FUNCTION(set,ele)
will be evaluated for each element.  If it returns 0, the element will be
silently ignored.  Otherwise it will be included in the set.

=item scratch

  scratch => DIR

When automatically updating a set file, the directory where the files
live may or may not be writable by a user who owns a set file.

If the directory is writable by the user, there is no problem.  In this
case, when a new set file is written, the old one is backed up and the
new one written in it's place.

If the directory is NOT writable by the user, the old copy is backed
up to the B<scratch> directory.  This directory must be writable by the
user.  It defaults to B</tmp>.

=back

=head1 FILE FORMAT

A set file has a very simple format.  It consists of blank lines, tags,
and elements.  Comments may be included as whole lines or part of one
of the above lines.

Each line is checked for comments and they are removed before any other
processing is done.  A comment is anything that matches a regular
expression which can be set using the B<comment> Init option.  The default
regular expression is "#.*" which means that comments start with a pound
sign anywhere on the line and go to the end of the line.

Tags are lines which begin with begin with a special string (which can be
set with the B<tagchars> Init option.  The default string is "@".  Tag lines
are of one of the formats:

  @TAG
  @TAG VAL1,VAL2,...

All other lines are elements.  Elements are any string (one per line).

Leading/trailing spaces are ignored in all cases.

The set name is the name of the set file.

The following TAGs are known:

=over 4

=item INCLUDE SET1,SET2,...

This includes all members of one or more other sets in the current set.

=item EXCLUDE SET1,SET2,...

This excludes all members of one or more other sets from the current set.
This overrides any members included from other sets, but does NOT exclude
members explicitely included in the set file.

=item OMIT ELE

This exludes a specific element from the current set.  This overrides
any elements included via. an B<INCLUDE> tag, or any elements explicitly
included in the set file.

Each element must be specified separately since there is no guarantee that
elements may not contain commas.

=item TYPE TYPE1,TYPE2,...

The default types that this set belongs to are determined by the B<types>
and B<default_types> Init options.

This tag explicitely puts this set if the specified types, even if it
is not in those types of default.

=item NOTYPE TYPE1,TYPE2,....

Similar to the B<TYPE> tag, but this tag explicitely removes the set
from the specified types, even if it is in them by default.

=item OPTION VARIABLE [= VALUE]

Although there is no support for element specific attributes, there IS
support for attributes which apply to the entire set (and which can be made
available to applications using these sets).

Each set may have a hash associated with with key/value pairs (if no value
is include, it defaults to 1).  These attributes are available using the
B<info> method.

=back

All tag lines can be repeated any number of times, so:

  @INCLUDE foo,bar

is equivalent to

  @INCLUDE foo
  @INCLUDE bar

All tags are case insensitive.

When determining the members of a set which includes and excludes other
sets, or omits specific elements from the set, all inclusions are evaluted
first, followed by all exclusions (i.e. all exclusions override all
inclusions).  If there is a cyclic dependancy (i.e. A depends on B depends
on A where a dependancy can either be an B<INCLUDE> or B<EXCLUDE>), an
error is reported and the cyclic dependancy is ignored.

A few examples illustrate the use of B<INCLUDE>, B<EXCLUDE>, and B<OMIT>
tags.  In the examples, the set file B<A> contains the elements: B<E1, E2,
E3>.  The set file B<B> contains the elements: B<E3, E4, E5>.  The set file
contains the following lines:

  @INCLUDE A
  @EXCLUDE B
  E5
  E6

defines a set contains the elements: B<E1, E2, E5, E6>.  The first line
includes B<E1, E2, E3>.  The second line excludes B<E3>.  It does NOT
exclude B<E5> since the B<EXCLUDE> tag does not override elements
explicitly included in the set file.  Finally, the B<E5> and B<E6> elements
are added.

The set file containing the following lines:

  @INCLUDE A
  @EXCLUDE B
  @OMIT    E2
  @OMIT    E6
  E5
  E6

defines a set contains the elements: B<E1, E5>.  This is similar to the
above example, except that the B<OMIT> tags override elements included
via. the B<INCLUDE> tag AND elements explicitly included in the set file.

=head1 FILES

Several files are used by the B<Set::Files> module.  They all live in the
directory set by the B<cache> Init Option except for set specific files
which live in the same directory as the set file.  Files are:

=over 4

=item .set_files.SET

A backup of the given set.  When a set file is updated, the original file
is stored in this file.  The file is stored either in the same directory
as the set file (if it is writable) or in the directory specified by the
B<scratch> Init Option.

=item .set_files.SET.new

A temporary file where a new set file (or the update to an old one) is
written.  Once completed, this file is moved into place as the new set
file.  This file lives in the same directory as the set file or in the
B<scratch> directory.

=item .set_files.cache

The file containing the cache.  This is created using the B<cache> method.

=item .set_files.template

When creating a new set file (or updating an existing one), this file is
used (if it exists) as a starting point and then all the data is appended
to it.  This is a good place to store comments describin how to edit
the set files, etc., that set file maintainers can read for help.

=back

=head1 KNOWN PROBLEMS

None at this point.

=head1 LICENSE

This script is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.

=head1 AUTHOR

Sullivan Beck (sbeck@cpan.org)

=cut

1;
# Local Variables:
# mode: cperl
# indent-tabs-mode: nil
# cperl-indent-level: 3
# cperl-continued-statement-offset: 2
# cperl-continued-brace-offset: 0
# cperl-brace-offset: 0
# cperl-brace-imaginary-offset: 0
# cperl-label-offset: -2
# End:
