
                            FileMetadata 0.1
-------------------------------------------------------------------------------

Metadata is data about data, such as the author of a document or the
keywords describing its content. FileMetadata is a framework for extracting
metadata from files in various formats and storing that data. The framework
is implemented in the Perl programming language.

COMPONENTS
---------- 

There are two types of components in the framework.

Miners: Miners extract specific metadata from a file. Each miner specializes
in particular kinds of data or particular file formats. Miners implement the
FileMetadata::Miner interface and are packaged under the FileMetadata::Miner::*
namespace.

Stores: A store is used by an application to save metadata collected by
multiple miners. Stores implement the FileMetadata::Store interface and are
packaged under the FileMetadata::Store::* namespace.

INSTALLATION
------------

Download the FileMetadata package from CPAN and install it. Next, read the
documentation for the FileMetadata::Miner and FileMetadata::Store interfaces
with the perldoc command, for example "perldoc FileMetadata::Miner".

OVERVIEW
--------

A miner's mine() method takes two arguments: a filename and a reference to a
hash. Several miners can be called in series on the same file, and the same
hash reference is passed to each of them. Each miner inserts key/value pairs
into the hash. The keys are strings prefixed with the Perl package name of the
miner; the values are strings representing metadata. Because each key carries
its miner's package name, one miner cannot overwrite data inserted by another.
If a miner cannot access the file, or cannot extract metadata from its format,
it inserts nothing into the hash; mine() does not 'die' on such errors. From
here on, this hash is called the meta hash.
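The sketch below illustrates these conventions with two hypothetical miners.
FileMetadata::Miner::StatExample and FileMetadata::Miner::LineCountExample are
made-up names, not part of the distribution. The sketch assumes that mine() is
called as a method ($miner->mine($file, \%meta)) and that keys are formed by
joining the package name and a field name with '::'; check the
FileMetadata::Miner documentation for the actual interface.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical miner: records a file's size and modification time.
# The class name and key layout are assumptions made for this example.
package FileMetadata::Miner::StatExample;

sub new { return bless {}, shift }

sub mine {
    my ($self, $path, $meta) = @_;
    my @st = stat $path;
    return unless @st;              # cannot access the file: add nothing, do not die
    my $prefix = __PACKAGE__;       # keys are prefixed with the package name
    $meta->{"${prefix}::size"}  = "$st[7]";
    $meta->{"${prefix}::mtime"} = "$st[9]";
    return 1;
}

# Hypothetical miner that only understands plain-text (.txt) files.
package FileMetadata::Miner::LineCountExample;

sub new { return bless {}, shift }

sub mine {
    my ($self, $path, $meta) = @_;
    return unless $path =~ /\.txt\z/;      # unsupported format: add nothing
    open(my $in, '<', $path) or return;    # unreadable: add nothing, do not die
    my $lines = 0;
    $lines++ while <$in>;
    close $in;
    my $prefix = __PACKAGE__;
    $meta->{"${prefix}::lines"} = "$lines";
    return 1;
}

package main;

# Create a small text file to mine.
my ($fh, $file) = tempfile(SUFFIX => '.txt', UNLINK => 1);
print {$fh} "first line\nsecond line\n";
close $fh;

# Run both miners in series over the same meta hash.
my %meta;
for my $miner (FileMetadata::Miner::StatExample->new,
               FileMetadata::Miner::LineCountExample->new) {
    $miner->mine($file, \%meta);
    $miner->mine('/no/such/file.txt', \%meta);    # silently adds nothing
}
print "$_ = $meta{$_}\n" for sort keys %meta;
```

Since each miner writes only under its own package prefix, the order in which
the miners run does not change the resulting meta hash.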

An application should insert two keys into the meta hash at some point before
the hash is given to a store: 'ID' and 'TIMESTAMP'. 'ID' is a unique
identifier for the set of metadata. If a store receives two meta hashes with
the same 'ID', it discards all data from the first and keeps only the data
from the second. The absolute or relative path of a file works well as an
identifier, but an application may generate any string. 'TIMESTAMP' records
the time at which the metadata was generated and can be in any format
convenient for the application.
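As a minimal sketch, an application might add both keys with a helper like
the hypothetical add_required_keys() below. Using the absolute path as 'ID'
and an ISO 8601 UTC string as 'TIMESTAMP' are choices made for this example
only; the miner key in the sample hash is also hypothetical.

```perl
use strict;
use warnings;
use File::Spec;
use POSIX qw(strftime);

# Hypothetical helper: adds the two keys a store expects. The absolute
# path as ID and the UTC ISO 8601 TIMESTAMP are example choices; any
# string works for either.
sub add_required_keys {
    my ($path, $meta) = @_;
    $meta->{ID}        = File::Spec->rel2abs($path);
    $meta->{TIMESTAMP} = strftime('%Y-%m-%dT%H:%M:%SZ', gmtime);
    return $meta;
}

# A meta hash as a miner might leave it (the key is hypothetical).
my %meta = ('FileMetadata::Miner::Example::size' => '23');
add_required_keys('notes.txt', \%meta);
print "$_ => $meta{$_}\n" for sort keys %meta;
```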

The store() method of a store accepts a meta hash for storage. The meta hash
should contain the 'ID' and 'TIMESTAMP' keys; if they are missing, the store
may either ignore the data or 'die'. A store connects to persistent storage,
such as an SQL database, to keep the metadata for future use. Stores also
implement other methods, such as list(), has(), remove() and clear(), that
simplify the application's work. A store typically aliases the keys in the
meta hash to application-specific names and provides a way to select which
items in the meta hash are stored and which are ignored.
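The following is a minimal in-memory sketch of that behaviour, not an
implementation of the real FileMetadata::Store interface. Example::MemoryStore
is a hypothetical class: it keeps rows in a Perl hash instead of a database,
the method signatures are assumptions, get() is an extra accessor added for
the example, and it chooses to 'die' on a missing 'ID' or 'TIMESTAMP'.

```perl
use strict;
use warnings;

# Hypothetical in-memory store for illustration only. A real store would
# write to persistent storage such as an SQL database.
package Example::MemoryStore;

sub new {
    my ($class, %args) = @_;
    # 'alias' maps meta hash keys to application-specific names; only the
    # keys listed there are kept, everything else is ignored.
    return bless { alias => $args{alias} || {}, rows => {} }, $class;
}

sub store {
    my ($self, $meta) = @_;
    for my $required (qw(ID TIMESTAMP)) {
        die "meta hash lacks '$required'\n" unless defined $meta->{$required};
    }
    my %row = (TIMESTAMP => $meta->{TIMESTAMP});
    for my $key (keys %{ $self->{alias} }) {
        $row{ $self->{alias}{$key} } = $meta->{$key} if exists $meta->{$key};
    }
    # Storing a second meta hash with the same ID replaces the first one.
    $self->{rows}{ $meta->{ID} } = \%row;
    return 1;
}

sub has    { my ($self, $id) = @_; return exists $self->{rows}{$id} }
sub list   { my ($self) = @_; return sort keys %{ $self->{rows} } }
sub remove { my ($self, $id) = @_; delete $self->{rows}{$id}; return 1 }
sub clear  { my ($self) = @_; %{ $self->{rows} } = (); return 1 }

# Extra accessor for this example only.
sub get    { my ($self, $id) = @_; return $self->{rows}{$id} }

package main;

my $size_key = 'FileMetadata::Miner::Example::size';    # hypothetical miner key
my $store = Example::MemoryStore->new(alias => { $size_key => 'bytes' });

$store->store({ ID => 'a.txt', TIMESTAMP => '1', $size_key => '10' });
$store->store({ ID => 'a.txt', TIMESTAMP => '2', $size_key => '12' });   # replaces a.txt
$store->store({ ID => 'b.txt', TIMESTAMP => '1', 'Other::key' => 'x' }); # Other::key ignored

for my $id ($store->list) {
    my $row = $store->get($id);
    printf "%s: bytes=%s timestamp=%s\n", $id, $row->{bytes} // '-', $row->{TIMESTAMP};
}
```

The alias map does double duty here: it renames miner keys and also acts as
the list of items worth keeping, which is one simple way to provide the
selection described above.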

DEVELOPING APPLICATIONS
-----------------------

The first step in solving your problem is assessing your needs, since
different applications have different requirements. The FileMetadata
framework makes it simple to build applications by combining existing
components with customized ones.

Start by determining which file formats you are interested in and what
metadata needs to be extracted from them. If the existing miners cannot
accomplish the task, you will need to develop new miners. If you develop a
new miner that might be useful to others, please consider contributing it.

Once the data has been mined, the application can process it directly or
save it in a store for future use. The Store interface makes it possible to
skip files that have not changed since they were last examined, which can be
worthwhile when processing large numbers of files. If you develop a new store
that might be useful to others, please consider contributing it.
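One way to skip unchanged files is sketched below. It assumes the application
records the time of mining as 'TIMESTAMP' in seconds since the epoch, and
re-mines a file only when its modification time is later than that. A plain
hash, %stored, stands in for a store's has() method and data lookup;
needs_mining() and process() are hypothetical helpers, not framework calls.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in for a store: ID => TIMESTAMP of the last stored meta hash.
my %stored;

# A file needs mining if it was never stored or was modified after the
# stored TIMESTAMP (both in seconds since the epoch).
sub needs_mining {
    my ($path) = @_;
    my $mtime = (stat $path)[9];
    return 0 unless defined $mtime;           # unreadable: nothing to mine
    return 1 unless exists $stored{$path};    # never stored before
    return $mtime > $stored{$path} ? 1 : 0;   # changed since last run?
}

sub process {
    my ($path) = @_;
    return 0 unless needs_mining($path);
    my %meta = (ID => $path, TIMESTAMP => time);
    # ... call each miner's mine($path, \%meta) here ...
    $stored{ $meta{ID} } = $meta{TIMESTAMP};  # stand-in for storing \%meta
    return 1;
}

my ($fh, $file) = tempfile(UNLINK => 1);
print {$fh} "data\n";
close $fh;

my @results;
push @results, process($file);    # first run: the file is mined
push @results, process($file);    # unchanged: skipped
my $later = time + 10;
utime $later, $later, $file;      # simulate an edit made later
push @results, process($file);    # modified: mined again
print "@results\n";
```

Because time() and file modification times have one-second resolution, an
edit made in the same second as a mining run can be missed; comparing with
>= instead of > avoids that at the cost of occasionally re-mining a file.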

RESOURCES
---------

http://icdweb.cc.purdue.edu/~midh/products/FileMetadata


LICENSE
-------

This software can be used under the terms of any Open Source Initiative
approved license. A list of these licenses is available on the OSI site:
http://www.opensource.org/licenses/