Bio::ToolBox - Tools for querying and analysis of genomic data

DESCRIPTION

This is a collection of libraries and high-quality end-user 
scripts for bioinformatic analysis, particularly of genomic data 
obtained from microarray and next-generation sequencing.

The Bio::ToolBox libraries provide an abstraction layer over a variety
of different specialized BioPerl-style modules. For example, there is
a special emphasis on the collection data values for defined genomic
coordinate regions, regardless of whether the values come from a GFF
database, Bam file, BigWig file, etc. See the documentation with the
Bio::ToolBox::Data module for more information on working with tables
of information, for example BED files of genomic coordinates, and
using that information to collect data from databases.

The Bio::ToolBox package also includes a large number of high 
quality scripts for setting up databases of annotation, collecting 
annotated features, collecting genomic data relative to features, 
manipulating and analyzing data, and data format conversion. 
These scripts are installed in your local scripts bin directory. 



REQUIREMENTS

These are Perl modules and scripts. They require Perl and a unix-like 
command-line environment. They have been developed and tested on Mac 
OS X and linux; Microsoft Windows compatability is not tested nor 
guaranteed.

Most recent versions of Perl will work; older versions will require 
additional updating for compatibility.

Several different Perl modules are required for specific programs to work, 
most notably BioPerl, among a few others. Most of these dependencies can 
be taken care of by the installer.



INSTALLATION

Installation is simple with the standard Perl incantation.
    perl ./Build.PL
    ./Build installdeps     # if necessary
    ./Build
    ./Build test
    ./Build install

Installation may also be managed through a package manager, either the 
CPAN shell or a utility such as cpanminus.



USAGE OF PROVIDED SCRIPTS

* Configuration *
There is a configuration file that may be customized for your particular
installation. The default file is written to ~/.biotoolbox.cfg. It is a simple
INI-style file that is used to set up database connection profiles, feature
aliases, helper application locations, etc. The file may be edited by users. 
More documentation can be found in the Bio::ToolBox::db_helper::config 
documentation. This file is automatically written as needed; it is not 
installed by the Installer.

* Execution *
All biotoolbox scripts are designed to be run from the command line or
executed from another script. Some programs, for example
manipulate_datasets.pl, also provide an interactive interface to allow for
spontaneous work or when the exact index number or name of the dataset in
the file or database is not immediately known.

* Help *
All scripts require command line options for execution. Executing the
program without any options will present a synopsis of the options that are
available. Most programs also have a --help option, which will display
detailed information about the program and execution (usually by displaying
the internal POD). The options are given in the long format (--help, for
example), but may be shortened to single letters if the first letter is
unique (-h, for example).

* File Formats *
Many of the programs are designed to input and output a tabbed-delimited 
text format (unix line endings), where the rows represent genomic features, 
bins, etc. and the columns represent descriptive information and data. The 
first line in the table are the column headings. Metadata about each 
column are recorded in header lines at the beginning of the file and 
prefixed by a # symbol. The files may be compressed with gzip. More 
information may be found Bio::ToolBox::file_helper.



PROJECT WEBSITE

The BioToolBox project repository may be found at
http://code.google.com/p/biotoolbox/. 

Please contact the author for bugs. Feature requests are also accepted, 
within time constraints. Contact information is at the project website.




ONLINE DOCUMENTATION

Setting up a computer for BioToolBox
http://code.google.com/p/biotoolbox/wiki/BioToolBoxSetUp

Setting up Mac OS X for bioinformatics
http://code.google.com/p/biotoolbox/wiki/SetupForMacOSX

Up to data list of BioToolBox programs
http://code.google.com/p/biotoolbox/wiki/ProgramList

Working with annotation in databases
http://code.google.com/p/biotoolbox/wiki/WorkingWithDatabases

Working with data files and datasets
http://code.google.com/p/biotoolbox/wiki/WorkingWithDatasets

Description of supported data file formats
http://code.google.com/p/biotoolbox/wiki/DataFileFormats

Mapping SNPs 
http://code.google.com/p/biotoolbox/wiki/MappingSNPs

Description of the text data report format
http://code.google.com/p/biotoolbox/wiki/TimDataFormat


