# $Id: INSTALL,v 1.28 2003/12/13 15:51:19 jason Exp $

Bioperl Install Directions

o System requirements

 - Tested on all common forms of Unix, Win9X/NT/2000, Mac OS X 
   (see the PLATFORMS file for more details)

 - perl 5.005 or later *.

 - ANSI C or Gnu C compiler for optional extra XS extensions 

 - External modules: Bioperl uses functionality provided in other
   Perl modules.  Some of these are included in the standard perl
   install and some need to be obtained from the CPAN (www.cpan.org)
   site.  The list of external modules is included at the bottom of
   the INSTALL document and in the bptutorial.pl tutorial.  The
   Bioperl Bundle (Bundle::BioPerl) available through CPAN can make
   this process even easier.  Simply install the bundle in your CPAN
   shell and all necessary modules will be installed.

 - Bioperl on ActiveState for Windows - An ActiveState PPM distribution
   is available for bioperl v. 1.0.  Issue the following commands in
   your PPM shell to install Bioperl (ActiveState Perl v. 5.6*).

     PPM>repository add Bioperl http://bioperl.org/DIST/

   Get the number of the Bioperl repository:

     PPM>repository

   Set the Bioperl repository, find Bioperl, install Bioperl:

     PPM>repository set <Bioperl repository number>
     PPM>search *
     PPM>install <bioperl package number>
  
   As of this writing the Bundle::BioPerl bundle is not found in any
   ActiveState package but the following command should work in 
   Command Prompt:

     >perl -MCPAN -e "install Bundle::BioPerl"

   Do not use a Unix make to install Perl modules this way, use nmake. 
   See "DEPENDENCIES AND Bundle::BioPerl" below for more information
   on this useful bundle.

 - Additional information using Bioperl with MacOS can be 
   found at http://bioperl.org/Core/mac-bioperl.html

 - Additional information using Bioperl with OS X can be 
   found at http://www.tc.umn.edu/~cann0010/Bioperl_OSX_install.html

 - External programs
   Bioperl can interface with some external programs for executing
   analyses.  These include clustalw and t_coffee for Multiple
   Sequence Alignments (Bio::Tools::Run::Alignment::Clustalw and
   Bio::Tools::Run::TCoffee) and blastall,blastpgp, & bl2seq for BLAST
   analyses (Bio::Tools::Run::StandAloneBlast), and to all the
   programs in the EMBOSS suite (Bio::Factory::EMBOSS).

   Additionally Bioperl can submit and retrieve jobs from the NCBI
   Blast queue via the Bio::Tools::Run::RemoteBlast module.
  
   See the documentation for these modules for more information

 - Environment Variables

   [optional]

   Some modules which run external programs need certain environment
   variables set.  If you do not have a local copy of the specific
   executeable you do not need to set these variables.  Additionally
   the modules will attempt to locate the specific applications in
   your runtime PATH variable. You may also need to set an environment
   variable to tell BioPerl about your network configuration if your
   site uses a firewall.

   Setting environment variables on unix means adding a line like the
   following to your shell initialization:

   (for bash,sh)  export BLASTDIR=/data1/blast
   (for csh,tcsh) setenv BLASTDIR /data1/blast

   The environment variables include:
   Bio::Tools::Run::StandAloneBlast
   o BLASTDIR - which specifies where the NCBI blastall, blastpgp,
                bl2seq, etc.. are located.  A 'data' directory is
		assumed to be present in this dir as well where the
		blastable databases are located as well as 
		substitution matricies.

   o BLASTDATADIR or 
     BLASTDB - (either is optional) if one does not want to locate the data dir
	       within the same dir as where the BLASTDIR variable
	       points, a BLASTDATADIR or BLASTDB variable can be set to 
	       point to a dir where BLAST database indexes are located.
   Bio::Tools::Run::Alignment::Clustalw
   o CLUSTALDIR - points to the directory where the clustalw
                  executable is located.
   Bio::Tools::Run::Alignment::TCoffee
   o TCOFFEEDIR - points to the directory where the t_coffee
                  executable is located.
		 
   o HTTP_PROXY - If you access the internet via a proxy server then you 
                  can tell the Bioperl modules which require network access
                  about this by using the HTTP_PROXY environment variable.  
                  The value set includes the proxy address and the port 
                  used (eg http://wwwcache.example.com:8080).


 * Note that most modules will work with earlier versions of Perl. 
   The only ones that will not are Bio::SimpleAlign.pm and 
   the Bio::Index::* modules. If you don't need these modules
   and you want to install bioperl using an earlier version of Perl,
   edit the "require 5.004;" line in Makefile.PL as necessary.

 o Installing Bioperl

 THE EASY WAY

 The Bioperl modules are distributed as a tar file in standard perl
 CPAN distribution form. This means that installation is very
 simple. Once you have unpacked the tar distribution there is a
 directory called bioperl-xx/, which is where this file is.  Move into
 that directory (you may well be already in the right place!) and
 issue the following commands:

   perl Makefile.PL   # makes a system-specific makefile
   make               # makes the distribution
   make test          # runs the test code
   make install       # [may need root access for system install.
                      #  See below for how to get around this.]

 This should build, test and install the distribution cleanly on your
 system.  This installs the main perl part of bioperl, which is the
 majority of the bioperl modules. There is one module (Bio::Tools::pSW)
 which needs a compiled extension. This needs an extra installation
 step. The directions for installing this are given below - it is
 almost as easy as installing the standard distribution, so don't
 worry!

 You may have some errors from the pod2man part of the installation,
 such as 

   /usr/bin/pod2man: Unrecognized pod directive in paragraph 168 of Bio/Tools/Blast.pm: head3

 You don't need to worry about them: they do not affect the documentation
 processing.

 To install you need write permission in the perl5/site_perl/ source area. 
 Quite often this will require you (or someone else) becoming root, 
 so you will want to talk to your systems manager if you don't 
 have the necessary access.

 It is possible to install the package outside of the standard Perl5 
 location. See below for details.

 To install the Compiled extension for pSW you will need to read the
 next section of the manual.

 
 INSTALLING THE OPTIONAL COMPILED EXTENSIONS (bioperl-ext package)

 This Installation works out-of-the box for all platforms except *BSD
 and Solaris boxes. For notes on this, read on. For other platforms,
 skip ahead to the next section, BUILDING THE COMPILED EXTENSIONS.

 INSTALLING for *BSD and Solaris boxes.

 You should add the line -fPIC to the CFLAGS line in
 Compile/SW/libs/makefile.  This makes the compile generate position
 independent code, which is required for these architectures. In
 addition, on some Solaris boxes, the generated Makefile does not make
 the correct -fPIC/-fpic flags for the C compiler that is used. This
 requires manual editing of the generated Makefile to switch case. Try
 it out once, and if you get errors, try editing the -fpic line

 BUILDING THE COMPILED EXTENSIONS (OPTIONAL, ONLY FOR RUNNING LOCAL
 PROTEIN SMITH-WATERMAN ANALYSES FROM PERL)

 Move to the directory bioperl-ext.  This is available as a separate
 package released from ftp://bioperl.org/pub/DIST.  This is where the C
 code and XS extension for the bp_sw module is held and execute these
 commands: (possibly after making the change for *BSD and Solaris, as
 detailed above)

   perl Makefile.PL   # makes the system specific makefile 
                      # Solaris/BSD users might need to edit the Makefile here
   make               # builds all the libaries
   make test          # runs a short test
   make install       # installs the package correctly.

 This should install the compiled extension. The Bio::Tools::pSW module
 will work cleanly now.


 INSTALLING BIOPERL SCRIPTS

 Bioperl comes with a set of production-quality scripts that are kept
 in the scripts/ directory. You can install these scripts if you'd like,
 simply follow the instructions on 'make install'. The installation
 directory is specified by the INSTALLSCRIPT variable in the Makefile,
 the default location is /usr/bin. Installation will copy the scripts
 to the specified directory, change the 'PLS' suffix to 'pl', and
 prepend 'bp_' to all the script names if they aren't so named already.


 INSTALLING BIOPERL IN A PERSONAL OR PRIVATE MODULE AREA

 If you lack permission to install perl modules into the
 standard site_perl/ system area you can configure bioperl to
 install itself anywhere you choose. Ideally this would
 be a personal perl directory or standard place where you
 plan to put all your 'local' or personal perl modules. 

 Note: you _must_ have write permission to this area.

 Simply pass a parameter to perl as it builds your system
 specific makefile.

 Example:

   perl Makefile.PL  LIB=/home/users/dag/My_Local_Perl_Modules
   make
   make test
   make install

 This tells perl to install bioperl in the desired place, e.g.:
 
  /home/users/dag/My_Perl_Modules/Bio/Seq.pm

 Then in your Bioperl script you would write:

  use lib "/home/users/dag/My_Local_Perl_Modules";
  use Bio::Seq;

 The man pages will probably be installed in $LIB/man. For more
 information on these sorts of custom installs see the documentation
 for ExtUtils::MakeMaker.

 See below for how to use modules that are not installed in the
 standard Perl5 location.

 You can also use CPAN, as discussed below, to install accessory modules
 in your local directory. First enter the CPAN shell, 
 then set the arguments for the command "perl Makefile.PL":

 >perl -e shell -MCPAN
 cpan>o conf makepl_arg LIB=/home/users/dag/My_Local_Perl_Modules


 THE HARD WAY :)

 INSTALLING BIOPERL MODULES: LAST RESORT

 As a last resort, you can simply copy all files in Bio/
 to any directory in which you have write privileges. This is 
 generally NOT recommended since some modules may require
 special configuration (currently none do, but don't rely 
 on this.

 You will need to set "use lib '/path/to/my/bioperl/modules';" 
 in your perl scripts so that you can access these modules if
 they are not installed in the standard site_perl/ location.
 See below for an example.

 To get manpage documentation to work correctly you will have 
 to configure man so that it looks in the proper directory. 
 On most systems this will just involve adding an additional 
 directory to your $MANPATH environment variable.

 The installation of the Compile directory can be similarly
 redirected, but execute the make commands from the Compile/SW
 directory.

 If all else fails or are unable to access the perl distribution
 directories, ask your system administrator to place the files there 
 for you. You can always execute perl scripts in the same directory 
 as the location of the modules (Bio/ in the distribution) since perl 
 always checks the current working directory when looking for modules.


 USING MODULES NOT INSTALLED IN THE STANDARD PERL LOCATION

 You can explicitly tell perl where to look for modules by using the
 lib module which comes standard with perl.

 Example:

    #!/usr/bin/perl

    use lib "/home/users/dag/My_Local_Perl_Modules/";
    use Bio::Seq;

    <...insert whizzy perl code here...>

 Or, you can set the environmental variable PERL5LIB:

    # csh or tcsh
    setenv PERL5LIB /home/users/dag/My_Local_Perl_Modules/
    # bash
    export PERL5LIB=/home/users/dag/My_Local_Perl_Modules/


 THE TEST SYSTEM

 The Bioperl test system is located in the t/ directory and is
 automatically run whenever you execute the 'make test' command.
 Alternatively if you want to investigate the behavior of a specific
 test such as the SeqIO test you would type:
 % perl -I. -w t/SeqIO.t 
 The -I tells Perl to use the current directory as the include path -
 this makes sure you are testing the modules in this directory not
 ones installed elsewhere in your PERL5LIB path.
 The -w tells Perl to print all warnings.

 If you are trying to learn how to use a module, often the test suite
 is a good place to look.  All good extreme programers try and write a
 test BEFORE they write the module to insure that their module behaves
 the way they expect.  You'll notice some 'ok' and 'skip' commands in
 a test, this is part of the Perl test suite that signifies a passed
 test with an ok N where N is the test number.  Alternatively you can
 tell Perl to skip tests.  This is useful when, for example, your test
 detects that the network is not present and thus should skip, not
 fail, any tests that require a network connection.


 DEPENDENCIES AND Bundle::BioPerl

 The following packages are used by Bioperl.  Not all are required for
 Bioperl to operate properly, however, some functionality will be
 missing without them.  You can easily install all of these, except
 srsperl.pm, using the Bundle::BioPerl CPAN bundle. A command-line
 example:

  >perl -e shell -MCPAN
  cpan>install Bundle::BioPerl
  <...installation details...>
  cpan>quit

 or more simply:

  >perl -MCPAN -e "install Bundle::BioPerl"

 The DBD::mysql, DB_File and XML::Parser modules require other
 applications or databases: MySQL, Berkeley DB, and expat respectively.

 Module			 Where it is Used
 ------------------------------------------------------------------
 HTTP::Request::Common    GenBank+GenPept sequence retrieval, 
	 		  remote http Blast jobs
		 	  Bio::DB::*
			  Bio::Tools::Run::RemoteBlast

 LWP::UserAgent           GenBank+GenPept sequence retrieval, 
	 		  remote http Blast jobs
			  Bio::DB::*
			  Bio::Tools::Run::RemoteBlast

 AcePerl	          Access to ACeDB databases
			  Bio::DB::Ace
			  Available at http://stein.cshl.org
			 
 IO::String               IO handle to read or write to a string
			  Bio::SeqIO
			  Bio::Variation::*
			  Bio::DB::*
			  Bio::Index::Blast
			  Bio::Tools::*
			  Bio::Biblio::IO
			  Bio::Structure::IO

 XML::Parser              Parsing of XML documents
			  Bio::SeqIO::game
			  Bio::Variation::* 
			  Bio::SearchIO::blastxml
			  Bio::Biblio::IO::medlinexml
			  Requires expat from
			  http://sourceforge.net/projects/expat/
			 			 
 XML::Writer              Parsing + writing of XML documents
			  Bio::SeqIO::game
			  Bio::Variation::*

 XML::Parser::PerlSAX     Parsing of XML documents
			  Bio::SeqIO::game
			  Bio::Variation::*
			  Bio::SearchIO::blastxml
			  Bio::Biblio::IO::medlinexml

 XML::Twig        	  Parsing of XML documents
			  Bio::Variation::IO::xml

 File::Temp               Temporary File creation
			  Bio::DB::FileCache
			  Bio::DB::XEMBL

 SOAP::Lite               SOAP protocol, XEMBL Services
			  Bio::Biblio::*
			  Bio::DB::XEMBLService

 HTML::Parser             HTML parsing of GDB page
			  Bio::DB::GDB

 DBD::mysql               Mysql API for loading and querying of Mysql-based 
			  GFF feature and BioSQL databases
			  Bio::DB::GFF
			  bioperl-db external package
			  bioperl-pipeline external package
			  Mysql DB free from www.mysql.org

 GD			  GD graphical drawing library
			  Bio::Graphics
                          Requires GD library from www.boutell.com/gd

 srsperl		  Sequence Retrieval System (SRS) 
			  alternative way of retrieving 
			  sequences
			  Bio::LiveSeq::IO::SRS.pm
			  See README in Bio/LiveSeq/IO

 Storable		  Persistent object storage & retrieval 
			  Bio::DB::FileCache

 Text::Shellwords         Text parser
                          Bio::Graphics::FeatureFile

 XML::DOM		  XML parser
			  Bio::SeqIO::bsml.pm

 DB_File                  Perl access to Berkeley DB
                          Bio::DB::Flat
			  Bio::DB::Fasta,
			  Bio::SeqFeature::Collection,
			  Bio::Index::*
                          Requires Berkeley DB, from Linux RPM
			  or from www.sleepycat.com

 Graph::Directed          generic graph data and algorithms
                          Bio::Ontology::SimpleOntologyEngine

 Data::Stag::ITextWriter  Structured Tags datastructures
			  Bio::SeqIO::chadoitext

 Data::Stag::SxprWriter   Structured Tags datastructures
			  Bio::SeqIO::chadosxpr

 Data::Stag::XMLWriter    Structured Tags datastructures
			  Bio::SeqIO::chadoxml
 Text::Wrap               (Very optional)
                          Bio::SearchIO::Writer::TextResultWriter
 HTML::Entities           (only if you want to run webanalysis modules)
			  Bio::Tools::Analysis::DNA::*
			  Bio::Tools::Analysis::Protein::*
