SmotifCS
Version 0.02

SmotifCS Hybrid Modeling Method


PRE-REQUISITES: 

The hybrid modeling algorithm requires a BMRB formatted chemical shift file as input. 
Additionally, if the structure of the protein is known from any alternate resource,
then a PDB-formatted structure file is required. This pdb-file can be present in a
centralized local directory or a user-designated separate directory. 

	Third-party Software:

	1. MySQL  
	2. Phylip 
	3. Modeller 
	4. NMRPipe/TALOS 

	Other requirements:

	1. MySQL Smotif database (http://fiserlab.org/SmotifCS/vilas_loop_pred.sql.gz)
	2. Smotif chemical shift library and related files (http://fiserlab.org/SmotifCS/chemical_shift.tar.gz)
	3. Local PDB directory (central or user-designated) - updated (http://www.rcsb.org). 

	The paths to all the pre-requisites should be provided in the smotifcs_config.ini configuration file. 
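
	Before filling in the configuration file, it can help to confirm which programs
	are reachable from the shell. The following is a minimal sketch, not part of
	SmotifCS: the function name check_tools is hypothetical, and the program names
	you pass to it are your own choice (the Modeller and TALOS executables are
	named differently depending on version and install location).

```shell
# check_tools: report which of the given program names are found on PATH.
# Hypothetical helper for illustration; it is not shipped with SmotifCS.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    # Exit status is 0 only when every named program was found.
    return "$missing"
}
```

	For example, "check_tools mysql perl" prints one line per program and returns
	a non-zero status if either is missing.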


DOWNLOAD AND INSTALL Third-party Software:
    1. MySQL
       Can be downloaded from (http://dev.mysql.com/downloads/mysql/) 
 
	2. Phylip   (version 3.69) 
       PHYLIP is freely available from:
       http://evolution.genetics.washington.edu/phylip.html
       
       PHYLIP (the PHYLogeny Inference Package) is a package of programs for inferring phylogenies 
       (evolutionary trees). It is available free over the Internet, and written to work on as many 
       different kinds of computer systems as possible. The source code is distributed (in C), 
       and executables are also distributed. 
       
       INSTALLATION
       http://evolution.genetics.washington.edu/phylip/getme.html


       If you are using a Windows machine, 
       installation is easy. Download the three zip-files (phylip.exe, phylipwx.exe, phylipwy.exe), 
       and extract them to a preferred folder. The subfolder exe contains all the programs. 
       The manual can be found in the subfolder doc.

       For Macintosh OS X,
       you may download the packaged disk image (Phylip3.66.dmg). It is compressed, so you need to 
       expand it and copy the resulting folder to a desired location. Alternatively, you may compile 
       the programs from their sources as outlined in the UNIX installation below. Source code 
       and ready-made compilations are also available for older Macintosh systems (Mac OS 8 or 9).

       Installation for UNIX systems
       is also quite straightforward. These instructions apply to RedHat-based Linux systems. 
       Download the source code and documentation package (phylip-3.66.tar.gz) into a suitable folder. 
       Unzip the package with the gzip utility (gzip -d phylip-3.66.tar.gz) and expand the tarball 
       (tar xvf phylip-3.66.tar). Move to the newly formed folder containing the source code (cd phylip3.6/src). 
       The folder contains a file called Makefile. The PHYLIP programs are installed simply by typing
       make install


       INSTALL PHYLIP ON LINUX and UNIX
       http://evolution.gs.washington.edu/phylip/download/phylip-3.696.tar.gz

       You can easily install PHYLIP and compile it yourself on a Linux or Unix system, 
       provided that you have a C compiler on your system. 

       tar -zxvf phylip-3.696.tar.gz 

       This uncompresses the archive and creates a phylip-3.696 folder that contains 
       three folders: doc, exe, and src.

       To make the executables, use your C compiler. It is probably as simple as going into the src directory, 
       copying Makefile.unx to a file named Makefile, and then running make install:

       $ cp Makefile.unx Makefile
       $ make install

       With luck this will work. After the compilation the executables and their font files will 
       be in folder exe. 

       INSTALLATION SUMMARY

       $ wget  http://evolution.gs.washington.edu/phylip/download/phylip-3.696.tar.gz
       $ tar -zxvf phylip-3.696.tar.gz
       $ cd phylip-3.696
       $ cd src/
       $ cp Makefile.unx Makefile
       $ make install

	3. Modeller (version 9.14 )
       https://salilab.org/modeller/
       
       MODELLER is used for homology or comparative modeling of protein three-dimensional structures.
       The user provides an alignment of a sequence to be modeled with known related structures and 
       MODELLER automatically calculates a model containing all non-hydrogen atoms. MODELLER 
       is available for download for most Unix/Linux systems, Windows, and Mac.

	4. NMRPipe/TALOS 
       http://spin.niddk.nih.gov/NMRPipe/

       NMRPipe is an extensive software system for processing, analyzing, and exploiting NMR spectroscopic 
       data. 

       wget http://spin.niddk.nih.gov/NMRPipe/install/download/install.com
       wget http://spin.niddk.nih.gov/NMRPipe/install/download/binval.com
       wget http://spin.niddk.nih.gov/NMRPipe/install/download/NMRPipeX.tZ
       wget http://spin.niddk.nih.gov/NMRPipe/install/download/talos.tZ
       wget http://spin.niddk.nih.gov/NMRPipe/install/download/dyn.tZ

       chmod a+r  *.tZ 
       chmod u+rx *.com
       ./install.com +dest /home/cmadrid/software/talos



DOWNLOAD AND INSTALL INSTRUCTIONS FOR OTHER REQUIREMENTS: 
	
    1. MySQL Smotif database 
       MySQL Smotif database is freely available from: 
       http://fiserlab.org/SmotifCS/vilas_loop_pred.sql.gz
	
	
    *** How to install a local copy of the Smotif Database ***
    
    - Log into the server running MySQL
    
    - Download Smotif database from http://fiserlab.org/SmotifCS/vilas_loop_pred.sql.gz
        and save it to /tmp directory
        
    - Uncompress vilas_loop_pred.sql.gz  
        cd /tmp/
        gunzip vilas_loop_pred.sql.gz
    
    - Connect to the MySQL server and 
        create a database named vilas_loop_pred
        
        $ mysql -u root -h localhost -p
        (enter your MySQL root password when prompted)
        
        Welcome to the MySQL monitor.  Commands end with ; or \g.
        Your MySQL connection id is 2844
        Server version: 5.1.73 Source distribution

        Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

        Oracle is a registered trademark of Oracle Corporation and/or its
        affiliates. Other names may be trademarks of their respective
        owners.

        Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

        mysql> 
        mysql> create database vilas_loop_pred;
        mysql> quit

    - Load the vilas_loop_pred database (enter your MySQL root password when prompted)
        $ mysql -u root -h localhost -p vilas_loop_pred < /tmp/vilas_loop_pred.sql

    - Connect to the MySQL server and 
        create a user with read access to the vilas_loop_pred database
        
        mysql> GRANT SELECT ON vilas_loop_pred.* TO 'my_user'@'client_computer_where_you_will_run_Smotifcs' IDENTIFIED BY 'my_password';
        
        mysql> FLUSH PRIVILEGES;
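
    The uncompress and load steps above can also be combined by streaming the
    compressed dump straight into mysql, which avoids leaving an uncompressed copy
    in /tmp. This is a minimal sketch under stated assumptions, not the authors'
    procedure: load_smotif_db and the MYSQL_CMD override variable are hypothetical
    names, and the database must already have been created.

```shell
# load_smotif_db: stream a gzip-compressed SQL dump into an existing database.
# Hypothetical helper. MYSQL_CMD is an assumed override hook (default: mysql).
load_smotif_db() {
    dump=$1                      # e.g. /tmp/vilas_loop_pred.sql.gz
    db=${2:-vilas_loop_pred}     # database name; it must already exist
    if [ ! -r "$dump" ]; then
        echo "cannot read dump file: $dump" >&2
        return 1
    fi
    # gunzip -c writes the SQL to stdout; -p makes mysql prompt for the password.
    gunzip -c "$dump" | "${MYSQL_CMD:-mysql}" -u root -h localhost -p "$db"
}
```

    For example: load_smotif_db /tmp/vilas_loop_pred.sql.gz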


    2. Smotif chemical shift library and related files 
       The Smotif chemical shift library and related files are freely available from:
       http://fiserlab.org/SmotifCS/chemical_shift.tar.gz

    *** How to install a local copy of the Smotif chemical shift library and related files ***

    - Download the chemical shift library from http://fiserlab.org/SmotifCS/chemical_shift.tar.gz
        and save it to /tmp directory

    - Uncompress chemical_shift.tar.gz and move it to /usr/local/databases
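
    The uncompress-and-move step can be done in one command by extracting directly
    into the target directory. The sketch below assumes a standard gzip-compressed
    tar archive; install_cs_library is a hypothetical helper name, and the layout
    inside the archive is not known from this README. Writing to
    /usr/local/databases usually requires root privileges.

```shell
# install_cs_library: unpack the chemical shift archive into a target directory.
# Hypothetical helper for illustration; it is not shipped with SmotifCS.
install_cs_library() {
    archive=$1                          # e.g. /tmp/chemical_shift.tar.gz
    dest=${2:-/usr/local/databases}     # location suggested in this README
    if [ ! -r "$archive" ]; then
        echo "cannot read archive: $archive" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    # -C makes tar extract inside $dest instead of the current directory.
    tar -zxf "$archive" -C "$dest"
}
```

    For example: install_cs_library /tmp/chemical_shift.tar.gz /usr/local/databases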


INSTALLATION (SmotifCS)

    To install SmotifCS-0.01, use one of the following methods:
       
       
      1. Manually:
        Install where the standard Perl modules are stored
        
        tar -zxvf SmotifCS-0.01.tar.gz
        cd SmotifCS-0.01/
        perl Makefile.PL
        make
        make test
        make install


      2. Install in a custom location (/home/user/MyPerlLib)
        
        tar -zxvf SmotifCS-0.01.tar.gz
        cd SmotifCS-0.01/
        perl Makefile.PL PREFIX=/home/user/MyPerlLib/
        make
        make test
        make install
        
        
      3. Using CPAN clients:
        
        perl -MCPAN -e shell
        > o conf makepl_arg PREFIX=/home/user/MyPerlLib/
        > install SmotifCS 
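
    When a custom PREFIX is used, Perl only finds the module if its directory is
    on the module search path (for example through the PERL5LIB environment
    variable). The exact subdirectory created under the prefix varies between Perl
    versions, so the sketch below searches for it. It is an illustration only:
    find_module_dir is a hypothetical helper, and the module file name
    SmotifCS.pm is an assumption.

```shell
# find_module_dir: print the directory under a prefix that holds a module file.
# Hypothetical helper; the default file name SmotifCS.pm is an assumption.
find_module_dir() {
    prefix=$1                   # e.g. /home/user/MyPerlLib
    module=${2:-SmotifCS.pm}
    # Take the first match; an empty result means the module was not found.
    found=$(find "$prefix" -type f -name "$module" 2>/dev/null | head -n 1)
    [ -n "$found" ] || return 1
    dirname "$found"
}

# Example use (commented out so that sourcing this file changes nothing):
# PERL5LIB="$(find_module_dir /home/user/MyPerlLib):$PERL5LIB"; export PERL5LIB
```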


HOW TO RUN THE SmotifCS HYBRID MODELING ALGORITHM: 
    
    INITIALIZE THE CONFIGURATION FILE: 
    
    Set all paths, directories and executables for required software 
    in SmotifCS-0.01/smotifcs_config.ini

    Set the environment variable in your .bashrc file:
    export SMOTIFCS_CONFIG_FILE=/home/user/SmotifCS-0.01/smotifcs_config.ini


Set up the configuration file:

    The configuration file, smotifcs_config.ini has all the information
    regarding the required library files and other pre-requisite software. 

    Set all the paths and executables in this file correctly.

    Set the environment variable in your .bashrc file:

    export SMOTIFCS_CONFIG_FILE=/home/user/SmotifCS-0.01/smotifcs_config.ini
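
    A quick way to confirm that the variable is set and points at a readable file
    before running any modeling step is sketched below. The function name
    check_smotifcs_config is hypothetical; it only tests the file's presence and
    does not validate its contents.

```shell
# check_smotifcs_config: succeed only if SMOTIFCS_CONFIG_FILE names a readable file.
# Hypothetical helper for illustration; it is not shipped with SmotifCS.
check_smotifcs_config() {
    if [ -z "${SMOTIFCS_CONFIG_FILE:-}" ]; then
        echo "SMOTIFCS_CONFIG_FILE is not set" >&2
        return 1
    fi
    if [ ! -r "$SMOTIFCS_CONFIG_FILE" ]; then
        echo "cannot read $SMOTIFCS_CONFIG_FILE" >&2
        return 1
    fi
    echo "using configuration: $SMOTIFCS_CONFIG_FILE"
}
```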


MODELING PROTEINS USING A SUPER-SECONDARY STRUCTURE LIBRARY AND NMR CHEMICAL SHIFT INFORMATION

    SmotifCS implements a hybrid protein modeling algorithm that relies on a library of protein 
    super-secondary structure motifs (Smotifs) and easily obtainable NMR experimental data.


	MODELING ALGORITHM STEPS: 

    Step 1:
    Run TALOS+
    Get SS, Phi/Psi, and Smotif information (Single-core task)

        Usage: 
        perl smotifcs.pl --step=1 --pdb=1zzz  --chain=A --havestructure=0	     


    Step 2:                                             
    Compare experimental CS of query Smotifs to theoretical CS of library Smotifs
    (Multi-core task/ cluster job)		     

        Usage: 
        perl smotifcs.pl --step=2 --pdb=1zzz  --chain=A --havestructure=0           


    Step 3:                                             
    Cluster and rank chosen Smotifs (Multi-core task/ cluster job)

        Usage: 
        perl smotifcs.pl --step=3 --pdb=1zzz  --chain=A --havestructure=0           


    Step 4:                                             
    Enumerate all possible combinations of Smotifs (about a million models)
    (Multi-core task/ cluster job)                       

        Usage: 
        perl smotifcs.pl --step=4 --pdb=1zzz --chain=A --havestructure=0           


    Step 5:                                             
    Rank enumerated structures using a composite energy function  (Single-core task)                   

        Usage: 
        perl smotifcs.pl --step=5 --pdb=1zzz  --chain=A --havestructure=0           

        
    Step 6:                                             
    Run Modeller to generate top 5 complete models  (Single-core task)                     	       
        
        Usage: 
        perl smotifcs.pl --step=6 --pdb=1zzz --chain=A --havestructure=0           
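
    The six steps above can be driven from a small shell loop that stops at the
    first failing step. This is a minimal sketch, not part of SmotifCS: the
    function name run_smotifcs_steps and the SMOTIFCS_RUN override variable are
    hypothetical, and it assumes that smotifcs.pl returns a non-zero exit status
    on failure. It runs every step on the local machine, so steps 2 to 4 may be
    slow without a cluster.

```shell
# run_smotifcs_steps: run steps 1..6 in order for one target, stopping at the
# first step that fails. SMOTIFCS_RUN is an assumed override hook; it defaults
# to "perl smotifcs.pl" and is deliberately left unquoted so it splits into words.
run_smotifcs_steps() {
    pdb=$1           # dummy PDB-like ID, e.g. 1zzz
    chain=$2         # chain identifier, e.g. A
    have=${3:-0}     # 1 if a structure file is available, otherwise 0
    for step in 1 2 3 4 5 6; do
        echo "running step $step for $pdb chain $chain"
        ${SMOTIFCS_RUN:-perl smotifcs.pl} --step="$step" --pdb="$pdb" \
            --chain="$chain" --havestructure="$have" || {
            echo "step $step failed; stopping" >&2
            return 1
        }
    done
}
```

    For example: run_smotifcs_steps 1zzz A 0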



How to run the program:

1. Create a subdirectory with a dummy PDB-ID-like name (e.g. 1abc or 1zzz). 

2. Put the chemical shift input file (in BMRB format) in this directory.
   Use the filename 1abc/pdb1abcshifts.dat or 1zzz/pdb1zzzshifts.dat for
   the BMRB formatted chemical shift input file. 

3. Optional: If structure is known, include a pdb format structure file
   in the same directory. 1abc/pdb1abc.ent or 1zzz/pdb1zzz.ent

4. Run steps 1 to 6 sequentially, as given above. Output from previous
   steps is often required in subsequent steps. Wait for each step to
   complete without errors before going to the next step. 

5. To run all steps together use: 
   perl smotifcs.pl --step=all --pdb=1zzz --chain=A --havestructure=0

6. Use multiple-cores or clusters as available, for steps 2, 3 & 4.  
   These are slow and require a lot of computational resources. 

7. If the structure is known, use --havestructure=1.
   Otherwise, use --havestructure=0 in all the steps. 
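
The directory layout described in items 1 to 3 can be set up with a short shell
function. This is only a sketch: prepare_target_dir is a hypothetical helper,
and it simply copies the input files to the file names given above.

```shell
# prepare_target_dir: create <pdb_id>/ and copy the input files into it under
# the file names described in this README. Hypothetical helper for illustration.
prepare_target_dir() {
    pdb=$1               # dummy PDB-like ID, e.g. 1zzz
    shifts=$2            # BMRB-formatted chemical shift file
    structure=${3:-}     # optional PDB-format structure file
    if [ ! -r "$shifts" ]; then
        echo "cannot read chemical shift file: $shifts" >&2
        return 1
    fi
    mkdir -p "$pdb" || return 1
    cp "$shifts" "$pdb/pdb${pdb}shifts.dat" || return 1
    # The structure file is copied only when a third argument is given.
    if [ -n "$structure" ]; then
        cp "$structure" "$pdb/pdb${pdb}.ent" || return 1
    fi
}
```

For example, "prepare_target_dir 1zzz my_shifts.str" creates 1zzz/pdb1zzzshifts.dat,
where my_shifts.str stands for your own chemical shift file.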

Results: 

Top 5 models are stored in the subdirectory (1abc or 1zzz) as:
Model.1.pdb, Model.2.pdb, Model.3.pdb, Model.4.pdb & Model.5.pdb	
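
To confirm that a run produced all five models, a check along the following
lines can be used. It is a sketch only; list_missing_models is a hypothetical
helper name.

```shell
# list_missing_models: print each of Model.1.pdb .. Model.5.pdb that is absent
# or empty in the given directory. Returns 0 only when all five are present.
list_missing_models() {
    dir=$1           # target subdirectory, e.g. 1zzz
    status=0
    for i in 1 2 3 4 5; do
        # -s is true only for a file that exists and has a size greater than zero.
        if [ ! -s "$dir/Model.$i.pdb" ]; then
            echo "$dir/Model.$i.pdb"
            status=1
        fi
    done
    return "$status"
}
```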




Reference: 

Menon V, Vallat BK, Dybas JM, Fiser A.
Modeling proteins using a super-secondary structure library and NMR chemical
shift information.
Structure, 2013, 21(6):891-9.

Authors:

Vilas Menon, Brinda Vallat, Joe Dybas, Carlos Madrid and Andras Fiser. 


CentOS release 6.6 (Final)
CentOS release 7


SUPPORT AND DOCUMENTATION

After installing, you can find documentation for this module with the
perldoc command.

    perldoc SmotifCS

You can also look for information at:

    RT, CPAN's request tracker (report bugs here)
        http://rt.cpan.org/NoAuth/Bugs.html?Dist=SmotifCS

    AnnoCPAN, Annotated CPAN documentation
        http://annocpan.org/dist/SmotifCS

    CPAN Ratings
        http://cpanratings.perl.org/d/SmotifCS

    Search CPAN
        http://search.cpan.org/dist/SmotifCS/


LICENSE AND COPYRIGHT

Copyright (C) 2015 Fiserlab Members 

This program is free software; you can redistribute it and/or modify it
under the terms of the Artistic License (2.0). You may obtain a
copy of the full license at:

http://www.perlfoundation.org/artistic_license_2_0

Any use, modification, and distribution of the Standard or Modified
Versions is governed by this Artistic License. By using, modifying or
distributing the Package, you accept this license. Do not use, modify,
or distribute the Package, if you do not accept this license.

If your Modified Version has been derived from a Modified Version made
by someone other than you, you are nevertheless required to ensure that
your Modified Version complies with the requirements of this license.

This license does not grant you the right to use any trademark, service
mark, tradename, or logo of the Copyright Holder.

This license includes the non-exclusive, worldwide, free-of-charge
patent license to make, have made, use, offer to sell, sell, import and
otherwise transfer the Package with respect to any patent claims
licensable by the Copyright Holder that are necessarily infringed by the
Package. If you institute patent litigation (including a cross-claim or
counterclaim) against any party alleging that the Package constitutes
direct or contributory patent infringement, then this Artistic License
to you shall terminate on the date that such litigation is filed.

Disclaimer of Warranty: THE PACKAGE IS PROVIDED BY THE COPYRIGHT HOLDER
AND CONTRIBUTORS "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES.
THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
PURPOSE, OR NON-INFRINGEMENT ARE DISCLAIMED TO THE EXTENT PERMITTED BY
YOUR LOCAL LAW. UNLESS REQUIRED BY LAW, NO COPYRIGHT HOLDER OR
CONTRIBUTOR WILL BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, OR
CONSEQUENTIAL DAMAGES ARISING IN ANY WAY OUT OF THE USE OF THE PACKAGE,
EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

