mirror of
http://43.156.76.180:8026/YuuMJ/EukPhylo.git
synced 2025-12-28 04:50:25 +08:00
130 lines
6.6 KiB
Plaintext
130 lines
6.6 KiB
Plaintext
GUIDANCE: GUIDe tree based AligNment ConfidencE
|
|
|
|
GUIDANCE is a software package for aligning biological sequences (DNA or
|
|
amino acids) using either MAFFT, PRANK, or CLUSTALW, and calculating
|
|
confidence scores for each column, sequence and residue in the alignment.
|
|
|
|
URL: http://guidance.tau.ac.il/
|
|
|
|
Authors: Osnat Penn, Eyal Privman, Haim Ashkenazy, Itamar Sela, Giddy Landan, Dan Graur, and Tal Pupko.
|
|
|
|
When using the GUIDANCE2 algorithm please cite:
|
|
-----------------------------------------------------------
|
|
Sela, I., Ashkenazy, H., Katoh, K. and Pupko, T. (2015)
|
|
GUIDANCE2: accurate detection of unreliable alignment regions accounting for the uncertainty of multiple parameters.
|
|
Nucleic Acids Research, 2015 Jul 1; 43: W7-W14.; doi: 10.1093/nar/gkq443
|
|
|
|
Landan, G., and D. Graur. (2008).
|
|
Local reliability measures from sets of co-optimal multiple sequence alignments.
|
|
Pac Symp Biocomput 13:15-24
|
|
|
|
When using the GUIDANCE algorithm please cite:
|
|
-----------------------------------------------------------
|
|
Penn, O., Privman, E., Landan, G., Graur, D. and Pupko, T. (2010).
|
|
An alignment confidence score capturing robustness to guide-tree uncertainty.
|
|
Molecular Biology and Evolution, 2010 Aug;27(8):1759-67; doi:10.1093/molbev/msq066
|
|
|
|
When using the HoT algorithm please cite:
|
|
-----------------------------------------------------------
|
|
Landan, G., and D. Graur. (2008).
|
|
Local reliability measures from sets of co-optimal multiple sequence alignments.
|
|
Pac Symp Biocomput 13:15-24
|
|
|
|
Installation
|
|
============
|
|
|
|
1. Unpack the archive by typing:
|
|
% tar -xzf guidance.v2.01.tgz
|
|
|
|
2. Compile the package by typing:
|
|
% cd guidance.v2.01
|
|
% make
|
|
(Running `make' takes a while)
|
|
|
|
3. Check if you have the desired alignment program installed:
|
|
|
|
MAFFT: Type "mafft" and check that you have version 6.712 or newer.
|
|
* Else download and install MAFFT from: http://mafft.cbrc.jp/alignment/software/
|
|
|
|
PRANK: Type "prank" and check that you have it insalled
|
|
* Else download and install PRANK from: http://www.ebi.ac.uk/goldman-srv/prank/prank/
|
|
|
|
CLUSTALW: Type "clustalw" and check that you have it insalled
|
|
* Else download and install CLUSTALW from: http://www.ebi.ac.uk/Tools/clustalw2/index.html
|
|
|
|
MUSCLE: Type "muscle" and check that you have it insalled
|
|
* Else download and install MUSCLE from: http://www.drive5.com/muscle/index.htm
|
|
|
|
PAGAN: Type "pagan" and check that you have it installed
|
|
* Else download and install PAGAN from: http://code.google.com/p/pagan-msa/
|
|
* In case PAGAN is used not for user provided alignment (using --msaFile), mafft is also required to be installed.
|
|
|
|
|
|
4. GUIDANCE also uses Perl, BioPerl and Ruby:
|
|
* Type "perl -v" and check that you Perl installed.
|
|
Else download and install it from: http://www.perl.org/
|
|
* Type "perl -e 'use Bio::SeqIO'" to check that you have BioPerl.
|
|
Else download and install it from: http://www.bioperl.org/
|
|
* Type "ruby -version" to check that you have ruby.
|
|
Else download and install it from: http://www.ruby-lang.org/en/
|
|
|
|
Usage
|
|
=====
|
|
|
|
Run the Perl script: guidance.v2.01/www/Guidance/guidance.pl
|
|
(Note that you cannot move this script out of its directory, because it uses relative paths to other files in other directories. Sorry)
|
|
GUIDANCE uses flags in the command line arguments: (for help, type: "perl guidance")
|
|
|
|
USAGE:perl guidance.pl --seqFile SEQFILE --msaProgram [MAFFT|PRANK|CLUSTALW|MUSCLE] --seqType [aa|nuc|codon] --outDir FULL_PATH_OUTDIR
|
|
|
|
Required parameters:
|
|
--seqFile Input sequence file in FASTA format
|
|
--seqType Sequence type may be either of: nuc (nucleotides), aa (amino acids),
|
|
or codon (nucleotides that will be treated as whole codons)
|
|
--msaProgram The alignment program - may be either MAFFT, PRANK, CLUSTALW or MUSCLE
|
|
--outDir The output directory were all output files will be created [please provide full and not relative path]
|
|
(will be created automatically)
|
|
|
|
Optional parameters:
|
|
--program The confidence measure may be GUIDANCE2, GUIDANCE or HoT. default=GUIDANCE2
|
|
--bootstraps Number of bootstrap iterations. default=100
|
|
--genCode Genetic code for use in codon sequence. default=1
|
|
1> Nuclear Standard
|
|
15> Nuclear Blepharisma
|
|
6> Nuclear Ciliate
|
|
10> Nuclear Euplotid
|
|
2> Mitochondria Vertebrate
|
|
5> Mitochondria Invertebrate
|
|
3> Mitochondria Yeast
|
|
13> Mitochondria Ascidian
|
|
9> Mitochondria Echinoderm
|
|
14> Mitochondria Flatworm
|
|
4> Mitochondria Protozoan
|
|
--outOrder May be either aligned or as_input. default=aligned
|
|
--msaFile Input alignment file - not recommended, see documentation online at: guidance.tau.ac.il
|
|
--seqCutoff Sequence confidence cutoff between 0 to 1. default=0.6
|
|
--colCutoff Columnd confidence cutoff between 0 to 1. default=0.93
|
|
--mafft Path to mafft executable. default=mafft
|
|
--prank Path to prank executable. default=prank
|
|
--clustalw path to clustalw executable. default=clustalw
|
|
--muscle path to muscle executable. default=muscle
|
|
--pagan path to pagan executable. default=pagan
|
|
--ruby path to ruby executable. default=ruby
|
|
--dataset Unique name for the Dataset - will be used as prefix to outputs (default=MSA)
|
|
--MSA_Param: Passing parameters for the alignment program e.g -F to prank. To pass parameter containning '-' in it, add \ before each '-' e.g. \-F for PRANK
|
|
--proc_num: Number of processors to use (default=1)
|
|
|
|
EXAMPLES:
|
|
|
|
>perl guidance.pl --seqFile protein.fas --msaProgram MAFFT --seqType aa --outDir /somedir/protein.guidance
|
|
Will align the amino acid sequences in the fasta file "protein.fas" using MAFFT and output all results to the diretory "/somedir/protein.guidance"
|
|
|
|
>perl guidance.pl --seqFile codingSeq.fas --msaProgram PRANK --seqType codon --outDir /somedir/codingSeq.guidance --genCode 2 --bootstraps 30
|
|
Will align the codon sequences in the fasta file "codingSeq.fas" using PRANK after translation using the vertebrate mitochondrial genetic code and output all results to the diretory "/somedir/codingSeq.guidance". Only 30 bootstrap iterations will be done instead of the default 100 (cut run-time by a factor of 3)
|
|
|
|
Copyrights
|
|
==========
|
|
|
|
* To modify the code, or use parts of it for other purposes, permission should be requested. Please contact Tal Pupko: talp@post.tau.ac.il
|
|
* Please note that the use of the GUIDANCE program is for academic use only
|