mirror of
http://43.156.76.180:8026/YuuMJ/EukPhylo.git
synced 2025-12-29 16:20:24 +08:00
Updated PhyloToL Part 2: MSAs, trees, and contamination loop (markdown)
parent
1b9e337b94
commit
18353f9a63
@ -74,6 +74,31 @@ Minimum requirements:
List of all parameters included in PhyloToL:

Argument | Default | Choices | Help
-- | -- | -- | --
--start | raw | raw, unaligned, aligned, trees | Stage at which to start running PhyloToL.
--end | trees | unaligned, aligned, trees | Stage until which to run PhyloToL. Options are "unaligned" (up to but not including Guidance), "aligned" (up to but not including RAxML), and "trees" which will run through RAxML.
--gf_list | None | | Path to the file with the GFs of interest. Only required if starting from the raw dataset.
--taxon_list | None | | Path to the file with the taxa (10-digit codes) to include in the output.
--data | | | Path to the input dataset. The format varies depending on your --start parameter. If running the contamination loop starting with trees, this folder must include both trees AND a fasta file for each tree (with identical file names other than the extension) that includes an amino-acid sequence for each tip of the tree (with matching sequence names).
--output | ./ | | Directory where the output folder should be created. If not given, the folder will be created in the parent directory of the folder containing the scripts.
--force | store_true | | Overwrite all existing files in the "Output" folder.
--tree_method | iqtree | iqtree, raxml, all | Program to use for tree-building.
--blacklist | | | A text file with a list of sequence names not to consider.
--og_identifier | OG | OG, OG6, OGA, OGG | Program to use for selecting sequences by GC width.
--sim_taxa | None | | Path to the file with the taxa (10-digit codes) to apply the similarity filter on.
--blast_cutoff | 1e-20 | | Blast e-value cutoff.
--len_cutoff | 10 | | Amino acid length cutoff for removal of very short sequences after column removal in Guidance.
--similarity_filter | store_true | | Run the similarity filter in pre-Guidance.
--sim_cutoff | 1 | | Sequences from the same taxa that are assigned to the same OG are removed if they are more similar than this cutoff.
--guidance_iters | 5 | | Number of Guidance iterations for sequence removal.
--seq_cutoff | 0.3 | | During Guidance, taxa are removed if their score is below this cutoff.
--col_cutoff | 0.0 | | During Guidance, columns are removed if their score is below this cutoff.
--res_cutoff | 0.0 | | During Guidance, residues are removed if their score is below this cutoff.
--keep_temp | store_true | | Use this to keep ALL Guidance intermediate files.
--keep_iter / -z | store_true | | Keep all Guidance iterations (beware this will be very large).

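
To make the Default and Choices columns concrete, here is a minimal `argparse` sketch that declares a subset of the parameters listed above. It is an illustration, not PhyloToL's actual source: the function name `build_parser`, the `type=` conversions, and the example folder name are assumptions. Note that `store_true` in the Default column denotes a flag that is off unless given.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical helper: mirrors a subset of the documented parameters only.
    parser = argparse.ArgumentParser(description="Sketch of a PhyloToL-style command line")
    parser.add_argument("--start", default="raw",
                        choices=["raw", "unaligned", "aligned", "trees"])
    parser.add_argument("--end", default="trees",
                        choices=["unaligned", "aligned", "trees"])
    parser.add_argument("--data")                       # input dataset path
    parser.add_argument("--output", default="./")
    parser.add_argument("--force", action="store_true")  # False unless the flag is passed
    parser.add_argument("--tree_method", default="iqtree",
                        choices=["iqtree", "raxml", "all"])
    # The numeric types below are assumptions inferred from the documented defaults.
    parser.add_argument("--blast_cutoff", type=float, default=1e-20)
    parser.add_argument("--guidance_iters", type=int, default=5)
    parser.add_argument("--seq_cutoff", type=float, default=0.3)
    parser.add_argument("--keep_iter", "-z", action="store_true")
    return parser


# Example: start from existing alignments and overwrite earlier output.
args = build_parser().parse_args(["--start", "aligned", "--data", "my_alignments", "--force"])
print(args.start, args.end, args.force, args.keep_iter)  # aligned trees True False
```

Anything not passed on the command line falls back to its documented default, so `--end` remains `trees` in the example above.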
## Contamination loop

The contamination loop (CL) is implemented within PhyloToL to allow the removal of contaminants based on the topology of each tree (phylogeny-informed contamination removal). Three modes are available: sister-, subsister-, and clade-based contamination removal. All modes take a user-defined file of 'rules' used to identify the sequences to remove.
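
When the contamination loop is started from trees, the `--data` folder must contain a fasta file for each tree, with identical file names apart from the extension. The sketch below shows one way to check that pairing before a run. The extension sets and the function name `trees_missing_fasta` are assumptions for illustration and are not part of PhyloToL; the check also does not verify that sequence names match the tree tips.

```python
from pathlib import Path

# Assumed extensions; adjust these to match your own tree and alignment files.
TREE_EXTS = {".tre", ".tree", ".treefile"}
FASTA_EXTS = {".fasta", ".fa", ".faa"}


def trees_missing_fasta(data_dir):
    """Return the names of tree files in data_dir that have no same-named fasta file."""
    files = [p for p in Path(data_dir).iterdir() if p.is_file()]
    # File names with the final extension stripped, for every fasta file found.
    fasta_stems = {p.stem for p in files if p.suffix.lower() in FASTA_EXTS}
    return sorted(
        p.name
        for p in files
        if p.suffix.lower() in TREE_EXTS and p.stem not in fasta_stems
    )
```

An empty list means every tree has a matching fasta file; any names returned are trees that would lack the sequences the loop needs.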