Mirror of http://43.156.76.180:8026/YuuMJ/EukPhylo.git (synced 2025-12-28 03:00:24 +08:00)
Updated PhyloToL Part 2: MSAs, trees, and contamination loop (markdown)
parent 2b35b2edcc
commit 95979d4a74
@@ -59,14 +59,14 @@ You will also need to give PhyloToL part 2 a list of all of the sample identifie
 
 Below is a list of basic PhyloToL part 2 parameters:
 
-Argument | Default | Choices | Description
+Argument | Default value | Options | Description
 -- | -- | -- | --
 --start | raw | raw, unaligned, aligned, trees | Stage at which to start running PhyloToL.
 --end | trees | unaligned, aligned, trees | Stage until which to run PhyloToL. Options are "unaligned" (up to but not including guidance), "aligned" (up to but not including RAxML), and "trees" which will run through RAxML.
---gf_list | None | | Path to the file with the GFs of interest. Only required if starting from the raw dataset.
---taxon_list | None | | Path to the file with the taxa (10-digit codes) to include in the output.
---data | | | Path to the input dataset. The format varies depending on your --start parameter. If running the contamination loop starting with trees, this folder must include both trees AND a fasta file for each tree (with identical file names other than the extension) that includes an amino-acid sequence for each tip of the tree (with matching sequence names).
---output | ./ | | Directory where the output folder should be created. If not given, the folder will be created in the parent directory of the folder containing the scripts.
+--gf_list | No default | Any valid path | Path to the file with the GFs of interest. Only required if starting from the raw dataset.
+--taxon_list | No default | Any valid path | Path to the file with the taxa (10-digit codes) to include in the output.
+--data | No default | Any valid path | Path to the input dataset. The format varies depending on your --start parameter. If running the contamination loop starting with trees, this folder must include both trees AND a fasta file for each tree (with identical file names other than the extension) that includes an amino-acid sequence for each tip of the tree (with matching sequence names).
+--output | Current directory | Any valid path | Directory where the output folder should be created. If not given, the folder will be created in the parent directory of the folder containing the scripts.
 
 Optional arguments can then be added to the base command, and will be described below. In the following is described each stage of PhyloToL, and some key parameters to know for each step.
 
@@ -93,9 +93,9 @@ Another option to filter sequences from the ReadyToGo files at the pre-guidance
 Argument | Default | Choices | Help
 -- | -- | -- | --
 --og_identifier | OG | OG, OG6, OGA, OGG | Program to use for selecting sequences by GC width.
---similarity_filter | store_true | | Run the similarity filter in pre-Guidance.
---sim_cutoff | 1 | _float_ | Sequences from the same taxa that are assigned to the same OG are removed if they are more similar (% amino acid identity over 20% of their length) than this cutoff.
---sim_taxa | None | Path to file | Path to the file with the taxa (10-digit codes) to apply the similarity filter on.
+--similarity_filter | flag (true/false) | include or exclude the argument | Run the similarity filter in pre-Guidance.
+--sim_cutoff | 1 | Any number between zero and one | Sequences from the same taxa that are assigned to the same OG are removed if they are more similar (% amino acid identity over 20% of their length) than this cutoff.
+--sim_taxa | No default | Any valid path | Path to the file with the taxa (10-digit codes) to apply the similarity filter on.
 
 Adding these options to the command line will give:
 
@@ -110,7 +110,7 @@ The blacklist is a user-defined set of sequences to be removed from runs. You mi
 
 Argument | Default | Choices | Help
 -- | -- | -- | --
---blacklist | None | Path to a file | A text file with a list of sequence names not to consider.
+--blacklist | No default | Any valid path | Path to a text file with a list of sequence names not to consider.
 
 ## Guidance
 
@@ -118,12 +118,12 @@ Within PhyloToL part 2, we use Guidance to assess homology within gene families.
 
 Argument | Default | Choices | Description
 -- | -- | -- | --
---guidance_iters | 5 | _int_ | Number of Guidance iterations for sequence removal.
---seq_cutoff | 0.3 | _float_ | During guidance, taxa are removed if their score is below this cutoff.
---col_cutoff | 0.0 | _float_ | During guidance, columns are removed if their score is below this cutoff.
---res_cutoff | 0.0 | _float_ | During guidance, residues are removed if their score is below this cutoff.
---keep_temp | True | flag | Use this to keep ALL Guidance intermediate files.
---keep_iter / -z | True | flag | Keep all Guidance iterations (beware this will be very large)
+--guidance_iters | 5 | Any positive integer | Number of Guidance iterations for sequence removal.
+--seq_cutoff | 0.3 | Any number between 0 and 1 | During guidance, taxa are removed if their score is below this cutoff.
+--col_cutoff | 0.0 | Any number between 0 and 1 | During guidance, columns are removed if their score is below this cutoff.
+--res_cutoff | 0.0 | Any number between 0 and 1 | During guidance, residues are removed if their score is below this cutoff.
+--keep_temp | False | include or exclude the argument | Use this to keep ALL Guidance intermediate files.
+--keep_iter / -z | False | include or exclude the argument | Keep all Guidance iterations (beware this will be very large)
 
 ## Gene trees
 
@@ -177,11 +177,11 @@ To run the CL, use a similar command structure as described for running PhyloToL
 | Parameter | Required | Options | Description | Default |
 | ------------- | ------------- | ------------- | ------------- | ------------- |
 | --contamination_loop | yes | seq, clade | The mode in which to run the CL | none |
-| --nloops | no | _int_ | Number of iterations | 5 |
-| --sister_rules | in sisters mode | Path to a file | Sisters rules file | none |
-| --subsister_rules | in subsisters mode | Path to a file | Subsisters rules file | none |
-| --clade_grabbing_rules | in clade mode | Path to a file | Clade-grabbing rules file | none |
-| --clade_grabbing_exceptions | no | Path to a file | List of taxa to _not_ remove for any reason | none |
+| --nloops | no | any positive integer | Number of iterations | 5 |
+| --sister_rules | only in sisters mode | Any valid path | Path to a text file containing 'sisters rules' | none |
+| --subsister_rules | only in subsisters mode | Any valid path | Path to a text file containing 'subsisters rules' | none |
+| --clade_grabbing_rules | only in clade mode | Any valid path | Path to a text file containing 'clade-grabbing rules' | none |
+| --clade_grabbing_exceptions | no | Any valid path | List of taxa to _not_ remove for any reason | none |
 
 ## Ortholog selection and concatenation
 
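The revised tables describe flags such as `--similarity_filter`, `--keep_temp` and `--keep_iter` as "include or exclude the argument", with a default of False. The sketch below shows how that behaviour looks with Python's `argparse` and `action="store_true"`. It is only an illustration: the removed `store_true` entry suggests `argparse` is in use, but the actual PhyloToL parser is not shown in this diff, and `build_parser` is a hypothetical helper covering just a few of the documented options.

```python
import argparse


def build_parser():
    # Hypothetical parser: mirrors a few rows of the tables above,
    # not the real PhyloToL argument definitions.
    parser = argparse.ArgumentParser(description="Sketch of PhyloToL part 2 options")
    parser.add_argument("--start", default="raw",
                        choices=["raw", "unaligned", "aligned", "trees"])
    parser.add_argument("--end", default="trees",
                        choices=["unaligned", "aligned", "trees"])
    parser.add_argument("--guidance_iters", type=int, default=5)
    parser.add_argument("--seq_cutoff", type=float, default=0.3)
    # store_true flags are False unless the argument is present.
    parser.add_argument("--similarity_filter", action="store_true")
    parser.add_argument("--keep_temp", action="store_true")
    parser.add_argument("--keep_iter", "-z", action="store_true")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--start", "unaligned", "--keep_temp"])
print(args.keep_temp, args.keep_iter, args.guidance_iters)  # True False 5
```

Under this assumption, a flag that is left off the command line is False, which is consistent with the corrected "False" default for `--keep_temp` and `--keep_iter`.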