Updated PhyloToL Part 2 (markdown)

Katzlab 2024-08-09 17:40:09 -04:00
parent 7143ae2d57
commit 44b8546612

@ -18,20 +18,26 @@ Sisters-based contamination removal identifies sequences as putative contaminant
Clade-based contamination removal operates differently. In this mode, the CL searches for monophyletic clades in each gene tree that match a set of given criteria. For example, if we want to 'clade-grab' for robust Opisthokont clades, we might choose to keep only Opisthokont sequences that fall in a monophyletic clade of 12 or more species of Opisthokont; all other Opisthokont sequences in the tree would be removed.
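The clade-grabbing idea can be illustrated with a minimal Python sketch. This is not the CL's actual implementation. It assumes a rooted tree written as nested tuples, sequence names that begin with a two-letter major-clade prefix (here `Op`), and a species code made of the first ten characters of each name. The function name `clade_grab` and the example sequence names are hypothetical.

```python
def leaves(node):
    """Return all leaf names under a node (a leaf is a str, a clade is a tuple)."""
    if isinstance(node, str):
        return [node]
    return [name for child in node for name in leaves(child)]


def clade_grab(tree, target="Op", min_species=12):
    """Split target-taxon leaves into (kept, removed) for one rooted tree.

    Assumption: the first ten characters of a name identify the species.
    """
    kept, removed = [], []

    def visit(node):
        names = leaves(node)
        if all(n.startswith(target) for n in names):
            # Largest clade made up only of target sequences: judge it as a whole.
            species = {n[:10] for n in names}
            (kept if len(species) >= min_species else removed).extend(names)
        elif not isinstance(node, str):
            # Mixed clade: look for target-only clades further down.
            for child in node:
                visit(child)

    visit(tree)
    return kept, removed


# Toy tree: one Opisthokont clade of three species, plus a lone Opisthokont
# sequence sitting next to a non-Opisthokont sequence.
toy_tree = (
    (("Op_me_Hsap_g1", "Op_me_Mmus_g1"), "Op_fu_Scer_g1"),
    ("Am_di_Ddis_g1", "Op_me_Dmel_g1"),
)
kept, removed = clade_grab(toy_tree, target="Op", min_species=3)
```

With a threshold of three species, the three-species clade is kept and the isolated `Op_me_Dmel_g1` sequence is flagged for removal. Real gene trees are usually unrooted, so a full implementation would also need to consider clades defined across the root.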
The CL runs iteratively, _meaning_...
## Setup
The CL requires two inputs in order to run: 1) a folder of alignments (not gap-trimmed) and 2) a folder of gene trees. Both should be formatted in the same way as the output of the preceding steps of PhyloToL part 2 (i.e. in the "Output" folder, see above). You can also give it data _not_ output by PhyloToL, but you will need to match the folder, file, and sequence name formats.
You will also need to create a 'rules' file. Its format varies between the different modes of the CL.
_describe rules files here_
## Running
To run the CL, use a command structure similar to the one described for running PhyloToL part 2 above, and add the `--contamination_loop` parameter to activate the contamination loop, specifying a mode and the path to a rules file. Available parameters are:
| Parameter | Required | Options | Description | Default |
| ------------- | ------------- | ------------- | ------------- | ------------- |
| --contamination_loop | yes | seq, clade | The mode in which to run the CL | none |
| --nloops | no | _int_ | Number of iterations | 5 |
| --sister_rules | in sisters mode | Path to a file | Sisters rules file | none |
| --subsister_rules | in subsisters mode | Path to a file | Subsisters rules file | none |
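
As a rough illustration of how these parameters fit together, the sketch below assembles the CL-specific arguments as a list that could be appended to an existing PhyloToL part 2 command. The helper `build_cl_args` and the rules file path are hypothetical. Only the flag names, mode options, and the default of 5 iterations come from the table above. The rules flag is passed explicitly because the table does not state which flag goes with which mode.

```python
VALID_MODES = {"seq", "clade"}  # options listed for --contamination_loop
RULES_FLAGS = {"--sister_rules", "--subsister_rules"}  # rules-file flags in the table


def build_cl_args(mode, rules_flag, rules_path, nloops=5):
    """Return the contamination-loop arguments as a list of strings."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    if rules_flag not in RULES_FLAGS:
        raise ValueError(f"rules_flag must be one of {sorted(RULES_FLAGS)}")
    if nloops < 1:
        raise ValueError("nloops must be a positive integer")
    return [
        "--contamination_loop", mode,
        "--nloops", str(nloops),
        rules_flag, rules_path,
    ]


# Hypothetical rules file path, for illustration only.
cl_args = build_cl_args("seq", "--sister_rules", "rules/sister_rules.txt")
```

The resulting list can be appended to whatever base command you already use for PhyloToL part 2, for example when launching it through `subprocess.run`.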