# Contamination loop
The contamination loop (CL) is implemented within PhyloToL to allow the removal of contaminants based on the topology of each tree (phylogeny-informed contamination removal). Three modes are available: sister-, subsister-, and clade-based contamination removal. All modes take a user-defined file of 'rules' that is used to identify the sequences to remove.
Sister-based contamination removal identifies sequences as putative contaminants based on their sister relationships. If a sequence from sample A appears on a tree sister to a sequence from sample B, and sample B is known to have contaminated sample A, then the sequence from sample A is removed. Subsister-based removal operates similarly, but looks at the taxa that are sister to the _parent_ node of sample A's sequence. This is useful when multiple samples are contaminated by the same other sample: the contaminant-derived sequences from those samples may group with one another, so the immediate sister of each is another contaminated sequence rather than the source.
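
The sister and subsister checks can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PhyloToL's implementation: the gene tree is assumed to be already parsed into nested tuples and rooted as written, the first 10 characters of each sequence name are assumed to identify its sample (a hypothetical naming rule), and the rules are modelled as a mapping from each contaminated sample to the samples known to contaminate it (the real rules file format is not reproduced here). The names `flag_contaminants` and `leaf_contexts` are hypothetical. A sequence is flagged only if every leaf in its sister group (or, in subsister mode, its parent's sister group) comes from a listed contaminant, which is one reasonable reading of the rule described above.

```python
from typing import Dict, List, Set, Tuple, Union

# A gene tree as nested tuples: a leaf is a sequence name, an internal node is a
# tuple of subtrees. The tree is treated as rooted exactly as written.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> List[str]:
    """All sequence names below `tree`."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]


def sample_of(seq_name: str) -> str:
    # Hypothetical naming rule: the first 10 characters identify the sample.
    return seq_name[:10]


def leaf_contexts(tree: Tree, sisters: List[str], parent_sisters: List[str],
                  out: List[Tuple[str, List[str], List[str]]]) -> None:
    """Record (leaf, sister leaves, parent's sister leaves) for every leaf."""
    if isinstance(tree, str):
        out.append((tree, sisters, parent_sisters))
        return
    for i, child in enumerate(tree):
        # The child's sisters are the leaves under its siblings; the child's parent
        # is `tree`, whose own sisters were passed in as `sisters`.
        child_sisters = [n for j, sib in enumerate(tree) if j != i for n in leaves(sib)]
        leaf_contexts(child, child_sisters, sisters, out)


def flag_contaminants(tree: Tree, rules: Dict[str, Set[str]], mode: str = "sister") -> Set[str]:
    """Sequences whose sister group (or parent's sister group, in 'subsister' mode)
    consists only of samples listed as contaminants of their own sample."""
    if mode not in ("sister", "subsister"):
        raise ValueError(f"unknown mode: {mode!r}")
    contexts: List[Tuple[str, List[str], List[str]]] = []
    leaf_contexts(tree, [], [], contexts)
    flagged = set()
    for name, sisters, parent_sisters in contexts:
        contaminants = rules.get(sample_of(name), set())
        group = sisters if mode == "sister" else parent_sisters
        if contaminants and group and all(sample_of(s) in contaminants for s in group):
            flagged.add(name)
    return flagged


if __name__ == "__main__":
    # Hypothetical rules: sample B01 is known to have contaminated samples A01 and C01.
    rules = {"Sample_A01": {"Sample_B01"}, "Sample_C01": {"Sample_B01"}}
    tree = (("Sample_A01_seq1", "Sample_C01_seq1"), "Sample_B01_seq1")
    print(sorted(flag_contaminants(tree, rules, "sister")))     # []
    print(sorted(flag_contaminants(tree, rules, "subsister")))  # ['Sample_A01_seq1', 'Sample_C01_seq1']
```

In the example, the A01 and C01 sequences are each other's sisters, so sister mode flags nothing; moving up one node exposes the B01 sequence, which is the situation subsister mode is meant to catch.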
Clade-based contamination removal operates differently. In this mode, the CL searches each gene tree for monophyletic clades that match a set of given criteria. For example, to 'clade-grab' robust Opisthokont clades, we might choose to keep only Opisthokont sequences that fall in a monophyletic clade of 12 or more Opisthokont species; all other Opisthokont sequences in the tree would be removed.
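
Clade-grabbing can be sketched the same way. Again, this is an illustrative sketch under stated assumptions rather than PhyloToL's code: the tree is nested tuples rooted as written, the taxonomic group is read from a name prefix (here `"Op"` for Opisthokonts), and the first 10 characters of a name identify the species. Both naming rules and the function `clade_grab` are hypothetical. The threshold of 3 in the example is lowered from the 12 used above only to keep the tree small.

```python
from typing import List, Set, Tuple, Union

# A gene tree as nested tuples: a leaf is a sequence name, an internal node is a
# tuple of subtrees. The tree is treated as rooted exactly as written.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> List[str]:
    """All sequence names below `tree`."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]


def species_of(seq_name: str) -> str:
    # Hypothetical naming rule: the first 10 characters identify the species.
    return seq_name[:10]


def clade_grab(tree: Tree, prefix: str, min_species: int) -> Tuple[Set[str], Set[str]]:
    """Split the target group's sequences into (kept, removed).

    A sequence whose name starts with `prefix` is kept only if it lies in a clade
    made up entirely of `prefix` sequences from at least `min_species` distinct
    species. Sequences outside the target group are not touched or returned.
    """
    kept: Set[str] = set()

    def visit(node: Tree) -> None:
        names = leaves(node)
        species = {species_of(n) for n in names}
        if all(n.startswith(prefix) for n in names) and len(species) >= min_species:
            kept.update(names)  # whole clade qualifies, so no need to search inside it
            return
        if not isinstance(node, str):
            for child in node:
                visit(child)

    visit(tree)
    target = {n for n in leaves(tree) if n.startswith(prefix)}
    return kept, target - kept


if __name__ == "__main__":
    tree = ((("Op_me_Hsap_s1", "Op_me_Mmus_s1"), "Op_fu_Scer_s1"),
            ("Op_fu_Ncra_s1", "Pl_gr_Atha_s1"))
    kept, removed = clade_grab(tree, prefix="Op", min_species=3)
    print(sorted(kept))     # ['Op_fu_Scer_s1', 'Op_me_Hsap_s1', 'Op_me_Mmus_s1']
    print(sorted(removed))  # ['Op_fu_Ncra_s1']
```

Here `Op_fu_Ncra_s1` is an Opisthokont sequence that sits next to a non-Opisthokont sequence rather than inside a qualifying Opisthokont clade, so it is the one removed; `Pl_gr_Atha_s1` is outside the target group and is left alone.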
## Setup
The CL requires two inputs: 1) a folder of alignments (not gap-trimmed) and 2) a folder of gene trees. Both should be formatted in the same way as the output of the preceding steps of PhyloToL part 2 (i.e. in the "Output" folder, see above). You can also give it data _not_ output by PhyloToL, but you will need to match the folder, file, and sequence name formats.
You will also need to create a 'rules' file. The format here varies between the different modes of the CL.
## Running