diff --git a/PhyloToL-Part-2.md b/PhyloToL-Part-2.md
index a7b7bba..ea476bb 100644
--- a/PhyloToL-Part-2.md
+++ b/PhyloToL-Part-2.md
@@ -12,4 +12,22 @@ We provide a diverse database of 1,000 genomes and transcriptomes from across th
 
 # Contamination loop
 
-The Contamination Loop is implemented within PhyloToL to allow the removal of contaminants based on the topology of each tree (= Phylogenetic based contamination removal). 3 modes are available and described in this section: ‘sister’, ‘subsister’ and ‘clade’. All modes take a user defined rules file to identify the sequences to remove.
+The contamination loop (CL) is implemented within PhyloToL to remove contaminants based on the topology of each gene tree (phylogeny-informed contamination removal). Three modes are available: sister-, subsister-, and clade-based contamination removal. All modes take a user-defined 'rules' file that identifies the sequences to remove.
+
+Sister-based contamination removal identifies sequences as putative contaminants based on their sister relationships. If a sequence from sample A appears in a tree as sister to a sequence from sample B, and sample B is known to have contaminated sample A, then the sequence from sample A is removed. Subsister-based removal operates similarly, but looks at the taxa that are sister to the _parent_ node of the sample A sequence, which is useful when multiple samples are contaminated by the same other sample.
+
+Clade-based contamination removal operates differently. In this mode, the CL searches each gene tree for monophyletic clades that match a set of given criteria. For example, if we want to 'clade-grab' for robust Opisthokont clades, we might choose to keep only Opisthokont sequences that fall in a monophyletic clade of 12 or more Opisthokont species; all other Opisthokont sequences in the tree would be removed.
+
+## Setup
+
+The CL requires 1) a folder of alignments (not gap-trimmed) and 2) a folder of gene trees. Both should be formatted as output by the preceding steps of PhyloToL part 2 (i.e. in the "Output" folder, see above). You can also give it data _not_ output by PhyloToL, but you will need to match the folder, file, and sequence name formats.
+
+You will also need to create a 'rules' file. Its format varies between the different modes of the CL.
+
+## Running
+
+
+
+
+
+
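
To illustrate the clade-based mode described in the added text, here is a minimal sketch of the 'clade-grab' idea. It is not PhyloToL's implementation. It assumes a rooted tree written as nested Python tuples, hypothetical sequence names in which the first two letters give the major clade and everything before the last underscore identifies the species, and a threshold of 3 species instead of 12 to keep the toy tree small. The function names are invented for this example.

```python
def leaves(node):
    """Return all leaf labels under a node (a leaf is a plain string)."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def species_of(label):
    # Assumption: the species code is everything before the last underscore.
    return label.rsplit("_", 1)[0]


def grab_clades(node, prefix, min_species):
    """Collect leaves of maximal monophyletic `prefix` clades that contain
    at least `min_species` distinct species."""
    tips = leaves(node)
    if all(tip.startswith(prefix) for tip in tips):
        n_species = len({species_of(tip) for tip in tips})
        return set(tips) if n_species >= min_species else set()
    if isinstance(node, str):
        return set()  # a single leaf from another group
    kept = set()
    for child in node:
        kept |= grab_clades(child, prefix, min_species)
    return kept


# Toy gene tree: one clean Opisthokont clade (3 species) and one stray
# Opisthokont sequence nested among plant sequences.
tree = (
    (("Op_me_Hsap_g1", "Op_me_Mmus_g1"), "Op_fu_Scer_g1"),
    (("Pl_gr_Atha_g1", "Op_me_Dmel_g1"), "Pl_gr_Osat_g1"),
)

kept = grab_clades(tree, "Op_", min_species=3)
all_target = {tip for tip in leaves(tree) if tip.startswith("Op_")}
to_remove = all_target - kept

print(sorted(to_remove))  # ['Op_me_Dmel_g1']
```

In this sketch the three sequences in the clean clade are kept, and the stray Opisthokont sequence is flagged for removal because it does not sit in a qualifying clade. Real gene trees are typically unrooted and may contain polytomies, so a practical version would need to handle rooting and would read trees with a phylogenetics library.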