Updated EukPhylo Part 2: MSAs, trees, and contamination loop (markdown)

Adri K. Grow 2025-08-19 16:47:29 -04:00
parent 5f100177d4
commit 4cc45c68db

@@ -145,7 +145,7 @@ NOTE: These processes are resource-intensive. Each system has its own syntax and
## Contamination loop
-The contamination coop (CL) is implemented within EukPhylo to allow the removal of contaminants based on the topology of each tree (phylogeny-informed contamination removal). Three modes are available: sister-, subsister-, and clade-based contamination removal. All modes take a user defined file of 'rules,' used to identify the sequences to remove. We first provide an overview of the three modes and then give details on running below.
+The contamination loop (CL) is implemented within EukPhylo to allow the removal of contaminants based on the topology of each tree (phylogeny-informed contamination removal). Three modes are available: sister-, subsister-, and clade-based contamination removal. All modes take a user defined file of 'rules,' used to identify the sequences to remove. We first provide an overview of the three modes and then give details on running below.
**Sisters-based contamination removal** identifies sequences as putative contaminants based on their sister relationships. If a sequence from sample A appears on a tree as sister to a sequence from sample B, and sample B is known to have contaminated sample A, then the sequence from sample A will be removed. **Subsisters-based removal** operates similarly, but looks at the taxa that are sister to the _parent_ node of sample A's sequence, which is useful when multiple samples are contaminated by the same source sample.
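
The sister-based rule described above can be illustrated with a minimal Python sketch. This is not EukPhylo's implementation. The nested-tuple tree, the `sample|sequence` leaf naming, the `rules` dictionary, and the function names are all hypothetical and chosen only for illustration. The sketch also assumes that a sequence is flagged only when every leaf in its sister group comes from a listed contaminant sample, which may differ from how the real rules are applied.

```python
# Illustrative sketch only; names and formats here are assumptions, not EukPhylo's.

def leaves(node):
    """Return all leaf names under a node (a leaf is a str, a clade is a tuple)."""
    if isinstance(node, str):
        return [node]
    return [name for child in node for name in leaves(child)]


def sample_of(leaf):
    """Hypothetical naming scheme: the sample ID is the text before the first '|'."""
    return leaf.split("|", 1)[0]


def sister_contaminants(tree, rules):
    """Return the set of leaf names flagged under a sister-based rule.

    rules maps a contaminated sample to the set of samples known to contaminate it.
    """
    flagged = set()

    def visit(node):
        if isinstance(node, str):
            return
        for i, child in enumerate(node):
            if isinstance(child, str):
                sources = rules.get(sample_of(child), set())
                if not sources:
                    continue  # no rule for this sample, nothing to check
                # Samples represented among the sister(s) of this leaf.
                sister_samples = {
                    sample_of(name)
                    for j, sibling in enumerate(node)
                    if j != i
                    for name in leaves(sibling)
                }
                # Assumption: flag only if all sister leaves are listed contaminants.
                if sister_samples and sister_samples <= sources:
                    flagged.add(child)
            else:
                visit(child)

    visit(tree)
    return flagged


# Sample A's sequence is sister to a sequence from sample B, and B contaminates A.
tree = ((("A|seq1", "B|seq9"), "C|seq2"), ("D|seq3", "E|seq4"))
rules = {"A": {"B"}}
flagged = sister_contaminants(tree, rules)
print(flagged)
```

A subsister-style check would, under the same assumptions, repeat the comparison one node higher, using the sisters of the leaf's parent clade instead of the sisters of the leaf itself.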