From 02d8fdb3aad10da99703da50bb7b4105ed6ec6dc Mon Sep 17 00:00:00 2001
From: Katzlab
Date: Sat, 10 Aug 2024 04:51:58 -0400
Subject: [PATCH] Updated PhyloToL Part 2 (markdown)

---
 PhyloToL-Part-2.md | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/PhyloToL-Part-2.md b/PhyloToL-Part-2.md
index 38e362c..649dd00 100644
--- a/PhyloToL-Part-2.md
+++ b/PhyloToL-Part-2.md
@@ -1,16 +1,20 @@
 # Overview and Modularity
 
+# Set Up
+
 # Databases
 
 We provide a diverse database of 1,000 genomes and transcriptomes from across the eukaryotic, bacterial, and archaeal tree of life, with a focus on microeukaryotic diversity. This database is in the form of "ReadyToGo" files, the output of PhyloToL part 1. This means that using this dataset, you can jump right into running analyses of any subset of these taxa using any of the OGs in the Hook Database. If you want to add your own samples or use a different set of OGs, you should check out [PhyloToL part 1](https://github.com/Katzlab/PhyloToL-6/wiki/PhyloToL-Part-1).
 
-# Overlap and similarity filters
+# Running PhyloToL Part 2
 
-# Guidance
+## Overlap and similarity filters
 
-# Gene trees
+## Guidance
 
-# Contamination loop
+## Gene trees
+
+## Contamination loop
 
 The contamination loop (CL) is implemented within PhyloToL to allow the removal of contaminants based on the topology of each tree (phylogeny-informed contamination removal). Three modes are available: sister-, subsister-, and clade-based contamination removal. All modes take a user-defined file of 'rules,' used to identify the sequences to remove.
 
@@ -18,7 +22,8 @@ Sisters-based contamination removal identifies sequences as putative contaminant
 
 Clade-based contamination removal operates differently. In this mode, the CL searches for monophyletic clades in each gene tree that match a set of given criteria. For example, if we want to 'clade-grab' for robust Opisthokont clades, we might choose to keep only Opisthokont sequences that fall in a monophyletic clade of 12 or more species of Opisthokont; all other Opisthokont sequences in the tree would be removed.
 
-The CL runs iteratively, _meaning_...
+The CL runs iteratively, and users must set the number of times that rules should be applied to reconstructed trees. Starting with a set of trees and a list of rules (e.g. a sequence from a ciliate to be removed if it falls sister to a known food source), PhyloToL will: identify a list of sequences as contaminants (writing them out to xxxx file), generate a fasta file for each gene family without contaminating sequences, reconstruct an alignment using ?Guidance with x iterations?, and generate a new tree. The default setting is to run the CL for 5 loops, and users can inspect outputs to determine the optimal number of loops for their study.
+
 ## Contamination loop setup