mirror of
http://43.156.76.180:8026/YuuMJ/EukPhylo.git
synced 2025-12-27 21:20:31 +08:00
Updated PhyloToL Part 2 (markdown)
parent
c550787a16
commit
02d8fdb3aa
@ -1,16 +1,20 @@
# Overview and Modularity
# Set Up
# Databases
We provide a diverse database of 1,000 genomes and transcriptomes from across the eukaryotic, bacterial, and archaeal tree of life, with a focus on microeukaryotic diversity. This database is in the form of "ReadyToGo" files, the output of PhyloToL part 1. With this dataset, you can jump right into running analyses of any subset of these taxa using any of the OGs in the Hook Database. If you want to add your own samples or use a different set of OGs, see [PhyloToL part 1](https://github.com/Katzlab/PhyloToL-6/wiki/PhyloToL-Part-1).
# Overlap and similarity filters
# Running PhyloToL Part 2
# Guidance
## Overlap and similarity filters
# Gene trees
## Guidance
# Contamination loop
## Gene trees
## Contamination loop
The contamination loop (CL) is implemented within PhyloToL to allow the removal of contaminants based on the topology of each tree (phylogeny-informed contamination removal). Three modes are available: sister-, subsister-, and clade-based contamination removal. All modes take a user-defined file of 'rules', which is used to identify the sequences to remove.
@ -18,7 +22,8 @@ Sisters-based contamination removal identifies sequences as putative contaminant
Clade-based contamination removal operates differently. In this mode, the CL searches for monophyletic clades in each gene tree that match a set of given criteria. For example, if we want to 'clade-grab' for robust Opisthokont clades, we might choose to keep only Opisthokont sequences that fall in a monophyletic clade of 12 or more species of Opisthokont; all other Opisthokont sequences in the tree would be removed.
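The clade-grabbing idea described above can be sketched in a few lines of Python. This is only an illustration, not PhyloToL's implementation: it assumes a rooted tree given as nested tuples, and it assumes (hypothetically) that each sequence name begins with a ten-character taxon code whose prefix identifies the major clade, e.g. `Op_` for Opisthokonts. The function name `clade_grab` and its parameters are made up for this example.

```python
def leaves(node):
    """Return all tip names under a node (a tip is a plain string)."""
    if isinstance(node, str):
        return [node]
    tips = []
    for child in node:
        tips.extend(leaves(child))
    return tips


def species_of(seq_name):
    """Assumption: the first ten characters of a sequence name identify the species."""
    return seq_name[:10]


def clade_grab(tree, target_prefix="Op_", min_species=12):
    """Split target sequences into (kept, removed).

    A target sequence is kept only if it sits in a monophyletic clade of
    target sequences that contains at least `min_species` distinct species.
    """
    kept = set()

    def visit(node):
        tips = leaves(node)
        if all(t.startswith(target_prefix) for t in tips):
            # Largest all-target clade on this path; smaller sub-clades
            # cannot contain more species, so there is no need to descend.
            if len({species_of(t) for t in tips}) >= min_species:
                kept.update(tips)
            return
        if not isinstance(node, str):
            for child in node:
                visit(child)

    visit(tree)
    all_target = {t for t in leaves(tree) if t.startswith(target_prefix)}
    return kept, all_target - kept


# Toy tree with a lowered threshold of 2 species so the example stays small.
toy_tree = (
    (("Op_me_Hsap_g1", "Op_me_Mmus_g1"), "Sr_ci_Tthe_g1"),
    ("Op_fu_Scer_g1", "Pl_gr_Atha_g1"),
)
kept, removed = clade_grab(toy_tree, target_prefix="Op_", min_species=2)
```

In this toy tree, the two sequences that form an Opisthokont-only clade are kept, while the lone Opisthokont sequence nested among other taxa is flagged for removal. A real gene tree would typically be unrooted and read from a Newick file, which this sketch does not handle.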
The CL runs iteratively, _meaning_...
The CL runs iteratively, and users must set the number of times that the rules are applied to reconstructed trees. Starting with a set of trees and a list of rules (e.g. a sequence from a ciliate is removed if it falls sister to a known food source), PhyloToL will: identify a list of sequences as contaminants (writing them out to xxxx file), generate a fasta file for each gene family without the contaminating sequences, reconstruct an alignment using ?Guidance with x iterations?, and generate a new tree. The default setting is to run the CL for 5 loops; users can inspect the outputs to determine the optimal number for their study.
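The overall shape of the loop can be sketched as follows. This is a minimal illustration under stated assumptions, not PhyloToL's code: `contamination_loop`, `find_contaminants`, and `build_tree` are hypothetical names, the alignment and tree-building steps are collapsed into a single caller-supplied function, and stopping early when no contaminants are found is a choice made for this sketch only.

```python
def contamination_loop(sequences, find_contaminants, build_tree, n_loops=5):
    """Repeatedly remove flagged sequences and rebuild the tree.

    sequences:          dict mapping sequence name -> sequence string
    find_contaminants:  function(tree) -> iterable of names to remove (the 'rules')
    build_tree:         function(sequences) -> tree (stands in for align + tree steps)
    Returns the cleaned sequences and a per-iteration log of removed names.
    """
    current = dict(sequences)
    removed_log = []
    tree = build_tree(current)
    for _ in range(n_loops):
        contaminants = set(find_contaminants(tree)) & set(current)
        removed_log.append(sorted(contaminants))
        if not contaminants:
            break  # sketch-only shortcut: nothing left to remove
        # Drop contaminants, then rebuild the tree from what remains.
        current = {name: seq for name, seq in current.items()
                   if name not in contaminants}
        tree = build_tree(current)
    return current, removed_log


# Toy stand-ins: the "tree" is a sorted list of names, and the "rule"
# flags one sequence per pass whose name starts with "food_".
def toy_build_tree(seqs):
    return sorted(seqs)


def toy_find_contaminants(tree):
    hits = [name for name in tree if name.startswith("food_")]
    return set(hits[:1])


toy_sequences = {"ciliate_a": "ATG", "food_x": "ATG", "food_y": "ATG"}
cleaned, log = contamination_loop(
    toy_sequences, toy_find_contaminants, toy_build_tree, n_loops=5
)
```

Because each pass works on a freshly rebuilt tree, a sequence that is not flagged in one iteration can be flagged in a later one, which is why inspecting the per-iteration output helps when choosing the number of loops.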
## Contamination loop setup