Updated PhyloToL Part 1 (markdown)

Katzlab 2024-08-09 17:07:45 -04:00
parent c62b171274
commit 8acbfc1dea

@ -1,6 +1,6 @@
## Overview and modularity
PhyloToL part 1 is primarily intended to assign gene families to assembled transcripts or genomic CDS, but also contains a number of quality filters and other curation steps. _More description here_
PhyloToL part 1 is primarily intended to assign gene families to assembled transcripts or genomic CDS, but also contains a number of quality filters and other curation steps. For transcriptomic data, quality filters include removing sequences <200 bp, identifying and sequestering putative ribosomal RNA sequences, and labeling sequences as either likely eukaryotic (_E) or prokaryotic (_P). Initial gene family assignments for both transcripts and genome CDS are made through Diamond analysis against either the PhyloToL Hook database (>15,000 gene families found across diverse eukaryotes) or a user-defined database of genes of interest. Renamed nucleotide and amino acid sequences are stored in 'ready to go' files, and a set of statistics is generated per sequence and per taxon. Optional analyses for transcriptomes include "cross plate contamination" (XPC) removal, which seeks to remove contamination caused by index switching, and exploration of alternative genetic codes (of particular importance for lineages like ciliates). Additional details are outlined in Figure S2.
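
To make the first transcriptome quality filter concrete, the sketch below drops sequences shorter than 200 bp from FASTA-formatted input. It is an illustration only and not PhyloToL's own code: the function names `parse_fasta` and `filter_by_length` and the constant `MIN_LENGTH_BP` are hypothetical, and the only detail taken from the description above is the 200 bp threshold.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Threshold from the overview text: sequences <200 bp are removed.
MIN_LENGTH_BP = 200


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    Hypothetical helper for illustration; multi-line sequences are joined.
    """
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(
    records: Iterable[Tuple[str, str]], min_length: int = MIN_LENGTH_BP
) -> Dict[str, str]:
    """Keep only records whose sequence is at least min_length bases long."""
    return {name: seq for name, seq in records if len(seq) >= min_length}
```

Under these assumptions, `filter_by_length(parse_fasta(open_file))` would return a dictionary of the retained sequences keyed by header. The later steps described above (rRNA screening, _E/_P labeling, Diamond searches) are not covered by this sketch.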
## Setup