From 89f84d4f71ca59032a04b379dea5c8e9c6ea030e Mon Sep 17 00:00:00 2001
From: Katzlab
Date: Sat, 10 Aug 2024 15:47:11 -0400
Subject: [PATCH] Updated PhyloToL Part 1 – Gene family assignment (markdown)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 ...ene-family-assignment.md => PhyloToL-Part-1-–-GF-assignment.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
 rename PhyloToL-Part-1-–-Gene-family-assignment.md => PhyloToL-Part-1-–-GF-assignment.md (84%)

diff --git a/PhyloToL-Part-1-–-Gene-family-assignment.md b/PhyloToL-Part-1-–-GF-assignment.md
similarity index 84%
rename from PhyloToL-Part-1-–-Gene-family-assignment.md
rename to PhyloToL-Part-1-–-GF-assignment.md
index a77a6fb..2dc955c 100644
--- a/PhyloToL-Part-1-–-Gene-family-assignment.md
+++ b/PhyloToL-Part-1-–-GF-assignment.md
@@ -1,6 +1,6 @@
 ## Overview and modularity
-PhyloToL part 1 is primarily intended to assign gene families to assembled transcripts or genomic CDS, but also contains a number of quality filters and other curation steps. For transcriptomic data, quality filters include removing sequences <200 bp, identifying and sequestering putative ribosomal RNA sequences, and labeling sequences as either likely eukaryotic (_E) or prokaryotic (_P). Initial gene family assignments for both transcripts and genome CDS are done through Diamond analysis against either the PhyloToL Hook database (>15,000 gene families found across diverse eukaryotes), or a user-defined database of genes of interest. Renamed nucleotide and amino acid sequences are stored in 'ready to go' files, and a set of statistics are generated per sequence and per taxon. Optional analyses for transcriptomes include "cross plate contamination (XPC))", which seeks to remove contamination by index switching, and exploration of alternative genetic code (of particular importance for lineages like ciliates). Additional details are outline in Figure S2.
+PhyloToL part 1 is primarily intended to assign gene families (GFs) to assembled transcripts or genomic CDS, but also contains a number of quality filters and other curation steps. For transcriptomic data, quality filters include removing sequences <200 bp, identifying and sequestering putative ribosomal RNA sequences, and labeling sequences as either likely eukaryotic (_E) or prokaryotic (_P). Initial gene family assignments for both transcripts and genome CDS are done through Diamond analysis against either the PhyloToL Hook database (>15,000 gene families found across diverse eukaryotes), or a user-defined database of genes of interest. Renamed nucleotide and amino acid sequences are stored in 'ready to go' files, and a set of statistics are generated per sequence and per taxon. Optional analyses for transcriptomes include "cross plate contamination (XPC))", which seeks to remove contamination by index switching, and exploration of alternative genetic code (of particular importance for lineages like ciliates). Additional details are outline in Figure S2.
 ## Setup