Updated Utilities (markdown)

Auden Cote-L'Heureux 2025-01-17 10:13:42 -05:00
parent 944f2afc8e
commit 05bd996b15

@ -1,5 +1,5 @@
-PhyloToL 6 includes a set of stand-alone utility scripts that aim to increase the power of analyses performed with or without the core PhyloToL pipeline. We divide these scripts into five main categories: assembly and fasta tools, sequence composition, MSA tools, gene tree descriptions, and stand-alone clade grabbing.
-* Assembly and fasta tools cover tasks including downloading sequences from GenBank, clustering sequences, calculating statistics on assemblies, and estimating the most widely shared gene families (OGs) for use in PhyloToL part 2.
+EukPhylo includes a set of stand-alone utility scripts that aim to increase the power of analyses performed with or without the core EukPhylo pipeline. We divide these scripts into five main categories: assembly and fasta tools, sequence composition, MSA tools, gene tree descriptions, and stand-alone clade grabbing.
+* Assembly and fasta tools cover tasks including downloading sequences from GenBank, clustering sequences, calculating statistics on assemblies, and estimating the most widely shared gene families (OGs) for use in EukPhylo part 2.
* Sequence composition analysis calculates statistics for coding domains (e.g. composition, effective number of codons), plots the outputs, and enables users to rename sequences in "ready to gos" based on GC content at silent sites.
* MSA tools include assessment of gaps, a wrapper for Guidance analyses, and a tool to count taxa across gene families (useful for deciding which trees to run after part 1).
* Gene tree description utilities allow users to modify trees (i.e. to rename and color tips) and to assess clade sizes and levels of contamination.
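"GC content at silent sites" is commonly approximated as GC3s: GC at the third position of codons, excluding Met (ATG), Trp (TGG) and stop codons, whose third positions are not synonymous sites. The minimal sketch below illustrates that convention only. The function name `gc3s` is hypothetical, and this is not necessarily how the EukPhylo sequence composition scripts compute the value.

```python
# Hypothetical sketch: GC content at third codon positions (GC3s-style).
# Illustrative approximation only, not EukPhylo's implementation.

# Met and Trp each have a single codon, so their third position is never
# silent; stop codons are excluded because they do not encode an amino acid.
EXCLUDED_CODONS = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def gc3s(cds: str) -> float:
    """Return the GC fraction at third positions of non-excluded codons."""
    seq = cds.upper().replace("U", "T")
    gc = total = 0
    # Walk the sequence in frame, ignoring any trailing partial codon.
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        # Skip excluded codons and codons containing ambiguous bases.
        if codon in EXCLUDED_CODONS or any(b not in "ACGT" for b in codon):
            continue
        total += 1
        if codon[2] in "GC":
            gc += 1
    # NaN signals that no informative codons were found.
    return gc / total if total else float("nan")


if __name__ == "__main__":
    print(gc3s("ATGGCCAAGTTTGGATAA"))
```

In the toy sequence, ATG and TAA are skipped and two of the remaining four codons (GCC, AAG) end in G or C, so the function returns 0.5.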
@ -8,7 +8,7 @@ PhyloToL 6 includes a set of stand-alone utility scripts that aim to increase th
All utilities are written in Python and contain headers that provide usage information. A summary of the utilities, divided by category, is given here:
-| Category | [Script name](https://github.com/Katzlab/PhyloToL-6/tree/main/Utilities) | Intent | Output |
+| Category | [Script name](https://github.com/Katzlab/EukPhylo/tree/main/Utilities) | Intent | Output |
| ----------------------------- | ------------------------------------------------------------------------ | ---------------------------------------------------------------------------------- | -------------------------------------------------------------------------- |
| Assembly and fasta tools | Assess_transcriptomes.py | Calculates the length, GC content, and coverage of assembled files | Spreadsheet containing the length, coverage, and GC of each transcript. |
| | Cluster.py | Clusters sequences in a fasta file | Clustered fasta files |
@ -29,7 +29,7 @@ All utilities are written in Python and contain headers that provide information
| | CountTaxonOccurence.py | Counts the occurrences of each taxon in each gene family of a post-Guidance file | Spreadsheet with counts of taxa |
| | friendlessness.py | Describes internal insertion regions that are unique or nearly unique to a sequence | Spreadsheet with statistics for each sequence |
| | Gappiness.py | Produces statistics on the terminal and internal gaps of an alignment | Spreadsheet with statistics for each paralog |
-| | GuidanceWrapper.py | Guidance wrapper that can be used in place of the PhyloToL pipeline | Guidance-filtered alignment files |
+| | GuidanceWrapper.py | Guidance wrapper that can be used in place of the EukPhylo pipeline | Guidance-filtered alignment files |
| | | | |
| Gene tree description | CladeSizes.py | Describes clade sizes for different taxonomic groups | Spreadsheet describing clade sizes |
| | ColorByClade.py | Visualizes placement of taxa by taxonomic group in trees | Colored trees |
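Gappiness.py is described as reporting terminal and internal gaps in an alignment. As a rough illustration of that distinction, the hypothetical functions `read_fasta` and `gap_stats` below treat leading and trailing gap runs as terminal gaps and any gap between the first and last residue as internal. The sketch assumes `-` is the gap character. It does not reproduce the script's actual code or spreadsheet format.

```python
# Hypothetical sketch of per-sequence gap statistics for an aligned FASTA.
# Illustrative only; not Gappiness.py's method or output format.
from typing import Dict, Tuple


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {name: sequence}, keeping the first header word."""
    seqs: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line
    return seqs


def gap_stats(aligned: str, gap: str = "-") -> Tuple[int, int]:
    """Return (terminal_gaps, internal_gaps) for one aligned sequence."""
    core = aligned.strip(gap)  # drop leading and trailing gap runs
    if not core:  # the sequence is entirely gaps
        return len(aligned), 0
    terminal = len(aligned) - len(core)
    internal = core.count(gap)
    return terminal, internal


if __name__ == "__main__":
    aln = ">seqA\n--ACG-T--\n>seqB\nAC---GTAC\n"
    for name, seq in read_fasta(aln).items():
        term, inner = gap_stats(seq)
        # Print name, terminal gaps, internal gaps, and overall gap fraction.
        print(name, term, inner, f"{(term + inner) / len(seq):.2f}")
```

Keeping the two counts separate reflects why the distinction matters: terminal gaps often reflect partial sequences, while internal gaps point to insertions or deletions within the aligned region.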