From 3d1967c90f78400a399888a2d5a648d482ff126c Mon Sep 17 00:00:00 2001
From: Katzlab
Date: Fri, 9 Aug 2024 17:51:47 -0400
Subject: [PATCH] Updated Utilities (markdown)

---
 Utilities.md | 60 +++++++++++++++++++++++++++-------------------------
 1 file changed, 31 insertions(+), 29 deletions(-)

diff --git a/Utilities.md b/Utilities.md
index d1f4815..11fe868 100644
--- a/Utilities.md
+++ b/Utilities.md
@@ -1,31 +1,33 @@
 PhyloToL 6 includes a set of stand-alone utility scripts that aim to increase the power of the analysis done with or without the core PhyloToL pipeline. We divide these scripts into five main categories: basic statistics, composition tools, MSA tools, gene tree description, and contamination removal.
+A summary of some of the scripts is divided by category here
 
-| Assembly and fasta tools | Assess_transcriptomes_v2.0.py | Calculates the length, GC content, and coverage of assembled files | Spreadsheet containing the length, coverage, and GC of each transcript. |
-| ----------------------------- | ------------------------------ | ---------------------------------------------------------------------------------- | -------------------------------------------------------------------------- |
-| | Cluster_v2.0.py | Clusters sequences in a fasta file | Clustered fasta files |
-| | GetTaxonomy_v1.0.py | Collects taxonomic classification of organisms from NCBI | Spreadsheet with NCBI taxonomy |
-| | GetUniqueTaxa_v1.0.py | Gets the unique taxa from a taxonomic classification | Spreadsheet with unique taxa |
-| | Plot_transcriptomes_v2.0.py | Plots the length, coverage, and GC distribution of transcriptomes. | Plots of transcripts distribution. |
-| | QuerySRA_v1.0.py | Downloads assemblies from NCBI | Assemblies, IDs, and GCA or SRR codes. |
-| | ReadMapping_v2.0.py | Maps a group of trimmed reads to a reference | Sam/Bam files. |
-| | SeqLenToCsv_v1.0.py | Calculates the length of DNA sequences in fasta files | Spreadsheet containing the length of all sequences. |
-| | SharedOGs_v1.0.py | Summarizes the gene family presence in fasta files | Spreadsheet with the gene families |
-| | | | |
-| | | | |
-| Sequence composition analysis | CUB_v2.1.py | Summarizes the nucleotide composition of fasta files | Fasta file and several spreadsheets summarizing the nucelotide composition |
-| | GC_identifier_v1.0.py | Renames sequence ID by GC composition | Fasta files with relabeled sequence ID |
-| | PlotComps_v2.0.r | Produces GC3 width plots | GC3 width plots |
-| | Plotcomps_SppName_v1.0.R | Produces GC3 width plots with the species name and # seqs added to each plot | GC3 width plots |
-| | | | |
-| | | | |
-| MSA tools | BacktranslateAlignment.py | Produces new nucleotide alignment from an amino acid alignment | Aligned nucelotide file |
-| | CountTaxonOccurence_v2.0.py | Counts the occurences of each taxa in each gene family of a post guidance file | Spreadsheet with counts of taxa |
-| | friendlessness_v2.0.py | Describes the internal regions of insertion unique or nearly unique to a sequence | Spreadsheet with each sequence statistics |
-| | Gappiness_v2.0.py | Produces statistics on the terminal and internal gaps of an alignment | Spreadsheet with the paralogs statistics |
-| | GuidanceWrapper_v2.1.py | Guidance wrapper that can be used in place of PhyloToL pipeline | Guidanced alignment files |
-| | | | |
-| | | | |
-| Gene tree description | CladeSizes_v2.0.py | Describes clade sizes for different taxonomic groups | Spreadsheet describing clade sizes |
-| | ColorByClade_v2.1.py | Visualizes placement of taxa by taxonomic group in trees | Colored trees |
-| | ContaminationBySisters_v2.2.py | Summarizes the taxonomic distribution of sister sequences for each taxon in a tree | Two spreadsheets summarizing tree tips relationship |
-| | RenameTips_v1.0.py | Renames the tip labels of trees to include metadata such as location and date | Renamed trees |
\ No newline at end of file
+| [Script name](https://github.com/Katzlab/PhyloToL-6/tree/main/Utilities) | Intent | Output |
+| ------------------------------------------------------------------------ | ---------------------------------------------------------------------------------- | -------------------------------------------------------------------------- |
+| Assess_transcriptomes_v2.0.py | Calculates the length, GC content, and coverage of assembled files | Spreadsheet containing the length, coverage, and GC of each transcript. |
+| Cluster_v2.0.py | Clusters sequences in a fasta file | Clustered fasta files |
+| GetTaxonomy_v1.0.py | Collects taxonomic classification of organisms from NCBI | Spreadsheet with NCBI taxonomy |
+| GetUniqueTaxa_v1.0.py | Gets the unique taxa from a taxonomic classification | Spreadsheet with unique taxa |
+| Plot_transcriptomes_v2.0.py | Plots the length, coverage, and GC distribution of transcriptomes. | Plots of transcripts distribution. |
+| QuerySRA_v1.0.py | Downloads assemblies from NCBI | Assemblies, IDs, and GCA or SRR codes. |
+| ReadMapping_v2.0.py | Maps a group of trimmed reads to a reference | Sam/Bam files. |
+| SeqLenToCsv_v1.0.py | Calculates the length of DNA sequences in fasta files | Spreadsheet containing the length of all sequences. |
+| SharedOGs_v1.0.py | Summarizes the gene family presence in fasta files | Spreadsheet with the gene families |
+| | | |
+| | | |
+| CUB_v2.1.py | Summarizes the nucleotide composition of fasta files | Fasta file and several spreadsheets summarizing the nucelotide composition |
+| GC_identifier_v1.0.py | Renames sequence ID by GC composition | Fasta files with relabeled sequence ID |
+| PlotComps_v2.0.r | Produces GC3 width plots | GC3 width plots |
+| Plotcomps_SppName_v1.0.R | Produces GC3 width plots with the species name and # seqs added to each plot | GC3 width plots |
+| | | |
+| | | |
+| BacktranslateAlignment.py | Produces new nucleotide alignment from an amino acid alignment | Aligned nucelotide file |
+| CountTaxonOccurence_v2.0.py | Counts the occurences of each taxa in each gene family of a post guidance file | Spreadsheet with counts of taxa |
+| friendlessness_v2.0.py | Describes the internal regions of insertion unique or nearly unique to a sequence | Spreadsheet with each sequence statistics |
+| Gappiness_v2.0.py | Produces statistics on the terminal and internal gaps of an alignment | Spreadsheet with the paralogs statistics |
+| GuidanceWrapper_v2.1.py | Guidance wrapper that can be used in place of PhyloToL pipeline | Guidanced alignment files |
+| | | |
+| | | |
+| CladeSizes_v2.0.py | Describes clade sizes for different taxonomic groups | Spreadsheet describing clade sizes |
+| ColorByClade_v2.1.py | Visualizes placement of taxa by taxonomic group in trees | Colored trees |
+| ContaminationBySisters_v2.2.py | Summarizes the taxonomic distribution of sister sequences for each taxon in a tree | Two spreadsheets summarizing tree tips relationship |
+| RenameTips_v1.0.py | Renames the tip labels of trees to include metadata such as location and date | Renamed trees |
\ No newline at end of file