Updated Utilities (markdown)

2026-02-11 12:30:25 +08:00 · 2024-08-09 17:53:17 -04:00 · 2024-08-09 17:53:17 -04:00 · e141f9e601
commit e141f9e601
parent fd88db1f48
1 changed files with 31 additions and 31 deletions
--- a/Utilities.md
+++ b/Utilities.md
@@ -1,33 +1,33 @@
-PhyloToL 6 includes a set of stand-alone utility scripts that aim to increase the power of the analysis done with or without the core PhyloToL pipeline. We divide these scripts into five main categories: basic statistics, composition tools, MSA tools, gene tree description, and contamination removal.
+PhyloToL 6 includes a set of stand-alone utility scripts that aim to increase the power of the analysis done with or without the core PhyloToL pipeline. We divide these scripts into five main categories: basic statistics, composition tools, MSA tools, gene tree description, and contamination removal
 A summary of some of the scripts, grouped by category, is given here:
-| [Script name](https://github.com/Katzlab/PhyloToL-6/tree/main/Utilities) | Intent                                                                             | Output                                                                     |
+| Category                      | [Script name](https://github.com/Katzlab/PhyloToL-6/tree/main/Utilities) | Intent                                                                             | Output                                                                     |
-| ------------------------------------------------------------------------ | ---------------------------------------------------------------------------------- | -------------------------------------------------------------------------- |
+| ----------------------------- | ------------------------------------------------------------------------ | ---------------------------------------------------------------------------------- | -------------------------------------------------------------------------- |
-| Assess_transcriptomes_v2.0.py                                            | Calculates the length, GC content, and coverage of assembled files                 | Spreadsheet containing the length, coverage, and GC of each transcript.    |
+| Assembly and fasta tools      | Assess_transcriptomes_v2.0.py                                            | Calculates the length, GC content, and coverage of assembled files                 | Spreadsheet containing the length, coverage, and GC of each transcript.    |
-| Cluster_v2.0.py                                                          | Clusters sequences in a fasta file                                                 | Clustered fasta files                                                      |
+|                               | Cluster_v2.0.py                                                          | Clusters sequences in a fasta file                                                 | Clustered fasta files                                                      |
-| GetTaxonomy_v1.0.py                                                      | Collects taxonomic classification of organisms from NCBI                           | Spreadsheet with NCBI taxonomy                                             |
+|                               | GetTaxonomy_v1.0.py                                                      | Collects taxonomic classification of organisms from NCBI                           | Spreadsheet with NCBI taxonomy                                             |
-| GetUniqueTaxa_v1.0.py                                                    | Gets the unique taxa from a taxonomic classification                               | Spreadsheet with unique taxa                                               |
+|                               | GetUniqueTaxa_v1.0.py                                                    | Gets the unique taxa from a taxonomic classification                               | Spreadsheet with unique taxa                                               |
-| Plot_transcriptomes_v2.0.py                                              | Plots the length, coverage, and GC distribution of transcriptomes.                 | Plots of transcript distributions.                                         |
+|                               | Plot_transcriptomes_v2.0.py                                              | Plots the length, coverage, and GC distribution of transcriptomes.                 | Plots of transcript distributions.                                         |
-| QuerySRA_v1.0.py                                                         | Downloads assemblies from NCBI                                                     | Assemblies, IDs, and GCA or SRR codes.                                     |
+|                               | QuerySRA_v1.0.py                                                         | Downloads assemblies from NCBI                                                     | Assemblies, IDs, and GCA or SRR codes.                                     |
-| ReadMapping_v2.0.py                                                      | Maps a group of trimmed reads to a reference                                       | SAM/BAM files.                                                             |
+|                               | ReadMapping_v2.0.py                                                      | Maps a group of trimmed reads to a reference                                       | SAM/BAM files.                                                             |
-| SeqLenToCsv_v1.0.py                                                      | Calculates the length of DNA sequences in fasta files                              | Spreadsheet containing the length of all sequences.                        |
+|                               | SeqLenToCsv_v1.0.py                                                      | Calculates the length of DNA sequences in fasta files                              | Spreadsheet containing the length of all sequences.                        |
-| SharedOGs_v1.0.py                                                        | Summarizes the gene family presence in fasta files                                 | Spreadsheet with the gene families                                         |
+|                               | SharedOGs_v1.0.py                                                        | Summarizes the gene family presence in fasta files                                 | Spreadsheet with the gene families                                         |
-|                                                                          |                                                                                    |                                                                            |
+|                               |                                                                          |                                                                                    |                                                                            |
-|                                                                          |                                                                                    |                                                                            |
+| Sequence composition analysis | CUB_v2.1.py                                                              | Summarizes the nucleotide composition of fasta files                               | Fasta file and several spreadsheets summarizing the nucleotide composition |
-| CUB_v2.1.py                                                              | Summarizes the nucleotide composition of fasta files                               | Fasta file and several spreadsheets summarizing the nucleotide composition |
+|                               | GC_identifier_v1.0.py                                                    | Renames sequence IDs by GC composition                                             | Fasta files with relabeled sequence IDs                                    |
-| GC_identifier_v1.0.py                                                    | Renames sequence IDs by GC composition                                             | Fasta files with relabeled sequence IDs                                    |
+|                               | PlotComps_v2.0.r                                                         | Produces GC3 width plots                                                           | GC3 width plots                                                            |
-| PlotComps_v2.0.r                                                         | Produces GC3 width plots                                                           | GC3 width plots                                                            |
+|                               | Plotcomps_SppName_v1.0.R                                                 | Produces GC3 width plots with the species name and # seqs added to each plot       | GC3 width plots                                                            |
-| Plotcomps_SppName_v1.0.R                                                 | Produces GC3 width plots with the species name and # seqs added to each plot       | GC3 width plots                                                            |
+|                               |                                                                          |                                                                                    |                                                                            |
-|                                                                          |                                                                                    |                                                                            |
+| MSA tools                     | BacktranslateAlignment.py                                                | Produces a nucleotide alignment from an amino acid alignment                       | Aligned nucleotide file                                                    |
-|                                                                          |                                                                                    |                                                                            |
+|                               | CountTaxonOccurence_v2.0.py                                              | Counts the occurrences of each taxon in each gene family of a post-Guidance file   | Spreadsheet with counts of taxa                                            |
-| BacktranslateAlignment.py                                                | Produces a nucleotide alignment from an amino acid alignment                       | Aligned nucleotide file                                                    |
+|                               | friendlessness_v2.0.py                                                   | Describes internal insertion regions unique or nearly unique to a sequence         | Spreadsheet with statistics for each sequence                              |
-| CountTaxonOccurence_v2.0.py                                              | Counts the occurrences of each taxon in each gene family of a post-Guidance file   | Spreadsheet with counts of taxa                                            |
+|                               | Gappiness_v2.0.py                                                        | Produces statistics on the terminal and internal gaps of an alignment              | Spreadsheet with statistics for each paralog                               |
-| friendlessness_v2.0.py                                                   | Describes internal insertion regions unique or nearly unique to a sequence         | Spreadsheet with statistics for each sequence                              |
+|                               | GuidanceWrapper_v2.1.py                                                  | Guidance wrapper that can be used in place of the PhyloToL pipeline                | Guidanced alignment files                                                  |
-| Gappiness_v2.0.py                                                        | Produces statistics on the terminal and internal gaps of an alignment              | Spreadsheet with statistics for each paralog                               |
+|                               |                                                                          |                                                                                    |                                                                            |
-| GuidanceWrapper_v2.1.py                                                  | Guidance wrapper that can be used in place of the PhyloToL pipeline                | Guidanced alignment files                                                  |
+| Gene tree description         | CladeSizes_v2.0.py                                                       | Describes clade sizes for different taxonomic groups                               | Spreadsheet describing clade sizes                                         |
-|                                                                          |                                                                                    |                                                                            |
+|                               | ColorByClade_v2.1.py                                                     | Visualizes placement of taxa by taxonomic group in trees                           | Colored trees                                                              |
-|                                                                          |                                                                                    |                                                                            |
+|                               | ContaminationBySisters_v2.2.py                                           | Summarizes the taxonomic distribution of sister sequences for each taxon in a tree | Two spreadsheets summarizing relationships among tree tips                 |
-| CladeSizes_v2.0.py                                                       | Describes clade sizes for different taxonomic groups                               | Spreadsheet describing clade sizes                                         |
+|                               | RenameTips_v1.0.py                                                       | Renames the tip labels of trees to include metadata such as location and date      | Renamed trees                                                              |
-| ColorByClade_v2.1.py                                                     | Visualizes placement of taxa by taxonomic group in trees                           | Colored trees                                                              |
+|                               |                                                                          |                                                                                    |                                                                            |
-| ContaminationBySisters_v2.2.py                                           | Summarizes the taxonomic distribution of sister sequences for each taxon in a tree | Two spreadsheets summarizing relationships among tree tips                 |
+| Stand-alone clade grabbing    | CladeGrabbing_v2.1.py                                                    | Selects clades of interest from trees using taxonomic specifications               | Phylogenetic trees                                                         |
 | RenameTips_v1.0.py                                                       | Renames the tip labels of trees to include metadata such as location and date      | Renamed trees                                                              |
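The BacktranslateAlignment.py row describes producing a nucleotide alignment from an amino acid alignment. The general technique can be sketched in Python as below. This is a minimal illustration, not the script's own code: it assumes gaps are written as `-`, that each coding sequence starts in frame so codon *i* pairs with residue *i*, and the function names `backtranslate` and `backtranslate_alignment` are hypothetical. It does not check that each codon actually encodes the residue it is paired with.

```python
from typing import Dict


def backtranslate(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Thread an unaligned coding sequence through one aligned protein.

    Hypothetical helper (assumption: cds is in frame from its first base).
    Each residue is replaced by its codon, each gap by three gap characters.
    Leftover codons (e.g. a trailing stop absent from the protein) are dropped.
    """
    # Split the CDS into complete codons, ignoring any incomplete trailing bases.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    out = []
    next_codon = 0
    for residue in aligned_protein:
        if residue == gap:
            out.append(gap * 3)  # keep the gap, widened to codon length
        else:
            if next_codon >= len(codons):
                raise ValueError("protein has more residues than the CDS has codons")
            out.append(codons[next_codon])
            next_codon += 1
    return "".join(out)


def backtranslate_alignment(protein_aln: Dict[str, str],
                            cds_by_id: Dict[str, str]) -> Dict[str, str]:
    """Apply backtranslate() to every sequence ID in a protein alignment."""
    return {seq_id: backtranslate(aa, cds_by_id[seq_id])
            for seq_id, aa in protein_aln.items()}


if __name__ == "__main__":
    protein_aln = {"seqA": "MK-L", "seqB": "M-RL"}
    cds = {"seqA": "ATGAAACTGTAA", "seqB": "ATGCGTCTT"}
    for seq_id, codon_aln in backtranslate_alignment(protein_aln, cds).items():
        print(seq_id, codon_aln)
```

Because every residue or gap becomes exactly three columns, sequences that had equal length in the protein alignment stay equal length in the codon alignment, which is what keeps the output aligned.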