mirror of
http://43.156.76.180:8026/YuuMJ/EukPhylo.git
synced 2025-12-28 05:00:24 +08:00
Updated Utilities (markdown)
parent
fd88db1f48
commit
e141f9e601
62
Utilities.md
@@ -1,33 +1,33 @@
PhyloToL 6 includes a set of stand-alone utility scripts that aim to increase the power of analyses done with or without the core PhyloToL pipeline. We divide these scripts into five main categories: basic statistics, composition tools, MSA tools, gene tree description, and contamination removal.

Some of the scripts are summarized by category here:

| [Script name](https://github.com/Katzlab/PhyloToL-6/tree/main/Utilities) | Intent | Output |
| --- | --- | --- |
| Assess_transcriptomes_v2.0.py | Calculates the length, GC content, and coverage of assembled files | Spreadsheet containing the length, coverage, and GC of each transcript |
| Cluster_v2.0.py | Clusters sequences in a fasta file | Clustered fasta files |
| GetTaxonomy_v1.0.py | Collects taxonomic classification of organisms from NCBI | Spreadsheet with NCBI taxonomy |
| GetUniqueTaxa_v1.0.py | Gets the unique taxa from a taxonomic classification | Spreadsheet with unique taxa |
| Plot_transcriptomes_v2.0.py | Plots the length, coverage, and GC distribution of transcriptomes | Plots of transcript distributions |
| QuerySRA_v1.0.py | Downloads assemblies from NCBI | Assemblies, IDs, and GCA or SRR codes |
| ReadMapping_v2.0.py | Maps a group of trimmed reads to a reference | SAM/BAM files |
| SeqLenToCsv_v1.0.py | Calculates the length of DNA sequences in fasta files | Spreadsheet containing the length of all sequences |
| SharedOGs_v1.0.py | Summarizes the gene family presence in fasta files | Spreadsheet with the gene families |
| | | |
| CUB_v2.1.py | Summarizes the nucleotide composition of fasta files | Fasta file and several spreadsheets summarizing the nucleotide composition |
| GC_identifier_v1.0.py | Renames sequence IDs by GC composition | Fasta files with relabeled sequence IDs |
| PlotComps_v2.0.r | Produces GC3 width plots | GC3 width plots |
| Plotcomps_SppName_v1.0.R | Produces GC3 width plots with the species name and number of sequences added to each plot | GC3 width plots |
| | | |
| BacktranslateAlignment.py | Produces a new nucleotide alignment from an amino acid alignment | Aligned nucleotide file |
| CountTaxonOccurence_v2.0.py | Counts the occurrences of each taxon in each gene family of a post-Guidance file | Spreadsheet with counts of taxa |
| friendlessness_v2.0.py | Describes internal insertion regions that are unique or nearly unique to a sequence | Spreadsheet with statistics for each sequence |
| Gappiness_v2.0.py | Produces statistics on the terminal and internal gaps of an alignment | Spreadsheet with paralog statistics |
| GuidanceWrapper_v2.1.py | Guidance wrapper that can be used in place of the PhyloToL pipeline | Guidanced alignment files |
| | | |
| CladeSizes_v2.0.py | Describes clade sizes for different taxonomic groups | Spreadsheet describing clade sizes |
| ColorByClade_v2.1.py | Visualizes placement of taxa by taxonomic group in trees | Colored trees |
| ContaminationBySisters_v2.2.py | Summarizes the taxonomic distribution of sister sequences for each taxon in a tree | Two spreadsheets summarizing relationships among tree tips |
| RenameTips_v1.0.py | Renames the tip labels of trees to include metadata such as location and date | Renamed trees |


| Category | [Script name](https://github.com/Katzlab/PhyloToL-6/tree/main/Utilities) | Intent | Output |
| --- | --- | --- | --- |
| Assembly and fasta tools | Assess_transcriptomes_v2.0.py | Calculates the length, GC content, and coverage of assembled files | Spreadsheet containing the length, coverage, and GC of each transcript |
| | Cluster_v2.0.py | Clusters sequences in a fasta file | Clustered fasta files |
| | GetTaxonomy_v1.0.py | Collects taxonomic classification of organisms from NCBI | Spreadsheet with NCBI taxonomy |
| | GetUniqueTaxa_v1.0.py | Gets the unique taxa from a taxonomic classification | Spreadsheet with unique taxa |
| | Plot_transcriptomes_v2.0.py | Plots the length, coverage, and GC distribution of transcriptomes | Plots of transcript distributions |
| | QuerySRA_v1.0.py | Downloads assemblies from NCBI | Assemblies, IDs, and GCA or SRR codes |
| | ReadMapping_v2.0.py | Maps a group of trimmed reads to a reference | SAM/BAM files |
| | SeqLenToCsv_v1.0.py | Calculates the length of DNA sequences in fasta files | Spreadsheet containing the length of all sequences |
| | SharedOGs_v1.0.py | Summarizes the gene family presence in fasta files | Spreadsheet with the gene families |
| | | | |
| Sequence composition analysis | CUB_v2.1.py | Summarizes the nucleotide composition of fasta files | Fasta file and several spreadsheets summarizing the nucleotide composition |
| | GC_identifier_v1.0.py | Renames sequence IDs by GC composition | Fasta files with relabeled sequence IDs |
| | PlotComps_v2.0.r | Produces GC3 width plots | GC3 width plots |
| | Plotcomps_SppName_v1.0.R | Produces GC3 width plots with the species name and number of sequences added to each plot | GC3 width plots |
| | | | |
| MSA tools | BacktranslateAlignment.py | Produces a new nucleotide alignment from an amino acid alignment | Aligned nucleotide file |
| | CountTaxonOccurence_v2.0.py | Counts the occurrences of each taxon in each gene family of a post-Guidance file | Spreadsheet with counts of taxa |
| | friendlessness_v2.0.py | Describes internal insertion regions that are unique or nearly unique to a sequence | Spreadsheet with statistics for each sequence |
| | Gappiness_v2.0.py | Produces statistics on the terminal and internal gaps of an alignment | Spreadsheet with paralog statistics |
| | GuidanceWrapper_v2.1.py | Guidance wrapper that can be used in place of the PhyloToL pipeline | Guidanced alignment files |
| | | | |
| Gene tree description | CladeSizes_v2.0.py | Describes clade sizes for different taxonomic groups | Spreadsheet describing clade sizes |
| | ColorByClade_v2.1.py | Visualizes placement of taxa by taxonomic group in trees | Colored trees |
| | ContaminationBySisters_v2.2.py | Summarizes the taxonomic distribution of sister sequences for each taxon in a tree | Two spreadsheets summarizing relationships among tree tips |
| | RenameTips_v1.0.py | Renames the tip labels of trees to include metadata such as location and date | Renamed trees |
| | | | |
| Stand-alone clade grabbing | CladeGrabbing_v2.1.py | Selects clades of interest from trees using taxonomic specifications | Phylogenetic trees |
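
To make the "length" and "GC content" entries in the table more concrete, here is a minimal Python sketch of how such per-sequence statistics could be computed from a fasta file. It is not the code of `SeqLenToCsv_v1.0.py` or `Assess_transcriptomes_v2.0.py`. The function names (`read_fasta`, `length_and_gc`, `summarize`) and the CSV-style output are assumptions made for illustration, and GC is computed here over all characters, including ambiguous bases.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (sequence_id, sequence) pairs from a fasta-formatted text stream."""
    seq_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            parts = line[1:].split()
            seq_id, chunks = (parts[0] if parts else ""), []
        elif seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def length_and_gc(sequence):
    """Return (length, GC percentage) for one nucleotide sequence."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    return len(seq), (100.0 * gc / len(seq) if seq else 0.0)


def summarize(handle):
    """Return one (id, length, gc_percent) tuple per fasta record."""
    return [(sid, *length_and_gc(seq)) for sid, seq in read_fasta(handle)]


# Hypothetical in-memory input; a real run would pass an open fasta file.
example = StringIO(">seq1 demo\nATGC\nGGCC\n>seq2\nATAT\n")
rows = summarize(example)
for sid, length, gc in rows:
    print(f"{sid},{length},{gc:.1f}")
```

Passing an open file handle instead of the `StringIO` object gives the same rows for a file on disk. The actual utilities may define coverage, GC, and output columns differently, so check each script's own usage notes before relying on a particular format.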