Mirror of http://43.156.76.180:8026/YuuMJ/EukPhylo.git (synced 2025-12-27 13:00:26 +08:00)
Commit 9d42351d52 (parent d0fb1cb352): Update README.md
@@ -2,7 +2,7 @@
# Utilities
> This folder contains many useful tools for analyzing sequence data.

## For taxonomy dir:
## For Taxonomies dir:
### Query_SRA_egs.py:

**Purpose** Produces a spreadsheet of all SRA or GCA codes from GenBank associated with the taxonomic terms in an input CSV file.
@@ -19,13 +19,57 @@ Written by Elinor 1/26, updated 2/12

**Purpose** Makes lists of unique taxonomies from the PhyloToL master taxonomy column (the GenBank taxonomy for each taxon in the pipeline). These lists are the intended input for Query_SRA_egs.py. The script cuts off the genus (and the species, if there is one), removes duplicates, and writes the names out to files grouped by the first word of the taxonomy.

**Input** text file of taxonomies. make sure each taxonomic level is separated with '; ' (semicolon space) or the script will not parse the names right

**Input** Text file of taxonomies called `all_taxa.txt`. Make sure each taxonomic level is separated with `; ` (semicolon, space), or the script will not parse the names correctly.

**Output** A txt file of _all_ unique names found, and a directory of txt files of unique names sorted by major clade (the first word in each line of the input taxonomy).

**Usage**
>`python get_unique_taxa.py`

WARNING: If you run the script multiple times, DELETE THE PREVIOUS OUTPUT. The script appends lines to the end of existing files, so rerunning it without cleanup leaves many duplicates.

> Example command line: `python get_unique_taxa.py`
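To make the trimming and grouping described above concrete, here is a minimal sketch. It is not the code in `get_unique_taxa.py`; it assumes the last `; `-separated level holds the genus (with the species, if any, in that same level) and that every remaining level counts as one name. The function names, output file names, and example lineages are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def unique_taxa_by_clade(lines):
    """Group de-duplicated taxon names by major clade (first level of each line)."""
    by_clade = defaultdict(set)
    for line in lines:
        levels = [level.strip() for level in line.split("; ") if level.strip()]
        if len(levels) < 2:
            continue  # nothing is left once the genus level is removed
        # Assumption: the final level is the genus (plus species, if present).
        by_clade[levels[0]].update(levels[:-1])
    return {clade: sorted(names) for clade, names in by_clade.items()}


def write_clade_files(by_clade, out_dir):
    """Write one txt file per major clade into out_dir."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    for clade, names in by_clade.items():
        # Mode "w" (via write_text) overwrites, so reruns do not pile up duplicates.
        (out_path / f"{clade}.txt").write_text("\n".join(names) + "\n")


# Hypothetical input lines in the '; '-separated format.
example = [
    "Eukaryota; Sar; Alveolata; Ciliophora; Paramecium tetraurelia",
    "Eukaryota; Sar; Alveolata; Apicomplexa; Plasmodium",
    "Bacteria; Proteobacteria; Escherichia coli",
]
grouped = unique_taxa_by_clade(example)
```

Unlike the appending behavior the warning describes, this sketch overwrites each output file, which is one way to make reruns safe.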
### get_taxonomy.py:

**Purpose**

Queries Entrez Search with the genus and species name associated with each ten-digit code and returns the taxonomy for each name, if available.

**Input**

Spreadsheet (CSV) with ten-digit codes in the first column and genus and species names in the second column.

**Output**

CSV file called `output_taxonomies.csv` with the ten-digit codes and their GenBank taxonomy.

**Usage**
Input a spreadsheet with ten-digit codes in the first column and genus and species names in the second column. Preferably, the genus and species names are separated by a space and the second column contains no extraneous characters.

>`python get_taxonomy.py --input_file <path to .csv file>`
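Because the input format matters, it may help to check the spreadsheet before running the query. The sketch below is a hypothetical pre-flight check and is not part of `get_taxonomy.py`. It assumes "ten digit code" means a code exactly ten characters long and that a clean name looks like `Genus species`; the sample codes are invented for illustration.

```python
import csv
import io
import re

CODE_LENGTH = 10  # assumption: a "ten digit code" is exactly ten characters
NAME_PATTERN = re.compile(r"^[A-Z][a-z]+ [a-z]+$")  # hypothetical "Genus species" shape


def check_rows(handle):
    """Yield (line_number, problem) for rows that may not parse cleanly."""
    for line_number, row in enumerate(csv.reader(handle), start=1):
        if len(row) < 2:
            yield line_number, "expected two columns"
            continue
        code, name = row[0].strip(), row[1].strip()
        if len(code) != CODE_LENGTH:
            yield line_number, f"code {code!r} is not {CODE_LENGTH} characters long"
        if not NAME_PATTERN.match(name):
            yield line_number, f"name {name!r} is not 'Genus species'"


# Invented sample rows: the first is clean, the second has two problems.
sample = io.StringIO(
    "Sr_ci_Ptet,Paramecium tetraurelia\n"
    "Sr_ci_Ptet2,Paramecium_tetraurelia\n"
)
problems = list(check_rows(sample))
```

To check a real file, pass an open file handle instead of the `io.StringIO` sample, for example `check_rows(open(path, newline=""))`.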
## For Assemblies dir:
### assess_transcriptomes.py:
Written March 2023 by Elinor (esterner27@gmail.com) to plot length, coverage and GC of assembled transcripts.

**Purpose** Renames rnaSpades output to the new names given in the txt file, then iterates through all of the assemblies and gathers GC content, length, and coverage. It then passes that data to an R script for plotting.

**Input**
Directory of directories output by rnaSpades, OR a folder called Renamed_assembled_files of previously renamed files (in that case, put `-r` or `--renamed` in the command line)
txt file of LKH numbers and new names, tab-separated and formatted like this: `LKHxxx\tLKHxxx-10_digit_code-descriptor_of_taxon`
R script plot_assemblies.R, which is called from within this Python script

**Usage**

To run if your rnaSpades output is **not** renamed yet:
>`python assess_transcriptomes.py --raw <path to directory of spades output>`

To run if your files are already renamed:
>`python assess_transcriptomes.py --renamed <path to directory of renamed assemblies>`

**Output** A CSV file of the length, GC content, and coverage of each transcript, plus multiple R plots faceted by taxon. The plots show GC by length, and the distributions of coverage, length, and GC content across the whole transcript.
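As an illustration of the per-transcript statistics described above, here is a minimal sketch. It is not the code in `assess_transcriptomes.py`; it assumes each FASTA header carries coverage in the `NODE_1_length_8_cov_12.5_g0_i0` style, and the function names and example sequences are hypothetical.

```python
import re

# Assumption: coverage appears in the header as "_cov_<number>".
COVERAGE_PATTERN = re.compile(r"_cov_([0-9]+(?:\.[0-9]+)?)")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def transcript_stats(lines):
    """Return one dict per transcript with its length, GC fraction, and coverage."""
    rows = []
    for name, seq in read_fasta(lines):
        seq = seq.upper()
        gc = (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0
        match = COVERAGE_PATTERN.search(name)
        coverage = float(match.group(1)) if match else None  # None if header lacks it
        rows.append({"name": name, "length": len(seq), "gc": gc, "coverage": coverage})
    return rows


# Hypothetical two-record assembly.
example = [
    ">NODE_1_length_8_cov_12.5_g0_i0",
    "ATGC",
    "GGCC",
    ">NODE_2_length_4_cov_3.0_g1_i0",
    "ATAT",
]
rows = transcript_stats(example)
```

The length is measured from the sequence itself, not read from the header, so the two can be cross-checked if needed.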
### Katz lab
>[About Katz Lab](https://www.science.smith.edu/katz-lab/) \|