Update README.md

ElinorSterner 2023-04-07 16:04:43 -04:00 committed by GitHub
parent 88155175f7
commit d0fb1cb352

@ -3,10 +3,29 @@
> This folder contains many useful tools for analyzing sequence data.
## For taxonomy dir:
### Query_SRA_egs.py:
Takes a list of taxonomic names as input and outputs all SRAs or GCAs since 2020 (can be adjusted by modifying the script). For SRAs, the script also gives the sequencing technology used (PacBio, MiSeq, etc.) and the experiment type. It excludes all SRAs that include the word 'amplicon'. Input is the folder 'unique_taxon_lists' with files of keywords by major clade (separated by new lines). Put -t (transcriptome, SRA db) or -g (genome, assembly db) in the command line to specify the data type.
### Query_SRA_egs.py:
**Purpose** Gives a spreadsheet of all SRA codes or GCA codes from GenBank associated with the taxonomic terms in an input csv file.
**Input** The folder 'unique_taxon_lists', containing files of keywords by major clade (one keyword per line). These files can be produced by get_unique_taxa.py or written manually.
**Output** All SRAs or GCAs since 2020 (can be adjusted by modifying the script). For SRAs, the script also gives the sequencing technology used (PacBio, MiSeq, etc.) and the experiment type. It excludes all SRAs that include the word 'amplicon'.
**Usage** Pass -t (transcriptome, searches the SRA db) or -g (genome, searches the assembly db) on the command line to specify the data type.
> Example command line: `python Query_SRA_egs.py -t` or `python Query_SRA_egs.py -g`
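The option handling and input layout described above could look roughly like the following. This is a minimal sketch, not the code in Query_SRA_egs.py: the function names `parse_args`, `read_keywords` and `keep_record` are hypothetical, the case-insensitive 'amplicon' match is an assumption, and the actual GenBank query is left out.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Require exactly one of -t (SRA db) or -g (assembly db)."""
    parser = argparse.ArgumentParser(description="Sketch of the -t / -g switch")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("-t", action="store_const", const="sra", dest="db",
                       help="transcriptome: search the SRA db")
    group.add_argument("-g", action="store_const", const="assembly", dest="db",
                       help="genome: search the assembly db")
    return parser.parse_args(argv)


def read_keywords(folder="unique_taxon_lists"):
    """Map each file in the folder (one per major clade) to its keywords."""
    keywords = {}
    for path in sorted(Path(folder).glob("*")):
        if path.is_file():
            lines = path.read_text().splitlines()
            # One keyword per line; skip blank lines.
            keywords[path.stem] = [ln.strip() for ln in lines if ln.strip()]
    return keywords


def keep_record(description):
    """Assumption: a record is dropped if 'amplicon' appears in any letter case."""
    return "amplicon" not in description.lower()
```

Putting the two flags in a mutually exclusive group makes the parser reject a command line that gives both or neither, which matches the "-t or -g" usage above.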
### get_unique_taxa.py:
Written by Elinor 1/26, updated 2/12
**Purpose** Makes lists of unique taxonomies from the PhyloToL master taxonomy column (the GenBank taxonomy for each taxon in the pipeline). These lists are the intended input for Query_SRA_egs.py. The script cuts off the genus (and the species, if there is one), removes duplicates from the list, and writes the results out to files named by the first word of the taxonomy.
**Input** A text file of taxonomies. Make sure each taxonomic level is separated by '; ' (semicolon space), or the script will not parse the names correctly.
WARNING: if you run the script multiple times, DELETE THE PREVIOUS OUTPUT. The script appends lines to the end of the existing files, so rerunning it without deleting them produces many duplicates.
> Example command line: `python get_unique_taxa.py`
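The trimming and grouping steps described above can be sketched as follows. This is an illustration under stated assumptions, not the code in get_unique_taxa.py: the function names are hypothetical, the genus (and species) is assumed to be the last '; '-separated level, and each output line is assumed to be the remaining lineage.

```python
from collections import defaultdict
from pathlib import Path


def group_unique_taxa(lines):
    """Drop the last level, remove duplicates, and group by the first word."""
    groups = defaultdict(list)
    seen = set()
    for line in lines:
        levels = [lvl.strip() for lvl in line.strip().split("; ") if lvl.strip()]
        if len(levels) < 2:
            continue  # nothing would be left after cutting the last level
        # Assumption: the final level holds the genus (and species, if any).
        trimmed = "; ".join(levels[:-1])
        if trimmed in seen:
            continue
        seen.add(trimmed)
        first_word = levels[0].split()[0]
        groups[first_word].append(trimmed)
    return dict(groups)


def write_groups(groups, out_dir):
    """Write one file per first word, one trimmed taxonomy per line."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for first_word, taxa in groups.items():
        # Opened in write mode here, so a rerun overwrites instead of appending.
        (out / f"{first_word}.txt").write_text("\n".join(taxa) + "\n")
```

Unlike the append behaviour the warning describes, this sketch overwrites each output file, so rerunning it would not create duplicates.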
### Katz lab
>[About Katz Lab](https://www.science.smith.edu/katz-lab/)   \|