From d0fb1cb352ba19dc4d7f47335f5aed5ad29bcec5 Mon Sep 17 00:00:00 2001
From: ElinorSterner <86856150+ElinorSterner@users.noreply.github.com>
Date: Fri, 7 Apr 2023 16:04:43 -0400
Subject: [PATCH] Update README.md

---
 Utilities/README.md | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/Utilities/README.md b/Utilities/README.md
index 367f1eb..14e7aba 100644
--- a/Utilities/README.md
+++ b/Utilities/README.md
@@ -3,10 +3,29 @@
 > This folder contains many useful tools for analyzing sequence data.

 ## For taxonomy dir:
-### Query_SRA_egs.py:
-input list of taxonomic names and output all SRAs or GCA since 2020 (can be adjusted by modifying script). For SRAs, the script also gives sequsncing technology used (pacbio, miseq, etc) and experiment type. It excludes all SRAs that include the word 'amplicon'. Input is folder 'unique_taxon_lists' with files of keywords by major clade (separated by new lines). Put -t (transcriptome, SRA db) or -g (genome, assembly db) in the command line to specify data type.
+### Query_SRA_egs.py:
+
+**Purpose** Gives a spreadsheet of all SRA codes or GCA codes from GenBank associated with taxonomic terms in an input csv file
+
+**Input** Folder 'unique_taxon_lists' containing files of keywords by major clade (one keyword per line), generated by get_unique_taxa.py or written manually
+
+**Output** All SRAs or GCAs since 2020 (the cutoff can be adjusted by modifying the script). For SRAs, the script also gives the sequencing technology used (PacBio, MiSeq, etc.) and experiment type. It excludes all SRAs that include the word 'amplicon'.
+
+**Usage** Pass -t (transcriptome, searches SRA db) or -g (genome, searches assembly db) on the command line to specify the data type.

 > Example command line: `python Query_SRA_egs.py -t OR -g`

+### get_unique_taxa.py:
+Written by Elinor 1/26, updated 2/12
+
+**Purpose** Make lists of unique taxonomies from the PhyloToL master taxonomy column (GenBank taxonomy for each taxon in the pipeline).
+These lists are the intended input for Query_SRA_egs.py. The script cuts off the genus (and species, if there is one), deduplicates the list, and writes the entries out to files grouped by the first word of the taxonomy.
+
+**Input** Text file of taxonomies. Make sure each taxonomic level is separated with '; ' (semicolon space), or the script will not parse the names correctly.
+
+
+WARNING: If you run the script multiple times, DELETE THE PREVIOUS OUTPUT. The script appends lines to the
+end of files, so rerunning it without deleting will create many duplicates.
+
+> Example command line: `python get_unique_taxa.py`
 ### Katz lab
 >[About Katz Lab](https://www.science.smith.edu/katz-lab/)   \|