diff --git a/QuickStart.md b/QuickStart.md
index d914732..6bc33d4 100644
--- a/QuickStart.md
+++ b/QuickStart.md
@@ -1,49 +1,75 @@
-# Quickstart EukPhylo v1.0
-
-## Installing EukPhylo
+# Installing EukPhylo
 Scripts can be used as downloaded from the [GitHub](https://github.com/Katzlab/EukPhylo), and should work on any platform
 Dependencies & third party tools, along with the versions that we use at the Katz lab
-TrimAl (1.2)
-Guidance (2.2)
-Diamond (0.9.30, compiled with GCC 8.3.0)
-MAFFT (7.475)
-IQ-Tree (2.1.12)
-RAxML (8.2.12)
-BLAST+ (2.9.0)
-Vsearch (2.21.1, compiled with GCC 10.3.0)
-Python libraries (can be installed with Pip)
-ETE3 (pip install ete3)
-BioPython
-tqdm
+* TrimAl (1.2)
+* Guidance (2.2)
+* Diamond (0.9.30, compiled with GCC 8.3.0)
+* MAFFT (7.475)
+* IQ-Tree (2.1.12)
+* RAxML (8.2.12)
+* BLAST+ (2.9.0)
+* Vsearch (2.21.1, compiled with GCC 10.3.0)
+* Python libraries (can be installed with Pip)
+* ETE3 (pip install ete3)
+* BioPython
+* tqdm
-## EukPhylo part 1 = Assigning Gene families
+# EukPhylo part 1 = Assigning Gene families
 EukPhylo part 1 runs CDS or assembled transcripts through several scripts in order (7 for transcriptomes, 5 for genomes). These scripts are run through a ‘wrapper’ script.
-### Transcriptomes:
-Set Up:
+## Transcriptomes:
+### Set Up:
 * A folder called “AssembledTranscripts” with your assembled transcript fasta files
 * A folder called “Databases” with the three sub folders:
-** db_BvsE (how we ID likely-bacterial sequences)
-** db_StopFreq (for stop codon assignment)
-** db_OG
-*** Hook *.dmnd file ([Current version Hook-6.6.dmnd](https://drive.google.com/open?id=1ywYLZXzcTERDFCysz5vPbI9u6WRxz5r0&usp=drive_copy))
-*** Hook *.fasta file ([Current version Hook-6.6.fasta](https://drive.google.com/open?id=1AN4_SmZUYFH6_xh2qOhyNUlFZ_NT9_-D&usp=drive_copy))
+* * db_BvsE (how we ID likely-bacterial sequences)
+* * db_StopFreq (for stop codon assignment)
+* * db_OG
+* * * Hook *.dmnd file ([Current version Hook-6.6.dmnd](https://drive.google.com/open?id=1ywYLZXzcTERDFCysz5vPbI9u6WRxz5r0&usp=drive_copy))
+* * * Hook *.fasta file ([Current version Hook-6.6.fasta](https://drive.google.com/open?id=1AN4_SmZUYFH6_xh2qOhyNUlFZ_NT9_-D&usp=drive_copy))
 * A folder called “Scripts” filled with scripts from [here](https://github.com/Katzlab/PhyloToL-6/tree/main/PTL1/Transcriptomes/Scripts) on Github
-Running:
+### Running:
 python wrapper.py -1 1 -2 7 --assembled_transcripts AssembledTranscripts -o . --genetic_code Universal -d Databases > log.txt
 Here add detail of each option possible:
--1 = start script
--2 = end script
---assembled_transcripts = Folder with Assembled transcripts in fasta format
--o = path to output folder
---genetic_code = specified genetic code, name of .txt file with Genetic codes
--d = path to Databases folder
-> log.txt = if added to the end of the command, it will output a log file with progress, warning, or error messages
+* -1 = start script
+* -2 = end script
+* --assembled_transcripts = Folder with Assembled transcripts in fasta format
+* -o = path to output folder
+* --genetic_code = specified genetic code, name of .txt file with Genetic codes
+* -d = path to Databases folder
+* log.txt = if added to the end of the command, it will output a log file with progress, warning, or error messages
-Output:
+### Output:
+* ReadyToGo = AA, NTD
+* Sequences summary
+
+
+## Genomes:
+### Set Up:
+* A folder called “CDS” with your CDS fasta files
+* A folder called “Databases” with the three folders:
+* * db_BvsE (how we ID likely-bacterial sequences)
+* * db_StopFreq (for stop codon assignment)
+* * db_OG
+* * * Hook *.dmnd file ([Current version Hook-6.6.dmnd])
+* * * Hook *.fasta file ([Current version Hook-6.6.fasta])
+* A folder called “Scripts” filled with the 10 scripts from [here](https://github.com/Katzlab/PhyloToL-6/tree/main/PTL1/Genomes/Scripts) on Github. To run locally, pull out all scripts into main folder
+
+### Running:
+python wrapper.py -1 1 -2 5 --cds CDS -o . --genetic_code Gcodes.txt -d Databases > log.txt
+
+Here add detail of each options possible:
+* -1 = start script
+* -2 = end script
+* --cds = Folder with CDS files in fasta format
+* -o = path to output folder
+* --genetic_code = specified genetic code, name of .txt file with Genetic codes
+* -d = path to Databases folder
+* log.txt = if added to the end of the command, it will output a log file with progress, warning, or error messages
+
+### Output:
 ReadyToGo = AA, NTD
 Sequences summary
\ No newline at end of file