Updated QuickStart (markdown)

2025-12-29 06:20:25 +08:00 · 2025-01-24 11:30:37 +01:00 · ab5549c18b
commit ab5549c18b
parent 3b8a7eb37d
1 changed file with 58 additions and 32 deletions
--- a/QuickStart.md
+++ b/QuickStart.md
@@ -1,49 +1,75 @@
-# Quickstart EukPhylo v1.0
+# Installing EukPhylo
 ## Installing EukPhylo
 Scripts can be used as downloaded from [GitHub](https://github.com/Katzlab/EukPhylo), and should work on any platform
 Dependencies & third-party tools, along with the versions that we use at the Katz lab:
-TrimAl (1.2)
+* TrimAl (1.2)
-Guidance (2.2)
+* Guidance (2.2)
-Diamond (0.9.30, compiled with GCC 8.3.0)
+* Diamond (0.9.30, compiled with GCC 8.3.0)
-MAFFT (7.475)
+* MAFFT (7.475)
-IQ-Tree (2.1.12)
+* IQ-Tree (2.1.12)
-RAxML (8.2.12)
+* RAxML (8.2.12)
-BLAST+ (2.9.0)
+* BLAST+ (2.9.0)
-Vsearch (2.21.1, compiled with GCC 10.3.0)
+* Vsearch (2.21.1, compiled with GCC 10.3.0)
-Python libraries (can be installed with Pip)
+* Python libraries (can be installed with Pip)
-ETE3 (pip install ete3)
+* ETE3 (pip install ete3)
-BioPython
+* BioPython
-tqdm
+* tqdm
-## EukPhylo part 1 = Assigning Gene families
+# EukPhylo part 1 = Assigning Gene families
 EukPhylo part 1 runs CDS or assembled transcripts through several scripts in order (7 for transcriptomes, 5 for genomes). These scripts are run through a ‘wrapper’ script.
-### Transcriptomes:
+## Transcriptomes:
-Set Up:
+### Set Up:
 * A folder called “AssembledTranscripts” with your assembled transcript fasta files
 * A folder called “Databases” with three subfolders:
-** db_BvsE (how we ID likely-bacterial sequences)
+* * db_BvsE (how we ID likely-bacterial sequences)
-** db_StopFreq (for stop codon assignment)
+* * db_StopFreq (for stop codon assignment)
-** db_OG
+* * db_OG
-*** Hook *.dmnd file ([Current version Hook-6.6.dmnd](https://drive.google.com/open?id=1ywYLZXzcTERDFCysz5vPbI9u6WRxz5r0&usp=drive_copy))
+* * * Hook *.dmnd file ([Current version Hook-6.6.dmnd](https://drive.google.com/open?id=1ywYLZXzcTERDFCysz5vPbI9u6WRxz5r0&usp=drive_copy))
-*** Hook *.fasta file ([Current version Hook-6.6.fasta](https://drive.google.com/open?id=1AN4_SmZUYFH6_xh2qOhyNUlFZ_NT9_-D&usp=drive_copy)) 
+* * * Hook *.fasta file ([Current version Hook-6.6.fasta](https://drive.google.com/open?id=1AN4_SmZUYFH6_xh2qOhyNUlFZ_NT9_-D&usp=drive_copy)) 
 * A folder called “Scripts” filled with scripts from [here](https://github.com/Katzlab/PhyloToL-6/tree/main/PTL1/Transcriptomes/Scripts) on GitHub
-Running:
+### Running:
 python wrapper.py -1 1 -2 7 --assembled_transcripts AssembledTranscripts -o . --genetic_code Universal -d Databases > log.txt
 Details of each option:
-1 = start script
+* -1 = start script
-2 = end script
+* -2 = end script
--assembled_transcripts = Folder with Assembled transcripts in fasta format 
+* --assembled_transcripts = Folder with Assembled transcripts in fasta format 
-o = path to output folder
+* -o = path to output folder
--genetic_code = specified genetic code, name of .txt file with Genetic codes
+* --genetic_code = specified genetic code, name of .txt file with Genetic codes
-d = path to Databases folder 
+* -d = path to Databases folder 
-> log.txt = if added to the end of the command, it will output a log file with progress, warning, or error messages
+* `> log.txt` = if added to the end of the command, it will output a log file with progress, warning, or error messages
-Output:
+### Output:
 * ReadyToGo = AA, NTD
 * Sequences summary
 ## Genomes:
 ### Set Up:
 * A folder called “CDS” with your CDS fasta files
 * A folder called “Databases” with three subfolders:
 * * db_BvsE (how we ID likely-bacterial sequences)
 * * db_StopFreq (for stop codon assignment)
 * * db_OG
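 * * * Hook *.dmnd file ([Current version Hook-6.6.dmnd](https://drive.google.com/open?id=1ywYLZXzcTERDFCysz5vPbI9u6WRxz5r0&usp=drive_copy))
 * * * Hook *.fasta file ([Current version Hook-6.6.fasta](https://drive.google.com/open?id=1AN4_SmZUYFH6_xh2qOhyNUlFZ_NT9_-D&usp=drive_copy))
 * A folder called “Scripts” filled with the 10 scripts from [here](https://github.com/Katzlab/PhyloToL-6/tree/main/PTL1/Genomes/Scripts) on GitHub. To run locally, move all scripts into the main folder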
 ### Running:
 python wrapper.py -1 1 -2 5 --cds CDS -o . --genetic_code Gcodes.txt -d Databases > log.txt
 Details of each option:
 * -1 = start script
 * -2 = end script
 * --cds = Folder with CDS files in fasta format 
 * -o = path to output folder
 * --genetic_code = specified genetic code, name of .txt file with Genetic codes
 * -d = path to Databases folder
 * `> log.txt` = if added to the end of the command, it will output a log file with progress, warning, or error messages
 ### Output:
 * ReadyToGo = AA, NTD
 * Sequences summary
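
The Set Up lists in the diff above describe a fixed folder layout, so a quick pre-flight check can catch a missing folder before `wrapper.py` is launched. The following is a minimal sketch under that reading: `missing_inputs` is a hypothetical helper, not part of EukPhylo, and it assumes the Hook files can be recognized by their `.dmnd` and `.fasta` extensions. It does not check the Scripts folder, since the genome instructions allow moving the scripts into the main folder.

```python
from pathlib import Path

# Subfolder names are taken from the Set Up lists; the helper itself is
# hypothetical and only illustrates the expected layout.
REQUIRED_DB_SUBFOLDERS = ("db_BvsE", "db_StopFreq", "db_OG")


def missing_inputs(workdir, input_folder):
    """Return a list of problems found in the expected folder layout.

    workdir      -- folder from which wrapper.py would be run
    input_folder -- "AssembledTranscripts" (transcriptomes) or "CDS" (genomes)
    """
    root = Path(workdir)
    problems = []

    # Folder holding the fasta files to process
    if not (root / input_folder).is_dir():
        problems.append(f"missing input folder: {input_folder}")

    # The three database subfolders
    for sub in REQUIRED_DB_SUBFOLDERS:
        if not (root / "Databases" / sub).is_dir():
            problems.append(f"missing database folder: Databases/{sub}")

    # db_OG should hold one Hook *.dmnd and one Hook *.fasta file
    # (assumption: matching on the file extension is enough)
    db_og = root / "Databases" / "db_OG"
    if db_og.is_dir():
        suffixes = {p.suffix for p in db_og.iterdir() if p.is_file()}
        for ext in (".dmnd", ".fasta"):
            if ext not in suffixes:
                problems.append(f"no Hook *{ext} file in Databases/db_OG")

    return problems
```

Calling `missing_inputs(".", "AssembledTranscripts")` for a transcriptome run, or `missing_inputs(".", "CDS")` for a genome run, returns an empty list when the layout matches the lists above.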