Updated QuickStart (markdown)

2025-12-29 04:20:24 +08:00 · 2025-01-24 11:30:37 +01:00 · ab5549c18b
commit ab5549c18b
parent 3b8a7eb37d
1 changed files with 58 additions and 32 deletions
--- a/QuickStart.md
+++ b/QuickStart.md
@@ -1,49 +1,75 @@
-# Quickstart EukPhylo v1.0
-
-## Installing EukPhylo
+# Installing EukPhylo
 Scripts can be used as downloaded from the [GitHub](https://github.com/Katzlab/EukPhylo), and should work on any platform
 Dependencies & third party tools, along with the versions that we use at the Katz lab
-TrimAl (1.2)
-Guidance (2.2)
-Diamond (0.9.30, compiled with GCC 8.3.0)
-MAFFT (7.475)
-IQ-Tree (2.1.12)
-RAxML (8.2.12)
-BLAST+ (2.9.0)
-Vsearch (2.21.1, compiled with GCC 10.3.0)
-Python libraries (can be installed with Pip)
-ETE3 (pip install ete3)
-BioPython
-tqdm
+* TrimAl (1.2)
+* Guidance (2.2)
+* Diamond (0.9.30, compiled with GCC 8.3.0)
+* MAFFT (7.475)
+* IQ-Tree (2.1.12)
+* RAxML (8.2.12)
+* BLAST+ (2.9.0)
+* Vsearch (2.21.1, compiled with GCC 10.3.0)
+* Python libraries (can be installed with pip)
+* * ETE3 (pip install ete3)
+* * BioPython
+* * tqdm


-## EukPhylo part 1 = Assigning Gene families
+# EukPhylo part 1 = Assigning Gene families

 EukPhylo part 1 runs CDS or assembled transcripts through several scripts in order (7 for transcriptomes, 5 for genomes). These scripts are run through a ‘wrapper’ script.

-### Transcriptomes:
-Set Up:
+## Transcriptomes:
+### Set Up:
 * A folder called “AssembledTranscripts” with your assembled transcript fasta files
 * A folder called “Databases” with the three sub folders:
-** db_BvsE (how we ID likely-bacterial sequences)
-** db_StopFreq (for stop codon assignment)
-** db_OG
-*** Hook *.dmnd file ([Current version Hook-6.6.dmnd](https://drive.google.com/open?id=1ywYLZXzcTERDFCysz5vPbI9u6WRxz5r0&usp=drive_copy))
-*** Hook *.fasta file ([Current version Hook-6.6.fasta](https://drive.google.com/open?id=1AN4_SmZUYFH6_xh2qOhyNUlFZ_NT9_-D&usp=drive_copy)) 
+* * db_BvsE (how we ID likely-bacterial sequences)
+* * db_StopFreq (for stop codon assignment)
+* * db_OG
+* * * Hook *.dmnd file ([Current version Hook-6.6.dmnd](https://drive.google.com/open?id=1ywYLZXzcTERDFCysz5vPbI9u6WRxz5r0&usp=drive_copy))
+* * * Hook *.fasta file ([Current version Hook-6.6.fasta](https://drive.google.com/open?id=1AN4_SmZUYFH6_xh2qOhyNUlFZ_NT9_-D&usp=drive_copy)) 
 * A folder called “Scripts” filled with scripts from [here](https://github.com/Katzlab/PhyloToL-6/tree/main/PTL1/Transcriptomes/Scripts) on Github

-Running:
+### Running:
 python wrapper.py -1 1 -2 7 --assembled_transcripts AssembledTranscripts -o . --genetic_code Universal -d Databases > log.txt

 Available options:
-1 = start script
-2 = end script
--assembled_transcripts = Folder with Assembled transcripts in fasta format 
-o = path to output folder
--genetic_code = specified genetic code, name of .txt file with Genetic codes
-d = path to Databases folder 
-> log.txt = if added to the end of the command, it will output a log file with progress, warning, or error messages
+* -1 = start script
+* -2 = end script
+* --assembled_transcripts = Folder with Assembled transcripts in fasta format 
+* -o = path to output folder
+* --genetic_code = specified genetic code, name of .txt file with Genetic codes
+* -d = path to Databases folder 
+* \> log.txt = if added to the end of the command, writes progress, warning, and error messages to a log file

-Output:
+### Output:
+* ReadyToGo = AA, NTD
+* Sequences summary
+
+
+## Genomes:
+### Set Up:
+* A folder called “CDS” with your CDS fasta files
+* A folder called “Databases” with the three sub folders:
+* * db_BvsE (how we ID likely-bacterial sequences)
+* * db_StopFreq (for stop codon assignment)
+* * db_OG
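+* * * Hook *.dmnd file ([Current version Hook-6.6.dmnd](https://drive.google.com/open?id=1ywYLZXzcTERDFCysz5vPbI9u6WRxz5r0&usp=drive_copy))
+* * * Hook *.fasta file ([Current version Hook-6.6.fasta](https://drive.google.com/open?id=1AN4_SmZUYFH6_xh2qOhyNUlFZ_NT9_-D&usp=drive_copy))
+* A folder called “Scripts” filled with the 10 scripts from [here](https://github.com/Katzlab/PhyloToL-6/tree/main/PTL1/Genomes/Scripts) on GitHub. To run locally, move all scripts into the main folder.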
+
+### Running:
+python wrapper.py -1 1 -2 5 --cds CDS -o . --genetic_code Gcodes.txt -d Databases > log.txt
+
+Available options:
+* -1 = start script
+* -2 = end script
+* --cds = Folder with CDS files in fasta format 
+* -o = path to output folder
+* --genetic_code = specified genetic code, name of .txt file with Genetic codes
+* -d = path to Databases folder
+* \> log.txt = if added to the end of the command, writes progress, warning, and error messages to a log file
+
+### Output:
 ReadyToGo = AA, NTD
 Sequences summary