Created QuickStart (markdown)

2025-12-29 06:10:30 +08:00 · 2025-01-24 11:25:10 +01:00 · 2025-01-24 11:25:10 +01:00 · 3b8a7eb37d
commit 3b8a7eb37d
parent c16912249b
1 changed files with 49 additions and 0 deletions
--- a/QuickStart.md
+++ b/QuickStart.md
@@ -0,0 +1,49 @@
+# Quickstart EukPhylo v1.0
+
+## Installing EukPhylo
+The scripts can be used as downloaded from [GitHub](https://github.com/Katzlab/EukPhylo) and should work on any platform.
+Dependencies and third-party tools, with the versions we use at the Katz lab:
+* TrimAl (1.2)
+* Guidance (2.2)
+* Diamond (0.9.30, compiled with GCC 8.3.0)
+* MAFFT (7.475)
+* IQ-Tree (2.1.12)
+* RAxML (8.2.12)
+* BLAST+ (2.9.0)
+* Vsearch (2.21.1, compiled with GCC 10.3.0)
+* Python libraries (can be installed with pip)
+  * ETE3 (pip install ete3)
+  * BioPython
+  * tqdm
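A quick way to confirm the dependencies listed above are visible before running anything is sketched below. This is only an illustration, not part of EukPhylo: the Python import names are the standard ones for these libraries (BioPython imports as `Bio`), but the executable names in `EXECUTABLES` are assumptions that depend on how each tool was installed, and Guidance is left out because it is assumed to be invoked as a script rather than found on PATH.

```python
import importlib.util
import shutil

# Import names for the Python libraries listed above (BioPython imports as "Bio").
PYTHON_MODULES = ["ete3", "Bio", "tqdm"]

# ASSUMPTION: executable names vary by install; adjust these to match your system.
EXECUTABLES = ["trimal", "diamond", "mafft", "iqtree2", "raxmlHPC", "blastp", "vsearch"]


def missing_modules(modules):
    """Return the top-level modules that the import system cannot find."""
    return [m for m in modules if importlib.util.find_spec(m) is None]


def missing_executables(names):
    """Return the executables that are not on PATH."""
    return [n for n in names if shutil.which(n) is None]


if __name__ == "__main__":
    for label, missing in (
        ("Python modules", missing_modules(PYTHON_MODULES)),
        ("executables", missing_executables(EXECUTABLES)),
    ):
        print(f"Missing {label}: {', '.join(missing) if missing else 'none'}")
```

The check only reports presence, not versions, so it does not replace comparing against the version numbers listed above.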
+
+
+## EukPhylo part 1 = Assigning Gene families
+
+EukPhylo part 1 runs CDSs or assembled transcripts through a series of scripts in a fixed order (7 for transcriptomes, 5 for genomes). A single ‘wrapper’ script runs them.
+
+### Transcriptomes:
+Set Up:
+* A folder called “AssembledTranscripts” with your assembled transcript fasta files
+* A folder called “Databases” with three subfolders:
+  * db_BvsE (how we ID likely-bacterial sequences)
+  * db_StopFreq (for stop codon assignment)
+  * db_OG, containing:
+    * Hook `*.dmnd` file ([Current version Hook-6.6.dmnd](https://drive.google.com/open?id=1ywYLZXzcTERDFCysz5vPbI9u6WRxz5r0&usp=drive_copy))
+    * Hook `*.fasta` file ([Current version Hook-6.6.fasta](https://drive.google.com/open?id=1AN4_SmZUYFH6_xh2qOhyNUlFZ_NT9_-D&usp=drive_copy))
+* A folder called “Scripts” filled with the scripts from [here](https://github.com/Katzlab/PhyloToL-6/tree/main/PTL1/Transcriptomes/Scripts) on GitHub
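The folder layout above can be checked before a run with a small helper like the one below. It is a sketch under stated assumptions: the function name `missing_inputs` is hypothetical, all folders are assumed to sit side by side under one working directory, and it only tests that the folders and the two Hook files exist, not that their contents are valid.

```python
from pathlib import Path


def missing_inputs(root="."):
    """Return the expected folders/files (relative to root) that are absent."""
    root = Path(root)
    db = root / "Databases"
    expected_dirs = [
        root / "AssembledTranscripts",
        db / "db_BvsE",
        db / "db_StopFreq",
        db / "db_OG",
        root / "Scripts",
    ]
    missing = [p.relative_to(root).as_posix() for p in expected_dirs if not p.is_dir()]
    # db_OG should hold a Hook *.dmnd file and a Hook *.fasta file.
    for pattern in ("*.dmnd", "*.fasta"):
        if not any((db / "db_OG").glob(pattern)):
            missing.append(f"Databases/db_OG/{pattern}")
    return missing


if __name__ == "__main__":
    problems = missing_inputs(".")
    print("Setup looks complete" if not problems else "Missing: " + ", ".join(problems))
```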
+
+Running:
+`python wrapper.py -1 1 -2 7 --assembled_transcripts AssembledTranscripts -o . --genetic_code Universal -d Databases > log.txt`
+
+Options:
+* `-1` = number of the first script to run
+* `-2` = number of the last script to run
+* `--assembled_transcripts` = folder with assembled transcripts in fasta format
+* `-o` = path to the output folder
+* `--genetic_code` = the genetic code to use, or the name of a .txt file with genetic codes
+* `-d` = path to the Databases folder
+* `> log.txt` = if added to the end of the command, writes progress, warning, or error messages to a log file
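For scripted or repeated runs, the same command can be assembled from Python. The sketch below uses only the options described above; the helper names `build_command` and `run_with_log` are hypothetical, and it assumes the call is made from the folder where `python wrapper.py` would otherwise be typed. Note that, like `> log.txt` in a shell, it captures standard output only.

```python
import subprocess
import sys


def build_command(first=1, last=7, transcripts="AssembledTranscripts",
                  out=".", genetic_code="Universal", databases="Databases"):
    """Assemble the wrapper.py argument list from the options described above."""
    return [
        sys.executable, "wrapper.py",
        "-1", str(first),
        "-2", str(last),
        "--assembled_transcripts", transcripts,
        "-o", out,
        "--genetic_code", genetic_code,
        "-d", databases,
    ]


def run_with_log(cmd, log_path="log.txt"):
    """Run cmd and write its standard output to log_path, like `> log.txt`."""
    with open(log_path, "w") as log:
        return subprocess.run(cmd, stdout=log).returncode
```

Calling `run_with_log(build_command())` would mirror the command line shown above. Passing other values for `first` and `last` presumably restricts the run to a subset of the scripts, since `-1` and `-2` name the start and end script.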
+
+Output:
+* ReadyToGo = AA (amino acid) and NTD (nucleotide) sequences
+* Sequences summary