Updated QuickStart (markdown)

MCLeleu 2025-01-24 11:30:37 +01:00
parent 3b8a7eb37d
commit ab5549c18b

@ -1,49 +1,75 @@
# Installing EukPhylo
Scripts can be used as downloaded from [GitHub](https://github.com/Katzlab/EukPhylo) and should work on any platform.
Dependencies and third-party tools, along with the versions that we use at the Katz lab:
* TrimAl (1.2)
* Guidance (2.2)
* Diamond (0.9.30, compiled with GCC 8.3.0)
* MAFFT (7.475)
* IQ-Tree (2.1.12)
* RAxML (8.2.12)
* BLAST+ (2.9.0)
* Vsearch (2.21.1, compiled with GCC 10.3.0)
* Python libraries (can be installed with pip)
* * ETE3 (pip install ete3)
* * BioPython
* * tqdm
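
A quick way to confirm the dependencies are in place before running anything is a small check script. The sketch below assumes the Python libraries import as `ete3`, `Bio`, and `tqdm`, and that the external tools are on the `PATH`. The executable names in `TOOLS` are assumptions, not taken from this page, and may differ between builds and installations. The script does not check versions.

```python
import importlib.util
import shutil

# Import names for the Python libraries listed above.
PYTHON_MODULES = ["ete3", "Bio", "tqdm"]

# Executable names are assumptions; adjust them to match your installation.
TOOLS = ["trimal", "guidance.pl", "diamond", "mafft",
         "iqtree2", "raxmlHPC", "blastp", "vsearch"]


def missing_modules(modules):
    """Return the modules that the import system cannot find."""
    return [m for m in modules if importlib.util.find_spec(m) is None]


def missing_tools(tools):
    """Return the executables that are not found on the PATH."""
    return [t for t in tools if shutil.which(t) is None]


if __name__ == "__main__":
    print("Missing Python modules:", missing_modules(PYTHON_MODULES) or "none")
    print("Missing tools on PATH:", missing_tools(TOOLS) or "none")
```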
# EukPhylo part 1 = Assigning Gene families
EukPhylo part 1 runs CDS or assembled transcripts through several scripts in order (7 for transcriptomes, 5 for genomes). These scripts are run through a wrapper script.
## Transcriptomes:
### Set Up:
* A folder called “AssembledTranscripts” with your assembled transcript fasta files
* A folder called “Databases” with three subfolders:
* * db_BvsE (how we ID likely-bacterial sequences)
* * db_StopFreq (for stop codon assignment)
* * db_OG
* * * Hook *.dmnd file ([Current version Hook-6.6.dmnd](https://drive.google.com/open?id=1ywYLZXzcTERDFCysz5vPbI9u6WRxz5r0&usp=drive_copy))
* * * Hook *.fasta file ([Current version Hook-6.6.fasta](https://drive.google.com/open?id=1AN4_SmZUYFH6_xh2qOhyNUlFZ_NT9_-D&usp=drive_copy))
* A folder called “Scripts” filled with scripts from [here](https://github.com/Katzlab/PhyloToL-6/tree/main/PTL1/Transcriptomes/Scripts) on GitHub
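
The folder layout above can be checked before launching the wrapper. This is a minimal sketch, not part of EukPhylo: the helper names `missing_folders` and `hook_files` are hypothetical, and it assumes the Hook files in db_OG have names starting with "Hook", as in the current Hook-6.6 files.

```python
from pathlib import Path

# Folder names taken from the set-up list above.
REQUIRED = [
    "AssembledTranscripts",
    "Databases/db_BvsE",
    "Databases/db_StopFreq",
    "Databases/db_OG",
    "Scripts",
]


def missing_folders(root, required=REQUIRED):
    """Return the required folders that do not exist under root."""
    root = Path(root)
    return [rel for rel in required if not (root / rel).is_dir()]


def hook_files(root):
    """List Hook .dmnd and .fasta files found in Databases/db_OG."""
    db_og = Path(root) / "Databases" / "db_OG"
    if not db_og.is_dir():
        return []
    # Assumption: Hook file names start with "Hook".
    return sorted(p.name for p in db_og.glob("Hook*")
                  if p.suffix in {".dmnd", ".fasta"})


if __name__ == "__main__":
    print("Missing folders:", missing_folders(".") or "none")
    print("Hook files:", hook_files(".") or "none found")
```

Run it from the folder that contains “AssembledTranscripts”, “Databases”, and “Scripts”.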
### Running:
python wrapper.py -1 1 -2 7 --assembled_transcripts AssembledTranscripts -o . --genetic_code Universal -d Databases > log.txt
Options:
* -1 = number of the first script to run (1 to 7)
* -2 = number of the last script to run (1 to 7)
* --assembled_transcripts = folder with assembled transcripts in fasta format
* -o = path to the output folder
* --genetic_code = the genetic code to use (e.g. Universal), or the name of a .txt file with genetic codes
* -d = path to the Databases folder
* `> log.txt` = if added to the end of the command, redirects the output to a log file with progress, warning, and error messages
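
When running many datasets, it can help to assemble the command from its parts. The sketch below is a hypothetical helper (`build_command` is not part of EukPhylo) that uses only the options listed above and checks that the start and end scripts fall within the 7 transcriptome steps.

```python
import shlex


def build_command(start, end, transcripts, out_dir, genetic_code, databases):
    """Assemble the transcriptome wrapper.py call from the options above."""
    # The transcriptome pipeline has 7 scripts, run in order.
    if not 1 <= start <= end <= 7:
        raise ValueError("start and end must satisfy 1 <= start <= end <= 7")
    return [
        "python", "wrapper.py",
        "-1", str(start),
        "-2", str(end),
        "--assembled_transcripts", transcripts,
        "-o", out_dir,
        "--genetic_code", genetic_code,
        "-d", databases,
    ]


if __name__ == "__main__":
    cmd = build_command(1, 7, "AssembledTranscripts", ".", "Universal", "Databases")
    # Print the command as it would be typed in a shell.
    print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run`, or printed with `shlex.join` and pasted into a terminal with `> log.txt` appended.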
### Output:
* ReadyToGo = AA, NTD
* Sequences summary
## Genomes:
### Set Up:
* A folder called “CDS” with your CDS fasta files
* A folder called “Databases” with three subfolders:
* * db_BvsE (how we ID likely-bacterial sequences)
* * db_StopFreq (for stop codon assignment)
* * db_OG
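* * * Hook *.dmnd file ([Current version Hook-6.6.dmnd](https://drive.google.com/open?id=1ywYLZXzcTERDFCysz5vPbI9u6WRxz5r0&usp=drive_copy))
* * * Hook *.fasta file ([Current version Hook-6.6.fasta](https://drive.google.com/open?id=1AN4_SmZUYFH6_xh2qOhyNUlFZ_NT9_-D&usp=drive_copy))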
* A folder called “Scripts” filled with the 10 scripts from [here](https://github.com/Katzlab/PhyloToL-6/tree/main/PTL1/Genomes/Scripts) on GitHub. To run locally, move all scripts into the main folder.
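
One reading of the note about running locally is that the script files should sit directly in the main folder. Under that assumption, the sketch below copies every `.py` file from “Scripts” into the main folder without overwriting existing files. The helper name `copy_scripts` and the `*.py` pattern are hypothetical choices for illustration.

```python
import shutil
from pathlib import Path


def copy_scripts(scripts_dir, main_dir, pattern="*.py"):
    """Copy script files from scripts_dir into main_dir without overwriting."""
    scripts_dir, main_dir = Path(scripts_dir), Path(main_dir)
    if not scripts_dir.is_dir():
        return []
    copied = []
    for src in sorted(scripts_dir.glob(pattern)):
        dest = main_dir / src.name
        # Skip anything that is not a regular file or already exists.
        if src.is_file() and not dest.exists():
            shutil.copy2(src, dest)
            copied.append(src.name)
    return copied


if __name__ == "__main__":
    print("Copied:", copy_scripts("Scripts", ".") or "nothing")
```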
### Running:
python wrapper.py -1 1 -2 5 --cds CDS -o . --genetic_code Gcodes.txt -d Databases > log.txt
Options:
* -1 = number of the first script to run (1 to 5)
* -2 = number of the last script to run (1 to 5)
* --cds = folder with CDS files in fasta format
* -o = path to the output folder
* --genetic_code = the genetic code to use, or the name of a .txt file with genetic codes (e.g. Gcodes.txt)
* -d = path to the Databases folder
* `> log.txt` = if added to the end of the command, redirects the output to a log file with progress, warning, and error messages
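
The same redirection can be done from Python when the wrapper is launched from another script. This is a minimal sketch with a hypothetical helper, `run_with_log`; like `>` in the shell, it sends only standard output to the log file. Passing `stderr=subprocess.STDOUT` to `subprocess.run` would also capture standard error.

```python
import subprocess

# The genomes command from above, split into arguments (not run here).
GENOMES_COMMAND = [
    "python", "wrapper.py", "-1", "1", "-2", "5",
    "--cds", "CDS", "-o", ".",
    "--genetic_code", "Gcodes.txt", "-d", "Databases",
]


def run_with_log(command, log_path):
    """Run command, write its standard output to log_path, return the exit code."""
    with open(log_path, "w") as log:
        result = subprocess.run(command, stdout=log)
    return result.returncode


# Example call, to be made from the folder that contains wrapper.py:
# exit_code = run_with_log(GENOMES_COMMAND, "log.txt")
```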
### Output:
* ReadyToGo = AA, NTD
* Sequences summary