Updated QuickStart (markdown)

MCLeleu 2025-01-24 11:30:37 +01:00
parent 3b8a7eb37d
commit ab5549c18b

@ -1,49 +1,75 @@
# Quickstart EukPhylo v1.0
## Installing EukPhylo
# Installing EukPhylo
The scripts can be used as downloaded from [GitHub](https://github.com/Katzlab/EukPhylo) and should work on any platform.
Dependencies and third-party tools, along with the versions that we use at the Katz lab:
TrimAl (1.2)
Guidance (2.2)
Diamond (0.9.30, compiled with GCC 8.3.0)
MAFFT (7.475)
IQ-Tree (2.1.12)
RAxML (8.2.12)
BLAST+ (2.9.0)
Vsearch (2.21.1, compiled with GCC 10.3.0)
Python libraries (can be installed with Pip)
ETE3 (pip install ete3)
BioPython
tqdm
* TrimAl (1.2)
* Guidance (2.2)
* Diamond (0.9.30, compiled with GCC 8.3.0)
* MAFFT (7.475)
* IQ-Tree (2.1.12)
* RAxML (8.2.12)
* BLAST+ (2.9.0)
* Vsearch (2.21.1, compiled with GCC 10.3.0)
* Python libraries (can be installed with pip)
* ETE3 (pip install ete3)
* BioPython
* tqdm
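Before running anything, it can help to confirm that the tools and libraries listed above are visible from your environment. The following is a minimal sketch, not part of EukPhylo itself. The executable names in `TOOLS` are assumptions (they vary between installations and versions, for example `iqtree2` versus `iqtree`), and Guidance is left out because how it is invoked depends on where you installed it.

```python
import importlib.util
import shutil

# Executable names are assumptions; adjust them to match your installation.
TOOLS = ["trimal", "diamond", "mafft", "iqtree2", "raxmlHPC", "blastp", "vsearch"]
# Import names of the Python libraries (BioPython is imported as "Bio").
MODULES = ["ete3", "Bio", "tqdm"]


def missing_dependencies(tools=TOOLS, modules=MODULES):
    """Return (missing_tools, missing_modules) as two lists of names."""
    # shutil.which returns None when the executable is not on PATH.
    missing_tools = [t for t in tools if shutil.which(t) is None]
    # find_spec returns None when a top-level module cannot be imported.
    missing_modules = [m for m in modules if importlib.util.find_spec(m) is None]
    return missing_tools, missing_modules


if __name__ == "__main__":
    tools, modules = missing_dependencies()
    print("Missing tools:", ", ".join(tools) or "none")
    print("Missing Python libraries:", ", ".join(modules) or "none")
```

This only checks that each name can be found; it does not check versions.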
## EukPhylo part 1 = Assigning Gene families
# EukPhylo part 1 = Assigning Gene families
EukPhylo part 1 runs CDS or assembled transcripts through several scripts in order (7 for transcriptomes, 5 for genomes). These scripts are run through a wrapper script.
### Transcriptomes:
Set Up:
## Transcriptomes:
### Set Up:
* A folder called “AssembledTranscripts” with your assembled transcript fasta files
* A folder called “Databases” with three subfolders:
** db_BvsE (how we ID likely-bacterial sequences)
** db_StopFreq (for stop codon assignment)
** db_OG
*** Hook *.dmnd file ([Current version Hook-6.6.dmnd](https://drive.google.com/open?id=1ywYLZXzcTERDFCysz5vPbI9u6WRxz5r0&usp=drive_copy))
*** Hook *.fasta file ([Current version Hook-6.6.fasta](https://drive.google.com/open?id=1AN4_SmZUYFH6_xh2qOhyNUlFZ_NT9_-D&usp=drive_copy))
* * db_BvsE (how we ID likely-bacterial sequences)
* * db_StopFreq (for stop codon assignment)
* * db_OG
* * * Hook *.dmnd file ([Current version Hook-6.6.dmnd](https://drive.google.com/open?id=1ywYLZXzcTERDFCysz5vPbI9u6WRxz5r0&usp=drive_copy))
* * * Hook *.fasta file ([Current version Hook-6.6.fasta](https://drive.google.com/open?id=1AN4_SmZUYFH6_xh2qOhyNUlFZ_NT9_-D&usp=drive_copy))
* A folder called “Scripts” filled with scripts from [here](https://github.com/Katzlab/PhyloToL-6/tree/main/PTL1/Transcriptomes/Scripts) on GitHub
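The folder layout described above can be checked before launching the wrapper. This is a small sketch under the assumption that all folders sit in the same working directory; the function names `missing_folders` and `hook_files` are hypothetical helpers, not part of EukPhylo.

```python
from pathlib import Path

# Folders named in the set-up list, relative to the working directory.
REQUIRED_DIRS = [
    "AssembledTranscripts",
    "Databases/db_BvsE",
    "Databases/db_StopFreq",
    "Databases/db_OG",
    "Scripts",
]


def missing_folders(root="."):
    """Return the required folders that do not exist under root."""
    root = Path(root)
    return [d for d in REQUIRED_DIRS if not (root / d).is_dir()]


def hook_files(root="."):
    """Return the names of .dmnd and .fasta files found in Databases/db_OG."""
    db_og = Path(root) / "Databases" / "db_OG"
    return sorted(p.name for p in db_og.glob("*") if p.suffix in {".dmnd", ".fasta"})


if __name__ == "__main__":
    print("Missing folders:", missing_folders() or "none")
    print("Hook files found:", hook_files() or "none")
```

For genomes, the same check applies with “CDS” in place of “AssembledTranscripts”.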
Running:
### Running:
python wrapper.py -1 1 -2 7 --assembled_transcripts AssembledTranscripts -o . --genetic_code Universal -d Databases > log.txt
Details of each option:
-1 = start script
-2 = end script
--assembled_transcripts = Folder with Assembled transcripts in fasta format
-o = path to output folder
--genetic_code = specified genetic code, name of .txt file with Genetic codes
-d = path to Databases folder
> log.txt = if added to the end of the command, it will output a log file with progress, warning, or error messages
* -1 = number of the first script to run
* -2 = number of the last script to run
* --assembled_transcripts = folder with assembled transcripts in fasta format
* -o = path to output folder
* --genetic_code = genetic code to use, or the name of a .txt file with genetic codes
* -d = path to Databases folder
* `> log.txt` = if added to the end of the command, it will output a log file with progress, warning, or error messages
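When the same run has to be repeated for different script ranges, the command can be assembled from Python instead of typed by hand. The sketch below uses only the options documented above and assumes `wrapper.py` is in the current directory; `build_command` and `run` are hypothetical helper names, not part of EukPhylo.

```python
import subprocess
import sys


def build_command(start=1, end=7, transcripts="AssembledTranscripts",
                  output=".", genetic_code="Universal", databases="Databases"):
    """Build the wrapper.py argument list for a transcriptome run."""
    return [
        sys.executable, "wrapper.py",
        "-1", str(start),          # first script to run
        "-2", str(end),            # last script to run
        "--assembled_transcripts", transcripts,
        "-o", output,
        "--genetic_code", genetic_code,
        "-d", databases,
    ]


def run(cmd, log_path="log.txt"):
    """Run the command, sending standard output to log_path like '> log.txt'."""
    with open(log_path, "w") as log:
        return subprocess.run(cmd, stdout=log).returncode


if __name__ == "__main__":
    # Print the command only; call run(build_command()) to execute it.
    print(" ".join(build_command()))
```

For example, `build_command(start=3, end=5)` would restart the pipeline at script 3 and stop after script 5, assuming the outputs of scripts 1 and 2 are already in the output folder.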
Output:
### Output:
* ReadyToGo = AA, NTD
* Sequences summary
## Genomes:
### Set Up:
* A folder called “CDS” with your CDS fasta files
* A folder called “Databases” with three subfolders:
* * db_BvsE (how we ID likely-bacterial sequences)
* * db_StopFreq (for stop codon assignment)
* * db_OG
* * * Hook *.dmnd file ([Current version Hook-6.6.dmnd])
* * * Hook *.fasta file ([Current version Hook-6.6.fasta])
* A folder called “Scripts” filled with the 10 scripts from [here](https://github.com/Katzlab/PhyloToL-6/tree/main/PTL1/Genomes/Scripts) on GitHub. To run locally, move all the scripts out into the main folder.
### Running:
python wrapper.py -1 1 -2 5 --cds CDS -o . --genetic_code Gcodes.txt -d Databases > log.txt
Details of each option:
* -1 = number of the first script to run
* -2 = number of the last script to run
* --cds = folder with CDS files in fasta format
* -o = path to output folder
* --genetic_code = genetic code to use, or the name of a .txt file with genetic codes
* -d = path to Databases folder
* `> log.txt` = if added to the end of the command, it will output a log file with progress, warning, or error messages
### Output:
* ReadyToGo = AA, NTD
* Sequences summary