mirror of
http://43.156.76.180:8026/YuuMJ/EukPhylo.git
synced 2025-12-29 06:10:30 +08:00
Updated QuickStart (markdown)
parent
3b8a7eb37d
commit
ab5549c18b
@ -1,28 +1,26 @@
# Quickstart EukPhylo v1.0

# Installing EukPhylo

Scripts can be used as downloaded from [GitHub](https://github.com/Katzlab/EukPhylo) and should work on any platform.

Dependencies and third-party tools, along with the versions that we use at the Katz lab:

* TrimAl (1.2)
* Guidance (2.2)
* Diamond (0.9.30, compiled with GCC 8.3.0)
* MAFFT (7.475)
* IQ-Tree (2.1.12)
* RAxML (8.2.12)
* BLAST+ (2.9.0)
* Vsearch (2.21.1, compiled with GCC 10.3.0)
* Python libraries (can be installed with pip)
* ETE3 (pip install ete3)
* BioPython
* tqdm

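Before running anything, it can help to confirm that the tools listed above are reachable. The following is a minimal sketch, not part of EukPhylo: the executable names in `ASSUMED_EXECUTABLES` are assumptions about how each tool is installed (they differ between systems and versions), and the helper functions are hypothetical.

```python
import importlib.util
import shutil

# Executable names are ASSUMPTIONS; adjust them to match your installation
# (for example, IQ-Tree 2 is often installed as "iqtree2").
ASSUMED_EXECUTABLES = [
    "trimal", "guidance.pl", "diamond", "mafft",
    "iqtree2", "raxmlHPC", "blastp", "vsearch",
]
# Import names for ETE3, BioPython, and tqdm.
PYTHON_MODULES = ["ete3", "Bio", "tqdm"]


def missing_executables(names):
    """Return the names that cannot be found on the PATH."""
    return [name for name in names if shutil.which(name) is None]


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Missing executables:", missing_executables(ASSUMED_EXECUTABLES))
    print("Missing Python modules:", missing_modules(PYTHON_MODULES))
```

This only checks that each tool can be found; it does not compare installed versions against the ones listed above.
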
# EukPhylo part 1 = Assigning Gene families

EukPhylo part 1 runs CDS or assembled transcripts through several scripts in order (7 for transcriptomes, 5 for genomes). A single ‘wrapper’ script (wrapper.py) runs these scripts for you.

## Transcriptomes:

### Set Up:

* A folder called “AssembledTranscripts” with your assembled transcript fasta files
* A folder called “Databases” with three subfolders:
* * db_BvsE (how we identify likely-bacterial sequences)
@ -32,18 +30,46 @@ Set Up:
* * * Hook *.fasta file ([Current version Hook-6.6.fasta](https://drive.google.com/open?id=1AN4_SmZUYFH6_xh2qOhyNUlFZ_NT9_-D&usp=drive_copy))
* A folder called “Scripts” filled with the scripts from [here](https://github.com/Katzlab/PhyloToL-6/tree/main/PTL1/Transcriptomes/Scripts) on GitHub

### Running:

python wrapper.py -1 1 -2 7 --assembled_transcripts AssembledTranscripts -o . --genetic_code Universal -d Databases > log.txt

Details of each option:

* -1 = start script (number of the first script to run)
* -2 = end script (number of the last script to run)
* --assembled_transcripts = folder with assembled transcripts in fasta format
* -o = path to output folder
* --genetic_code = genetic code to use (e.g. Universal), or the name of a .txt file with genetic codes
* -d = path to Databases folder
* log.txt = if “> log.txt” is added to the end of the command, progress, warning, and error messages are written to this log file

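The command above can also be launched from Python, which makes it easier to change the start and end scripts between runs. This is a minimal sketch under stated assumptions: `build_transcriptome_command` and `run_with_log` are hypothetical helpers, not part of EukPhylo, and the range check assumes that `-1` and `-2` take script numbers from 1 to 7, as the example command suggests.

```python
import subprocess
import sys


def build_transcriptome_command(first=1, last=7,
                                transcripts="AssembledTranscripts",
                                output=".", genetic_code="Universal",
                                databases="Databases"):
    """Assemble the wrapper.py call shown above as an argument list."""
    # ASSUMPTION: script numbers run from 1 to 7 for transcriptomes.
    if not 1 <= first <= last <= 7:
        raise ValueError("script numbers must satisfy 1 <= first <= last <= 7")
    return [
        sys.executable, "wrapper.py",
        "-1", str(first), "-2", str(last),
        "--assembled_transcripts", transcripts,
        "-o", output,
        "--genetic_code", genetic_code,
        "-d", databases,
    ]


def run_with_log(command, log_path="log.txt"):
    """Run the command and write its standard output to log_path, like '> log.txt'."""
    with open(log_path, "w") as log:
        return subprocess.run(command, stdout=log, check=False).returncode
```

For example, `run_with_log(build_transcriptome_command(first=3, last=7))` would rerun scripts 3 through 7 from the folder that contains wrapper.py. Only standard output is redirected here, matching `> log.txt` in the shell.
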
### Output:

* ReadyToGo = AA, NTD
* Sequences summary

## Genomes:

### Set Up:

* A folder called “CDS” with your CDS fasta files
* A folder called “Databases” with three subfolders:
* * db_BvsE (how we identify likely-bacterial sequences)
* * db_StopFreq (for stop codon assignment)
* * db_OG
* * * Hook *.dmnd file ([Current version Hook-6.6.dmnd])
* * * Hook *.fasta file ([Current version Hook-6.6.fasta])
* A folder called “Scripts” filled with the 10 scripts from [here](https://github.com/Katzlab/PhyloToL-6/tree/main/PTL1/Genomes/Scripts) on GitHub. To run locally, move all of the scripts out of this folder into the main folder.

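The folder layout listed above can be checked before starting a run. The sketch below is a hypothetical helper, not part of EukPhylo; it assumes the Hook files in db_OG have names that start with "Hook", as in Hook-6.6.dmnd and Hook-6.6.fasta, and it skips the Scripts folder because the scripts may have been moved into the main folder.

```python
from pathlib import Path

DB_SUBFOLDERS = ["db_BvsE", "db_StopFreq", "db_OG"]
# ASSUMPTION: Hook files are named like Hook-6.6.dmnd and Hook-6.6.fasta.
HOOK_PATTERNS = ["Hook*.dmnd", "Hook*.fasta"]


def missing_genome_inputs(root="."):
    """Return the expected folders and Hook file patterns not found under root."""
    root = Path(root)
    missing = []
    for folder in ["CDS"] + [f"Databases/{name}" for name in DB_SUBFOLDERS]:
        if not (root / folder).is_dir():
            missing.append(folder)
    for pattern in HOOK_PATTERNS:
        # An empty match list means no file in db_OG fits the pattern.
        if not list((root / "Databases" / "db_OG").glob(pattern)):
            missing.append(f"Databases/db_OG/{pattern}")
    return missing
```

An empty list means every expected input was found; anything else names what still needs to be added.
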
### Running:

python wrapper.py -1 1 -2 5 --cds CDS -o . --genetic_code Gcodes.txt -d Databases > log.txt

Details of each option:

* -1 = start script (number of the first script to run)
* -2 = end script (number of the last script to run)
* --cds = folder with CDS files in fasta format
* -o = path to output folder
* --genetic_code = genetic code to use, or the name of a .txt file with genetic codes (e.g. Gcodes.txt)
* -d = path to Databases folder
* log.txt = if “> log.txt” is added to the end of the command, progress, warning, and error messages are written to this log file

### Output:

* ReadyToGo = AA, NTD
* Sequences summary