# Quickstart EukPhylo v1.0

## Installing EukPhylo

The scripts can be used as downloaded from [GitHub](https://github.com/Katzlab/EukPhylo) and should work on any platform.

Dependencies and third-party tools, with the versions we use in the Katz lab:

* TrimAl (1.2)
* Guidance (2.2)
* Diamond (0.9.30, compiled with GCC 8.3.0)
* MAFFT (7.475)
* IQ-Tree (2.1.12)
* RAxML (8.2.12)
* BLAST+ (2.9.0)
* Vsearch (2.21.1, compiled with GCC 10.3.0)

Python libraries (can be installed with pip):

* ETE3 (`pip install ete3`)
* BioPython
* tqdm

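You can confirm that the tools and libraries above are available with a short Python check like the one below. This is a minimal sketch and not part of EukPhylo. The executable names (`trimal`, `diamond`, `mafft`, `iqtree2`, `raxmlHPC`, `blastp`, `vsearch`) are assumptions and depend on how each tool was installed. Guidance is left out because it may not be installed as a single command on your PATH. The check also does not verify version numbers.

```python
import importlib.util
import shutil

# Assumed executable names: adjust to match how each tool is installed on your system.
REQUIRED_EXECUTABLES = ["trimal", "diamond", "mafft", "iqtree2", "raxmlHPC", "blastp", "vsearch"]

# Import names of the Python libraries listed above (BioPython is imported as "Bio").
REQUIRED_MODULES = ["ete3", "Bio", "tqdm"]


def missing_executables(names):
    """Return the names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


def missing_modules(names):
    """Return the top-level module names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for tool in missing_executables(REQUIRED_EXECUTABLES):
        print(f"Not found on PATH: {tool}")
    for module in missing_modules(REQUIRED_MODULES):
        print(f"Python library not installed: {module}")
```

An empty output means every listed name was found. It does not mean the versions match the ones above.
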
## EukPhylo part 1: Assigning gene families

EukPhylo part 1 runs CDS or assembled transcripts through a series of scripts in order (7 for transcriptomes, 5 for genomes). These scripts are run through a ‘wrapper’ script.

### Transcriptomes

Set up:

* A folder called “AssembledTranscripts” containing your assembled transcript FASTA files
* A folder called “Databases” with three subfolders:
  * db_BvsE (used to identify likely bacterial sequences)
  * db_StopFreq (used for stop codon assignment)
  * db_OG, containing:
    * the Hook `*.dmnd` file ([Current version Hook-6.6.dmnd](https://drive.google.com/open?id=1ywYLZXzcTERDFCysz5vPbI9u6WRxz5r0&usp=drive_copy))
    * the Hook `*.fasta` file ([Current version Hook-6.6.fasta](https://drive.google.com/open?id=1AN4_SmZUYFH6_xh2qOhyNUlFZ_NT9_-D&usp=drive_copy))
* A folder called “Scripts” containing the scripts from [here](https://github.com/Katzlab/PhyloToL-6/tree/main/PTL1/Transcriptomes/Scripts) on GitHub

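Before running, you can sanity-check the folder layout described above with a small script. `check_layout` is a hypothetical helper, not part of EukPhylo. It only looks for three things:

* the named folders
* at least one `.dmnd` file and one `.fasta` file in `db_OG`
* a non-empty `AssembledTranscripts` folder

It assumes you run it from the folder that contains them.

```python
from pathlib import Path

# Folder names taken from the set-up list; check_layout itself is a hypothetical helper.
REQUIRED_DIRS = [
    "AssembledTranscripts",
    "Scripts",
    "Databases/db_BvsE",
    "Databases/db_StopFreq",
    "Databases/db_OG",
]


def check_layout(root="."):
    """Return a list of problems found in the EukPhylo part 1 working folder."""
    root = Path(root)
    problems = []
    for rel in REQUIRED_DIRS:
        if not (root / rel).is_dir():
            problems.append(f"missing folder: {rel}")
    og = root / "Databases" / "db_OG"
    if og.is_dir():
        # db_OG should hold both the Hook .dmnd and .fasta files.
        for suffix in (".dmnd", ".fasta"):
            if not any(og.glob(f"*{suffix}")):
                problems.append(f"no *{suffix} file in Databases/db_OG")
    transcripts = root / "AssembledTranscripts"
    if transcripts.is_dir() and not any(p.is_file() for p in transcripts.iterdir()):
        problems.append("AssembledTranscripts is empty")
    return problems


if __name__ == "__main__":
    for problem in check_layout():
        print(problem)
```
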
Running:

```
python wrapper.py -1 1 -2 7 --assembled_transcripts AssembledTranscripts -o . --genetic_code Universal -d Databases > log.txt
```

Options:

* `-1`: number of the first script to run
* `-2`: number of the last script to run
* `--assembled_transcripts`: folder with the assembled transcripts in FASTA format
* `-o`: path to the output folder
* `--genetic_code`: the genetic code to use (e.g. `Universal`), given as the name of the .txt file with genetic codes
* `-d`: path to the Databases folder
* `> log.txt`: if added to the end of the command, writes progress, warning, and error messages to a log file instead of the screen

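The same options can also be assembled from Python, for example to run a subset of the scripts from a batch script. This sketch is an assumption, not an official interface:

* `build_command` and `run_part1` are hypothetical names.
* It uses only the flags listed above.
* It assumes `wrapper.py` is in the current folder.
* Its check that `-1` and `-2` fall between 1 and 7 reflects the seven transcriptome scripts, not any validation the wrapper is known to perform.

```python
import subprocess
import sys


def build_command(first=1, last=7, transcripts="AssembledTranscripts",
                  output=".", genetic_code="Universal", databases="Databases"):
    # Hypothetical helper: mirrors the example command using only the documented flags.
    # The 1-7 range is an assumption based on the seven transcriptome scripts.
    if not 1 <= first <= last <= 7:
        raise ValueError("expected 1 <= first <= last <= 7")
    return [
        sys.executable, "wrapper.py",  # current interpreter instead of a bare "python"
        "-1", str(first),
        "-2", str(last),
        "--assembled_transcripts", transcripts,
        "-o", output,
        "--genetic_code", genetic_code,
        "-d", databases,
    ]


def run_part1(log_path="log.txt", **options):
    # Hypothetical helper: writes stdout to log_path, like "> log.txt" in the shell.
    # As with ">", anything the scripts print to stderr is not captured in the log.
    with open(log_path, "w") as log:
        result = subprocess.run(build_command(**options), stdout=log)
    return result.returncode


if __name__ == "__main__":
    print(" ".join(build_command(first=1, last=7)))
```

For example, `run_part1(first=1, last=7)` roughly corresponds to the shell command above. `build_command(first=3, last=5)` only builds the argument list and runs nothing.
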
Output:

* ReadyToGo: amino acid (AA) and nucleotide (NTD) files
* Sequence summary
