Updated EukPhylo QuickStart (markdown)

Godwin Ani 2025-03-03 11:37:15 -05:00
parent f0356015a8
commit 22ea61fed0

@ -1,5 +1,28 @@
> Note: The EukPhylo pipeline is currently being dockerised for easier installation and use. More information about the Dockerfile is available on the [Docker branch](https://github.com/Katzlab/EukPhylo/tree/Docker).
## Dockerfile
The Docker image can be built and run with:
```bash
cd EukPhylo
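# Build the image from the Dockerfile in this directory
# (-f selects the Dockerfile; image names must be lowercase)
docker build -f Dockerfile --tag myeuk:1 .
# Look up the IMAGE_ID of the image just built
docker image list
# Run the container, bind-mounting the data directories (current command):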
docker run -it \
--mount type=bind,src="$(pwd)/databases",dst=/Databases \
--mount type=bind,src="$(pwd)/input_data",dst=/Input_data \
--mount type=bind,src="$(pwd)/output_data",dst=/Output_data \
{IMAGE_ID}
```
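Docker's `--mount type=bind` reports an error when the source path does not exist on the host, so it may help to create the three directories before running the container. A minimal sketch, assuming the directory names from the command above and that it is run from inside the `EukPhylo` directory:

```bash
# Create the host directories that the bind mounts expect.
# -p makes this safe to re-run: existing directories are left untouched.
mkdir -p databases input_data output_data
```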
After development, GitHub CI/CD workflows can be added to build and release the Docker image automatically for end users.
# General Steps
The EukPhylo pipeline is composed of two parts that can be run individually. Part 1 assigns gene families and needs to be run only once; Part 2 builds MSAs and trees, and implements contamination removal and concatenation. It is preferable to run Part 2 on the outputs of Part 1, but this is not required as long as the input files are in the same format: one FASTA file per species, with sequence IDs starting with a 10-digit taxon identifier and ending in a gene family identifier with the format OGx_xxxxxx. See the extended version of the wiki for details.
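When preparing Part 2 inputs by hand, a quick check of the sequence IDs can catch formatting problems early. The sketch below is a hypothetical helper, not part of EukPhylo: `check_ids` and its pattern are illustrative, and it assumes that the taxon identifier is the first 10 characters of the ID (letters, digits, or underscores) and that each `x` in `OGx_xxxxxx` stands for a single alphanumeric character. The extended wiki remains the reference for the exact rules.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print FASTA headers that do not fit the assumed ID layout.
# Assumptions (not confirmed by the EukPhylo documentation):
#   - the taxon identifier is the first 10 characters after '>'
#   - each "x" in OGx_xxxxxx is one alphanumeric character
check_ids() {
    local fasta="$1"
    local pattern='^>[A-Za-z0-9_]{10}.*OG[A-Za-z0-9]_[A-Za-z0-9]{6}$'
    # Keep only header lines, then print those that do NOT match the pattern.
    # Exit status follows grep: 0 if offending headers were printed, 1 if none.
    grep '^>' "$fasta" | grep -Ev "$pattern"
}
```

Calling `check_ids my_species.fasta` (a placeholder file name) prints any header that fails the check and prints nothing when every header fits the assumed layout.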