diff --git a/EukPhylo-QuickStart.md b/EukPhylo-QuickStart.md
index 38f84b4..65564db 100644
--- a/EukPhylo-QuickStart.md
+++ b/EukPhylo-QuickStart.md
@@ -1,5 +1,28 @@
 > Note: The EukPhylo pipeline is currently being dockerised for easier installation and use. More information about the dockerfile can be found here - [Docker branch](https://github.com/Katzlab/EukPhylo/tree/Docker)
 
+## Dockerfile
+
+The Docker image can be built and run with:
+
+```bash
+cd EukPhylo
+
+# Build the image from the Dockerfile (image names must be lowercase)
+docker build -f Dockerfile --tag myeuk:1 .
+
+# Look up the IMAGE_ID of the image that was just built
+docker image list
+
+# Run the container, bind-mounting the three data directories
+docker run -it \
+    --mount type=bind,src="$(pwd)/databases",dst=/Databases \
+    --mount type=bind,src="$(pwd)/input_data",dst=/Input_data \
+    --mount type=bind,src="$(pwd)/output_data",dst=/Output_data \
+    {IMAGE_ID}
+```
+
+Once development is complete, GitHub CI/CD workflows can be added to build and release the Docker image automatically for end users.
+
 # General Steps
 
 EukPhylo pipeline is composed of two parts, that can be run individually: Part 1 can be run only once, to assign gene families; Part 2 builds MSAs, trees, and implements contamination removal and concatenation. It's preferable to run Part 2 using the outputs of Part 1 as input, but this is not required as long as the input files are in the same format (one fasta file per species, with sequences IDs starting with a 10 digit taxon identifier and ending in a gene family identifier with the format OGx_xxxxxx. See extended version of the wiki for details.)
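
Reviewer note on the `docker run` command added in this diff: Docker's `--mount type=bind` generally refuses to start a container when the host source path does not exist, so on a fresh clone the three directories need to be created first. The sketch below is one possible way to do that. The helper names `prepare_mount_dirs` and `print_run_command` are hypothetical and not part of EukPhylo, and `myeuk:1` in the usage line is only an example tag. The second helper prints the command instead of running it, so it can be inspected before use.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical helper (not part of EukPhylo): create the host directories
# that the `docker run` command in the diff bind-mounts into the container.
prepare_mount_dirs() {
    local base="${1:-$(pwd)}"
    local dir
    for dir in databases input_data output_data; do
        mkdir -p "${base}/${dir}"
    done
}

# Hypothetical helper: print (do not execute) the run command for a given
# base directory and image reference (an image ID or a name:tag).
print_run_command() {
    local base="$1"
    local image="$2"
    echo "docker run -it" \
        "--mount type=bind,src=${base}/databases,dst=/Databases" \
        "--mount type=bind,src=${base}/input_data,dst=/Input_data" \
        "--mount type=bind,src=${base}/output_data,dst=/Output_data" \
        "${image}"
}
```

From the repository root, a call such as `prepare_mount_dirs "$(pwd)" && print_run_command "$(pwd)" myeuk:1` would create any missing directories and then show the command to run. Substitute your own image ID or tag for the example one.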