Updated EukPhylo Part 2: MSAs, trees, and contamination loop (markdown)

Auden Cote-L'Heureux 2025-01-22 13:12:25 -05:00
parent 9114c8b6ad
commit c16912249b

@ -57,7 +57,7 @@ The `--start` and `--end` parameters tell EukPhylo what to expect in terms of in
* If you want to run through the Guidance step, set `--end` to 'aligned'
* If you want to start from something other than raw data, change the default `--start` parameter to 'unaligned' (input a fasta file of unaligned amino acid sequences for each GF), 'aligned' (input a fasta file of aligned amino acid sequences for each GF), or 'trees' (input a Newick string file for each GF).
The `--data` parameter is where you point EukPhylo to your input files. If starting from ReadyToGo files (`--start raw`), this should be the path to a folder containing amino acid ReadyToGo files as output by part 1. If starting with Guidance (`--start unaligned`), this should be the path to a folder of unaligned amino acid files, etc. Note that when you are resuming a run and want to use files in the "Output" folder from a previous run, you will have to rename this "Output" folder and give the new path as your `--data` parameter. For example, if you run only the pre-Guidance step, EukPhylo will generate a folder called `Output/Pre-Guidance`. If you later want to resume the run, rename the `Output` folder, e.g. to `OldOutput`, so that the files sit in `OldOutput/Pre-Guidance`, and then use `--start unaligned --data OldOutput/Pre-Guidance` in your new command.
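
The rename-then-resume step described above can be sketched in Python. This is only an illustration, not part of EukPhylo: the helper names `archive_output` and `resume_arguments` are hypothetical, the new folder name `OldOutput` is just the example used above, and the sketch assumes the step subfolder keeps the name EukPhylo is described as generating (`Pre-Guidance`). It returns only the `--start` and `--data` arguments; the rest of the command depends on your setup.

```python
from pathlib import Path


def archive_output(run_dir, old_name="Output", new_name="OldOutput"):
    """Rename a previous run's output folder and return the new path.

    Hypothetical helper: the folder names are taken from the example in the text.
    """
    run_dir = Path(run_dir)
    src = run_dir / old_name
    dst = run_dir / new_name
    if not src.is_dir():
        raise FileNotFoundError(f"no previous output folder at {src}")
    if dst.exists():
        raise FileExistsError(f"{dst} already exists; choose another name")
    src.rename(dst)
    return dst


def resume_arguments(archived_dir, step_folder="Pre-Guidance", start="unaligned"):
    """Build the --start/--data arguments that point at the archived files."""
    data_path = Path(archived_dir) / step_folder
    if not data_path.is_dir():
        raise FileNotFoundError(f"expected a folder of input files at {data_path}")
    return ["--start", start, "--data", str(data_path)]
```

For example, calling `archive_output(".")` in the directory of the earlier run and then `resume_arguments("OldOutput")` yields the argument list `--start unaligned --data OldOutput/Pre-Guidance`, which can be appended to the rest of your command.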
You will also need to give EukPhylo part 2 a list of all of the sample identifiers (taxon_list.txt) and gene family identifiers (listofOGs.txt) you want to include in your analysis; these text files should have no header and should just contain the list of identifiers, with one identifier per row.
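
As a minimal sketch of the expected layout of these two text files, the Python helpers below write and re-read a header-free list with one identifier per row. The function names are hypothetical and not part of EukPhylo, and the check that an identifier contains no whitespace is an assumption made here to catch rows with more than one column; any identifiers you pass in are your own.

```python
from pathlib import Path


def write_identifier_list(path, identifiers):
    """Write identifiers one per row, with no header; returns the cleaned list."""
    cleaned = [item.strip() for item in identifiers if item.strip()]
    Path(path).write_text("\n".join(cleaned) + "\n")
    return cleaned


def read_identifier_list(path):
    """Read a list file, skipping blank rows.

    Assumption (not from the EukPhylo docs): an identifier has no internal
    whitespace, so a row with spaces or tabs is reported as malformed.
    """
    rows = Path(path).read_text().splitlines()
    identifiers = [row.strip() for row in rows if row.strip()]
    malformed = [item for item in identifiers if any(ch.isspace() for ch in item)]
    if malformed:
        raise ValueError(f"rows with more than one field: {malformed}")
    return identifiers
```

Reading `taxon_list.txt` and `listofOGs.txt` back with `read_identifier_list` before launching a run is a cheap way to catch a stray header or an extra column.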