From 527e4e2b8b4e1d3dfd82d48e1e455e9ebde8e991 Mon Sep 17 00:00:00 2001
From: "Adri K. Grow" <42044618+adriannagrow@users.noreply.github.com>
Date: Sun, 9 Feb 2025 01:41:45 -0500
Subject: [PATCH] Updated EukPhylo QuickStart (markdown)

---
 EukPhylo-QuickStart.md | 42 +++++++++++++++++++++---------------------
 1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/EukPhylo-QuickStart.md b/EukPhylo-QuickStart.md
index 0646f33..2675334 100644
--- a/EukPhylo-QuickStart.md
+++ b/EukPhylo-QuickStart.md
@@ -141,7 +141,7 @@ Below are several optional ways to parameterize EukPhylo Part 2
 **General:**
 |Parameter|Options|Description|Default|
 |:---|:---|:---|:---|
-|`--force`||Overwrites all existing files in the `Output` folder|
+|`--force`||Overwrites all existing files in the `Output` folder|NA|
 |`--tree_method`|`iqtree`, `iqtree_fast`, `raxml`, `fasttree`|Change tree building software|`iqtree`|
 
 **For BLAST and GUIDANCE:**
@@ -156,26 +156,26 @@ Below are several optional ways to parameterize EukPhylo Part 2
 |`--guidance_threads`|int|Number of threads to allocate to Guidance|`20`|
 
 **For reducing number of similar sequences:**
-|Parameter|Required|Options|Help|
+|Parameter|Required|Description|Default|
 |:---|:---|:---|:---|
-|`--similarity_filter`|yes|action = store_true|Run the similarity filter in pre-Guidance|
-|`--sim_cutoff`|yes|default = 1, type = float|Sequences from the same taxa that are assigned to the same OG are removed if they are more similar than this cutoff|
-|`--sim_taxa`|no|default = None|A file listing taxa (10-digit codes) to apply the similarity filter on (e.g. sim_taxa.txt)|
+|`--similarity_filter`|yes|Run the similarity filter in pre-Guidance|NA|
+|`--sim_cutoff`|yes|float|Sequences from the same taxa that are assigned to the same OG are removed if they are more similar than this cutoff|`1`|
+|`--sim_taxa`|no|A file listing taxa (10-digit codes) to apply the similarity filter on (e.g. sim_taxa.txt)|NA|
 
 **For removing known poor-quality or contaminant sequences (user informed):**
 |Parameter|Description|
 |:---|:---|
-|`--blacklist`|type = str; A file listing sequence IDs to remove from analysis (e.g. to_remove.txt)|
+|`--blacklist`|str; A file listing sequence IDs to remove from analysis (e.g. to_remove.txt)|
 
 **For removing sequences based on GC composition:**
 
 *Note: you must first identify sequences with OGA, OGG, OG6 using the GC_identifier.py script [here](https://github.com/Katzlab/EukPhylo/tree/main/Utilities/for_fastas) on GitHub*
-|Parameter|Options|Help|
-|:---|:---|:---|
-|`--og_identifier`|default = `OG`, choices = `OG`,`OG6`,`OGA`,`OGG`|Select sequences by GC width|
+|Parameter|Options|Description|Default|
+|:---|:---|:---|:---|
+|`--og_identifier`|`OG`,`OG6`,`OGA`,`OGG`|Select sequences by GC width|`OG`
 
 ## Contamination Removal
-Contamination removal within EukPhylo (also called Contamination Loop) allows for sequence removal based on Sisters/Subsisters identification or based on Clades diversity. An examplar run is available in [Figshare](https://figshare.com/articles/dataset/Examplar_runs_PhyloToL_and_CLoop/26662018)
+Contamination removal within EukPhylo (also called Contamination Loop or CL) allows for sequence removal based on Sisters/Subsisters identification or based on Clades diversity. An examplar run is available in [Figshare](https://figshare.com/articles/dataset/Examplar_runs_PhyloToL_and_CLoop/26662018)
 
 ### Set up:
 * An input folder (called for example Input), with both
@@ -187,11 +187,11 @@ Contamination removal within EukPhylo (also called Contamination Loop) allows fo
 * the Scripts Folder
 
 ### Running:
-Basic running of the Contamination loop, with the sister mode:
+Basic running of the Contamination Loop, with the sister mode:
 
 `python3 Scripts/eukphylo.py --start trees --end trees --data Input --output Output --contamination_loop seq --sister_rules sister_rules_file.txt > log.out`
 
-Basic running of the Contamination loop, with the clade mode:
+Basic running of the Contamination Loop, with the clade mode:
 
 `python3 Scripts/eukphylo.py --start trees --end trees --data Input --output Output --contamination_loop clade --clade_grabbing_rules_file clade_grabbing_rules.txt > log.out`
 
@@ -199,15 +199,15 @@ Options:
 
 | Parameter | Required | Options | Description | Default |
 | ------------- | ------------- | ------------- | ------------- | ------------- |
-| --contamination_loop | yes | seq, clade | The mode in which to run the CL | none |
-| --nloops | no | any positive integer | Number of iterations | `5` |
-| --sister_rules | only in sisters mode | Any valid path | Path to a text file containing sisters rules | none |
-| --subsister_rules | only in subsisters mode | Any valid path | Path to a text file containing subsisters rules | none |
-| --clade_grabbing_rules | only in clade mode | Any valid path | Path to a text file containing clade-grabbing rules | none |
-| --clade_grabbing_exceptions | no | Any valid path | List of taxa to _not_ remove for any reason | none |
-| --cl_tree_method | no | `iqtree`, `raxml`, `fasttree`, `iqtree_fast` | Tree-building method to use in each contamination loop iteration. | fasttree |
-| --cl_alignment_method | no | `mafft_only`, `guidance` | Alignment method to use in each contamination loop iteration. | `mafft_only`|
-| --cl_exclude_taxa | no | Any valid path | Path to a file containing taxon names present in input MSA/tree files but which should be removed in the first iteration of the contamination loop. | none |
+|`--contamination_loop`|yes|seq, clade|The mode in which to run the CL|NA|
+|`--nloops`|no|positive int|Number of iterations|`5`|
+|`--sister_rules`|only in sisters mode|Any valid path|Path to a text file containing sisters rules|NA|
+|`--subsister_rules`|only in subsisters mode|Any valid path|Path to a text file containing subsisters rules|NA|
+|`--clade_grabbing_rules`|only in clade mode|Any valid path|Path to a text file containing clade-grabbing rules|NA|
+|`--clade_grabbing_exceptions`|no|Any valid path|List of taxa to _not_ remove for any reason|NA|
+|`--cl_tree_method`|no|`iqtree`, `raxml`, `fasttree`, `iqtree_fast`|Tree-building method to use in each contamination loop iteration|`fasttree`|
+|`--cl_alignment_method`|no|`mafft_only`, `guidance`|Alignment method to use in each contamination loop iteration|`mafft_only`|
+|`--cl_exclude_taxa`|no|Any valid path|Path to a file containing taxon names present in input MSA/tree files but which should be removed in the first iteration of the contamination loop|NA|
 
 ## Concatenation
 