Updated EukPhylo QuickStart (markdown)

2026-02-10 17:00:24 +08:00 · 2025-02-09 01:44:51 -05:00 · 2025-02-09 01:44:51 -05:00 · 76b75b01b3
commit 76b75b01b3
parent 527e4e2b8b
1 changed files with 14 additions and 14 deletions
--- a/EukPhylo-QuickStart.md
+++ b/EukPhylo-QuickStart.md
@@ -130,10 +130,10 @@ For additional input parameter options, see table below or run: `python phylotol
 |:---|:-------|:---|:---|
 |`--start`|`raw`, `unaligned`, `aligned`, `trees`|Stage at which to start running PhyloToL|`raw`|
 |`--end`|`unaligned`, `aligned`, `trees`|Stage until which to run PhyloToL. Options are `unaligned` (which will run up to but not including guidance), `aligned` (which will run up to but not including RAxML), and `trees` (which will run through RAxML)|`trees`|
-|`--gf_list`|Any valid path|Path to the file with the GFs of interest. Only required if starting from the raw dataset|None|
-|`--taxon_list`|Any valid path|Path to the file with the taxa (10-digit codes) to include in the output|None| 
-|`--data`|Any valid path|Path to the input dataset. The format of this varies depending on your `--start` parameter. If you are running the contamination loop starting with trees, this folder must include both trees **AND** a fasta file for each tree (with identical file names other than the extension) that includes an amino-acid sequence for each tip of the tree (with the sequence names matching exactly the tip names)|None|
-|`--output`|Any valid path|Directory where the output folder should be created. If not given, the folder will be created in the parent directory of the folder containing the scripts|`../`|
+|`--gf_list`|Valid path|Path to the file with the GFs of interest. Only required if starting from the raw dataset|None|
+|`--taxon_list`|Valid path|Path to the file with the taxa (10-digit codes) to include in the output|None| 
+|`--data`|Valid path|Path to the input dataset. The format of this varies depending on your `--start` parameter. If you are running the contamination loop starting with trees, this folder must include both trees **AND** a fasta file for each tree (with identical file names other than the extension) that includes an amino-acid sequence for each tip of the tree (with the sequence names matching exactly the tip names)|None|
+|`--output`|Valid path|Directory where the output folder should be created. If not given, the folder will be created in the parent directory of the folder containing the scripts|`../`|

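The `--data` requirement for tree-based starts (one fasta file per tree, identical file names apart from the extension, sequence names matching tip names) can be checked before launching a run. The following is a minimal sketch, not part of EukPhylo: the function names and the extension sets are assumptions for illustration, and the tip names are expected to come from whatever tree parser you already use.

```python
from pathlib import Path

# Assumed extensions (not taken from EukPhylo); adjust to match your files.
TREE_EXTS = {".tre", ".tree", ".nwk", ".treefile"}
FASTA_EXTS = {".fa", ".fas", ".fasta", ".faa"}


def unpaired_files(data_dir):
    """Return (trees_without_fasta, fastas_without_tree) as sorted lists of file stems."""
    files = [p for p in Path(data_dir).iterdir() if p.is_file()]
    tree_stems = {p.stem for p in files if p.suffix.lower() in TREE_EXTS}
    fasta_stems = {p.stem for p in files if p.suffix.lower() in FASTA_EXTS}
    return sorted(tree_stems - fasta_stems), sorted(fasta_stems - tree_stems)


def fasta_names(fasta_path):
    """Collect sequence names (the header text after '>') from a fasta file."""
    names = set()
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                names.add(line[1:].strip())
    return names


def tips_missing_from_fasta(tip_names, fasta_path):
    """Return tip names that have no exactly matching sequence name in the fasta file."""
    return sorted(set(tip_names) - fasta_names(fasta_path))
```

If both lists returned by `unpaired_files` are empty and `tips_missing_from_fasta` returns an empty list for every tree, the folder satisfies the pairing rule described in the table above.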
 ### Modularity
 Below are several optional ways to parameterize EukPhylo Part 2
@@ -163,16 +163,16 @@ Below are several optional ways to parameterize EukPhylo Part 2
 |`--sim_taxa`|no|A file listing taxa (10-digit codes) to apply the similarity filter on (e.g. sim_taxa.txt)|NA|

 **For removing known poor-quality or contaminant sequences (user informed):**
-|Parameter|Description|
-|:---|:---|
-|`--blacklist`|str; A file listing sequence IDs to remove from analysis (e.g. to_remove.txt)|
+|Parameter|Options|Description|
+|:---|:---|:---|
+|`--blacklist`|str|A file listing sequence IDs to remove from analysis (e.g. to_remove.txt)|

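To illustrate what a blacklist file does conceptually, here is a minimal sketch that drops listed sequence IDs from fasta records. It is not EukPhylo's implementation: the helper names are hypothetical, and it assumes one sequence ID per line in the list file and an exact match against the full fasta header.

```python
def read_id_list(path):
    """Read one sequence ID per line, skipping blank lines (assumed file layout)."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def parse_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of fasta lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def drop_blacklisted(records, blacklist):
    """Keep only the records whose name is not in the blacklist."""
    return [(name, seq) for name, seq in records if name not in blacklist]
```

A typical use would be `drop_blacklisted(parse_fasta(open(fasta_path)), read_id_list("to_remove.txt"))`, where `fasta_path` is a placeholder for one of your own files.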
 **For removing sequences based on GC composition:**

 *Note: you must first identify sequences with OGA, OGG, OG6 using the GC_identifier.py script [here](https://github.com/Katzlab/EukPhylo/tree/main/Utilities/for_fastas) on GitHub*
 |Parameter|Options|Description|Default|
 |:---|:---|:---|:---|
-|`--og_identifier`|`OG`,`OG6`,`OGA`,`OGG`|Select sequences by GC width|`OG`
+|`--og_identifier`|`OG`, `OG6`, `OGA`, `OGG`|Select sequences by GC width|`OG`

 ## Contamination Removal 
 Contamination removal within EukPhylo (also called the Contamination Loop, or CL) allows for sequence removal based on Sisters/Subsisters identification or based on Clades diversity. An exemplar run is available on [Figshare](https://figshare.com/articles/dataset/Examplar_runs_PhyloToL_and_CLoop/26662018).
@@ -199,15 +199,15 @@ Options:

 | Parameter  | Required | Options | Description | Default |
 | ------------- | ------------- | ------------- | ------------- | ------------- |
-|`--contamination_loop`|yes|seq, clade|The mode in which to run the CL|NA|
+|`--contamination_loop`|yes|`seq`, `clade`|The mode in which to run the CL|NA|
 |`--nloops`|no|positive int|Number of iterations|`5`|
-|`--sister_rules`|only in sisters mode|Any valid path|Path to a text file containing sisters rules|NA|
-|`--subsister_rules`|only in subsisters mode|Any valid path|Path to a text file containing subsisters rules|NA|
-|`--clade_grabbing_rules`|only in clade mode|Any valid path|Path to a text file containing clade-grabbing rules|NA|
-|`--clade_grabbing_exceptions`|no|Any valid path|List of taxa to _not_ remove for any reason|NA|
+|`--sister_rules`|only in sisters mode|Valid path|Path to a text file containing sisters rules|NA|
+|`--subsister_rules`|only in subsisters mode|Valid path|Path to a text file containing subsisters rules|NA|
+|`--clade_grabbing_rules`|only in clade mode|Valid path|Path to a text file containing clade-grabbing rules|NA|
+|`--clade_grabbing_exceptions`|no|Valid path|List of taxa to _not_ remove for any reason|NA|
 |`--cl_tree_method`|no|`iqtree`, `raxml`, `fasttree`, `iqtree_fast`|Tree-building method to use in each contamination loop iteration|`fasttree`|
 |`--cl_alignment_method`|no|`mafft_only`, `guidance`|Alignment method to use in each contamination loop iteration|`mafft_only`|
-|`--cl_exclude_taxa`|no|Any valid path|Path to a file containing taxon names present in input MSA/tree files but which should be removed in the first iteration of the contamination loop|NA|
+|`--cl_exclude_taxa`|no|Valid path|Path to a file containing taxon names present in input MSA/tree files but which should be removed in the first iteration of the contamination loop|NA|


 ## Concatenation