Mirror of http://43.156.76.180:8026/YuuMJ/EukPhylo.git (synced 2025-12-27 07:30:24 +08:00)
Updated EukPhylo QuickStart (markdown)
parent 527e4e2b8b
commit 76b75b01b3
@@ -130,10 +130,10 @@ For additional input parameter options, see table below or run: `python phylotol
 |:---|:-------|:---|:---|
 |`--start`|`raw`, `unaligned`, `aligned`, `trees`|Stage at which to start running PhyloToL|`raw`|
 |`--end`|`unaligned`, `aligned`, `trees`|Stage until which to run PhyloToL. Options are `unaligned` (which will run up to but not including guidance), `aligned` (which will run up to but not including RAxML), and `trees` (which will run through RAxML)|`trees`|
-|`--gf_list`|Any valid path|Path to the file with the GFs of interest. Only required if starting from the raw dataset|None|
-|`--taxon_list`|Any valid path|Path to the file with the taxa (10-digit codes) to include in the output|None|
-|`--data`|Any valid path|Path to the input dataset. The format of this varies depending on your `--start` parameter. If you are running the contamination loop starting with trees, this folder must include both trees **AND** a fasta file for each tree (with identical file names other than the extension) that includes an amino-acid sequence for each tip of the tree (with the sequence names matching exactly the tip names)|None|
-|`--output`|Any valid path|Directory where the output folder should be created. If not given, the folder will be created in the parent directory of the folder containing the scripts|`../`|
+|`--gf_list`|Valid path|Path to the file with the GFs of interest. Only required if starting from the raw dataset|None|
+|`--taxon_list`|Valid path|Path to the file with the taxa (10-digit codes) to include in the output|None|
+|`--data`|Valid path|Path to the input dataset. The format of this varies depending on your `--start` parameter. If you are running the contamination loop starting with trees, this folder must include both trees **AND** a fasta file for each tree (with identical file names other than the extension) that includes an amino-acid sequence for each tip of the tree (with the sequence names matching exactly the tip names)|None|
+|`--output`|Valid path|Directory where the output folder should be created. If not given, the folder will be created in the parent directory of the folder containing the scripts|`../`|
 
 ### Modularity
 Below are several optional ways to parameterize EukPhylo Part 2
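The `--data` row in the hunk above requires that, when the contamination loop starts from trees, every tree has a fasta file whose sequence names match the tip names exactly. A minimal Python sketch of that check follows. It is not part of EukPhylo: the function names are hypothetical, and it assumes Newick trees with unquoted tip labels and fasta names that end at the first whitespace in the header.

```python
import re


def newick_tips(newick):
    """Return the set of tip labels in a Newick string (unquoted labels assumed)."""
    # A tip label directly follows "(" or ","; internal-node labels follow ")".
    return set(re.findall(r"[(,]\s*([^(),:;\s]+)", newick))


def fasta_names(fasta_text):
    """Return sequence names: header text after ">" up to the first whitespace."""
    return {
        line[1:].split()[0]
        for line in fasta_text.splitlines()
        if line.startswith(">") and line[1:].strip()
    }


def mismatches(newick, fasta_text):
    """Return (tips without a sequence, sequences without a tip)."""
    tips, names = newick_tips(newick), fasta_names(fasta_text)
    return tips - names, names - tips
```

Both returned sets being empty means the tree and its fasta file agree; anything else lists the names to fix before passing the folder to `--data`.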
@@ -163,16 +163,16 @@ Below are several optional ways to parameterize EukPhylo Part 2
 |`--sim_taxa`|no|A file listing taxa (10-digit codes) to apply the similarity filter on (e.g. sim_taxa.txt)|NA|
 
 **For removing known poor-quality or contaminant sequences (user informed):**
-|Parameter|Description|
-|:---|:---|
-|`--blacklist`|str; A file listing sequence IDs to remove from analysis (e.g. to_remove.txt)|
+|Parameter|Options|Description|
+|:---|:---|:---|
+|`--blacklist`|str|A file listing sequence IDs to remove from analysis (e.g. to_remove.txt)|
 
 **For removing sequences based on GC composition:**
 
 *Note: you must first identify sequences with OGA, OGG, OG6 using the GC_identifier.py script [here](https://github.com/Katzlab/EukPhylo/tree/main/Utilities/for_fastas) on GitHub*
 |Parameter|Options|Description|Default|
 |:---|:---|:---|:---|
-|`--og_identifier`|`OG`,`OG6`,`OGA`,`OGG`|Select sequences by GC width|`OG`
+|`--og_identifier`|`OG`, `OG6`, `OGA`, `OGG`|Select sequences by GC width|`OG`|
 
 ## Contamination Removal
 Contamination removal within EukPhylo (also called Contamination Loop or CL) allows for sequence removal based on Sisters/Subsisters identification or based on Clades diversity. An exemplar run is available on [Figshare](https://figshare.com/articles/dataset/Examplar_runs_PhyloToL_and_CLoop/26662018).
@@ -199,15 +199,15 @@ Options:
 
 | Parameter | Required | Options | Description | Default |
 | ------------- | ------------- | ------------- | ------------- | ------------- |
-|`--contamination_loop`|yes|seq, clade|The mode in which to run the CL|NA|
+|`--contamination_loop`|yes|`seq`, `clade`|The mode in which to run the CL|NA|
 |`--nloops`|no|positive int|Number of iterations|`5`|
-|`--sister_rules`|only in sisters mode|Any valid path|Path to a text file containing sisters rules|NA|
-|`--subsister_rules`|only in subsisters mode|Any valid path|Path to a text file containing subsisters rules|NA|
-|`--clade_grabbing_rules`|only in clade mode|Any valid path|Path to a text file containing clade-grabbing rules|NA|
-|`--clade_grabbing_exceptions`|no|Any valid path|List of taxa to _not_ remove for any reason|NA|
+|`--sister_rules`|only in sisters mode|Valid path|Path to a text file containing sisters rules|NA|
+|`--subsister_rules`|only in subsisters mode|Valid path|Path to a text file containing subsisters rules|NA|
+|`--clade_grabbing_rules`|only in clade mode|Valid path|Path to a text file containing clade-grabbing rules|NA|
+|`--clade_grabbing_exceptions`|no|Valid path|List of taxa to _not_ remove for any reason|NA|
 |`--cl_tree_method`|no|`iqtree`, `raxml`, `fasttree`, `iqtree_fast`|Tree-building method to use in each contamination loop iteration|`fasttree`|
 |`--cl_alignment_method`|no|`mafft_only`, `guidance`|Alignment method to use in each contamination loop iteration|`mafft_only`|
-|`--cl_exclude_taxa`|no|Any valid path|Path to a file containing taxon names present in input MSA/tree files but which should be removed in the first iteration of the contamination loop|NA|
+|`--cl_exclude_taxa`|no|Valid path|Path to a file containing taxon names present in input MSA/tree files but which should be removed in the first iteration of the contamination loop|NA|
 
 
 ## Concatenation
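The contamination-loop options in the hunk above (allowed values, defaults, and the clade-mode requirement) can be mirrored in a small `argparse` sketch. This is an illustration built only from the table, not EukPhylo's actual parser; `positive_int`, `build_parser`, and `parse_cl_args` are hypothetical names. Because the table lists `seq` and `clade` as modes but describes rules files in terms of "sisters" and "subsisters" modes, the sketch enforces only the clade-mode requirement.

```python
import argparse


def positive_int(value):
    """argparse type for --nloops: accept integers >= 1 only."""
    n = int(value)
    if n < 1:
        raise argparse.ArgumentTypeError("must be a positive integer")
    return n


def build_parser():
    # Option names, choices, and defaults are copied from the table above.
    p = argparse.ArgumentParser(description="Sketch of the contamination-loop options")
    p.add_argument("--contamination_loop", required=True, choices=["seq", "clade"])
    p.add_argument("--nloops", type=positive_int, default=5)
    p.add_argument("--sister_rules")
    p.add_argument("--subsister_rules")
    p.add_argument("--clade_grabbing_rules")
    p.add_argument("--clade_grabbing_exceptions")
    p.add_argument("--cl_tree_method", default="fasttree",
                   choices=["iqtree", "raxml", "fasttree", "iqtree_fast"])
    p.add_argument("--cl_alignment_method", default="mafft_only",
                   choices=["mafft_only", "guidance"])
    p.add_argument("--cl_exclude_taxa")
    return p


def parse_cl_args(argv):
    """Parse argv and apply the one mode-dependent check the table states clearly."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.contamination_loop == "clade" and not args.clade_grabbing_rules:
        parser.error("--clade_grabbing_rules is required in clade mode")
    return args
```

For example, `parse_cl_args(["--contamination_loop", "clade", "--clade_grabbing_rules", "rules.txt"])` fills in the documented defaults (`nloops` of 5, `fasttree`, `mafft_only`); here `rules.txt` is a placeholder file name.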