mirror of
http://43.156.76.180:8026/YuuMJ/EukPhylo.git
synced 2025-12-27 18:50:25 +08:00
Updated EukPhylo QuickStart (markdown)
parent
ee9bb48a45
commit
527e4e2b8b
@ -141,7 +141,7 @@ Below are several optional ways to parameterize EukPhylo Part 2
**General:**
|Parameter|Options|Description|Default|
|:---|:---|:---|:---|
|`--force`||Overwrites all existing files in the `Output` folder|
|`--force`||Overwrites all existing files in the `Output` folder|NA|
|`--tree_method`|`iqtree`, `iqtree_fast`, `raxml`, `fasttree`|Change tree building software|`iqtree`|

**For BLAST and GUIDANCE:**
@ -156,26 +156,26 @@ Below are several optional ways to parameterize EukPhylo Part 2
|`--guidance_threads`|int|Number of threads to allocate to Guidance|`20`|

**For reducing number of similar sequences:**
|Parameter|Required|Options|Help|
|Parameter|Required|Description|Default|
|:---|:---|:---|:---|
|`--similarity_filter`|yes|action = store_true|Run the similarity filter in pre-Guidance|
|`--sim_cutoff`|yes|default = 1, type = float|Sequences from the same taxa that are assigned to the same OG are removed if they are more similar than this cutoff|
|`--sim_taxa`|no|default = None|A file listing taxa (10-digit codes) to apply the similarity filter on (e.g. sim_taxa.txt)|
|`--similarity_filter`|yes|Run the similarity filter in pre-Guidance|NA|
|`--sim_cutoff`|yes|Sequences from the same taxon that are assigned to the same OG are removed if they are more similar than this cutoff (float)|`1`|
|`--sim_taxa`|no|A file listing taxa (10-digit codes) to apply the similarity filter on (e.g. sim_taxa.txt)|NA|

**For removing known poor-quality or contaminant sequences (user informed):**
|Parameter|Description|
|:---|:---|
|`--blacklist`|type = str; A file listing sequence IDs to remove from analysis (e.g. to_remove.txt)|
|`--blacklist`|str; A file listing sequence IDs to remove from analysis (e.g. to_remove.txt)|

**For removing sequences based on GC composition:**

*Note: you must first identify sequences with OGA, OGG, OG6 using the GC_identifier.py script [here](https://github.com/Katzlab/EukPhylo/tree/main/Utilities/for_fastas) on GitHub*
|Parameter|Options|Help|
|:---|:---|:---|
|`--og_identifier`|default = `OG`, choices = `OG`,`OG6`,`OGA`,`OGG`|Select sequences by GC width|
|Parameter|Options|Description|Default|
|:---|:---|:---|:---|
|`--og_identifier`|`OG`,`OG6`,`OGA`,`OGG`|Select sequences by GC width|`OG`|

## Contamination Removal
Contamination removal within EukPhylo (also called Contamination Loop) allows for sequence removal based on Sisters/Subsisters identification or based on Clades diversity. An exemplar run is available on [Figshare](https://figshare.com/articles/dataset/Examplar_runs_PhyloToL_and_CLoop/26662018).
Contamination removal within EukPhylo (also called Contamination Loop or CL) allows for sequence removal based on Sisters/Subsisters identification or based on Clades diversity. An exemplar run is available on [Figshare](https://figshare.com/articles/dataset/Examplar_runs_PhyloToL_and_CLoop/26662018).

### Set up:
* An input folder (called for example Input), with both
@ -187,11 +187,11 @@ Contamination removal within EukPhylo (also called Contamination Loop) allows fo
* the Scripts Folder

### Running:
Basic running of the Contamination loop, with the sister mode:
Basic running of the Contamination Loop, with the sister mode:

`python3 Scripts/eukphylo.py --start trees --end trees --data Input --output Output --contamination_loop seq --sister_rules sister_rules_file.txt > log.out`

Basic running of the Contamination loop, with the clade mode:
Basic running of the Contamination Loop, with the clade mode:

`python3 Scripts/eukphylo.py --start trees --end trees --data Input --output Output --contamination_loop clade --clade_grabbing_rules_file clade_grabbing_rules.txt > log.out`
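The two invocations above differ only in the mode and in the rules flag. As a minimal sketch (not part of EukPhylo), the hypothetical helper `build_cl_command` below assembles the same argument list for either mode. The flag names are copied verbatim from the two commands above, including `--clade_grabbing_rules_file` as it is spelled there; the helper name, its parameters, and the folder defaults are assumptions made for illustration.

```python
# Hypothetical helper, not part of EukPhylo: builds the argument list for the
# two Contamination Loop commands shown above. Flag names are copied from
# those commands; everything else is an illustrative assumption.

def build_cl_command(mode, rules_file, data="Input", output="Output"):
    """Return the argv list for a Contamination Loop run in 'seq' or 'clade' mode."""
    if mode == "seq":
        rules_flag = "--sister_rules"
    elif mode == "clade":
        # Spelled as in the clade-mode command above.
        rules_flag = "--clade_grabbing_rules_file"
    else:
        raise ValueError("mode must be 'seq' or 'clade', got %r" % (mode,))
    return [
        "python3", "Scripts/eukphylo.py",
        "--start", "trees", "--end", "trees",
        "--data", data, "--output", output,
        "--contamination_loop", mode,
        rules_flag, rules_file,
    ]


if __name__ == "__main__":
    # Print the command instead of running it; redirect to log.out as needed.
    print(" ".join(build_cl_command("seq", "sister_rules_file.txt")))
```

The list can be passed to `subprocess.run` from the folder that contains `Scripts`, with standard output redirected to a log file as in the commands above.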
@ -199,15 +199,15 @@ Options:
| Parameter | Required | Options | Description | Default |
| ------------- | ------------- | ------------- | ------------- | ------------- |
| --contamination_loop | yes | seq, clade | The mode in which to run the CL | none |
| --nloops | no | any positive integer | Number of iterations | `5` |
| --sister_rules | only in sisters mode | Any valid path | Path to a text file containing sisters rules | none |
| --subsister_rules | only in subsisters mode | Any valid path | Path to a text file containing subsisters rules | none |
| --clade_grabbing_rules | only in clade mode | Any valid path | Path to a text file containing clade-grabbing rules | none |
| --clade_grabbing_exceptions | no | Any valid path | List of taxa to _not_ remove for any reason | none |
| --cl_tree_method | no | `iqtree`, `raxml`, `fasttree`, `iqtree_fast` | Tree-building method to use in each contamination loop iteration. | fasttree |
| --cl_alignment_method | no | `mafft_only`, `guidance` | Alignment method to use in each contamination loop iteration. | `mafft_only`|
| --cl_exclude_taxa | no | Any valid path | Path to a file containing taxon names present in input MSA/tree files but which should be removed in the first iteration of the contamination loop. | none |
|`--contamination_loop`|yes|seq, clade|The mode in which to run the CL|NA|
|`--nloops`|no|positive int|Number of iterations|`5`|
|`--sister_rules`|only in sisters mode|Any valid path|Path to a text file containing sisters rules|NA|
|`--subsister_rules`|only in subsisters mode|Any valid path|Path to a text file containing subsisters rules|NA|
|`--clade_grabbing_rules`|only in clade mode|Any valid path|Path to a text file containing clade-grabbing rules|NA|
|`--clade_grabbing_exceptions`|no|Any valid path|List of taxa to _not_ remove for any reason|NA|
|`--cl_tree_method`|no|`iqtree`, `raxml`, `fasttree`, `iqtree_fast`|Tree-building method to use in each contamination loop iteration|`fasttree`|
|`--cl_alignment_method`|no|`mafft_only`, `guidance`|Alignment method to use in each contamination loop iteration|`mafft_only`|
|`--cl_exclude_taxa`|no|Any valid path|Path to a file containing taxon names present in input MSA/tree files but which should be removed in the first iteration of the contamination loop|NA|
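The options table above can be read as a small set of constraints. The sketch below is a hypothetical pre-flight check, `check_cl_options`, that is not part of EukPhylo and does not reproduce its actual argument parser; it only encodes the allowed values and defaults listed in the table so a run configuration can be sanity-checked before launching.

```python
# Hypothetical pre-flight check, not EukPhylo's own parser. The allowed values
# and defaults are taken from the options table above; the function name and
# its signature are illustrative assumptions.

CL_MODES = ("seq", "clade")
CL_TREE_METHODS = ("iqtree", "raxml", "fasttree", "iqtree_fast")
CL_ALIGNMENT_METHODS = ("mafft_only", "guidance")


def check_cl_options(contamination_loop, nloops=5,
                     cl_tree_method="fasttree",
                     cl_alignment_method="mafft_only"):
    """Return a list of problems; an empty list means the values match the table."""
    problems = []
    if contamination_loop not in CL_MODES:
        problems.append("--contamination_loop must be one of %s" % (CL_MODES,))
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(nloops, bool) or not isinstance(nloops, int) or nloops < 1:
        problems.append("--nloops must be a positive integer")
    if cl_tree_method not in CL_TREE_METHODS:
        problems.append("--cl_tree_method must be one of %s" % (CL_TREE_METHODS,))
    if cl_alignment_method not in CL_ALIGNMENT_METHODS:
        problems.append("--cl_alignment_method must be one of %s" % (CL_ALIGNMENT_METHODS,))
    return problems


if __name__ == "__main__":
    for problem in check_cl_options("clade", nloops=0):
        print(problem)
```

Returning a list of problems rather than raising on the first one makes it possible to report every inconsistent option in a single pass.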

## Concatenation