diff --git a/PhyloToL-Part-2:-MSAs,-trees,-and-contamination-loop.md b/PhyloToL-Part-2:-MSAs,-trees,-and-contamination-loop.md
index e4a1598..4d4fbe8 100644
--- a/PhyloToL-Part-2:-MSAs,-trees,-and-contamination-loop.md
+++ b/PhyloToL-Part-2:-MSAs,-trees,-and-contamination-loop.md
@@ -74,6 +74,31 @@ Minimum requirements:
 
 List of all parameters included in PhyloToL:
 
+Argument | Default | Choices | Help
+-- | -- | -- | --
+--start | raw | raw, unaligned, aligned, trees | Stage at which to start running PhyloToL.
+--end | trees | unaligned, aligned, trees | Stage at which to stop running PhyloToL. Options are "unaligned" (up to but not including Guidance), "aligned" (up to but not including RAxML), and "trees", which will run through RAxML.
+--gf_list | None |   | Path to the file with the GFs of interest. Only required if starting from the raw dataset.
+--taxon_list | None |   | Path to the file with the taxa (10-digit codes) to include in the output.
+--data |   |   | Path to the input dataset. The format varies depending on your --start parameter. If running the contamination loop starting with trees, this folder must include both trees AND a fasta file for each tree (with identical file names other than the extension) that includes an amino-acid sequence for each tip of the tree (with matching sequence names).
+--output | ./ |   | Directory where the output folder should be created. If not given, the folder will be created in the parent directory of the folder containing the scripts.
+--force | store_true |   | Overwrite all existing files in the "Output" folder.
+--tree_method | iqtree | iqtree, raxml, all | Program to use for tree-building.
+--blacklist |   |   | A text file with a list of sequence names not to consider.
+--og_identifier | OG | OG, OG6, OGA, OGG | Program to use for selecting sequences by GC width.
+--sim_taxa | None |   | Path to the file with the taxa (10-digit codes) to apply the similarity filter on.
+--blast_cutoff | 1e-20 |   | BLAST e-value cutoff.
+--len_cutoff | 10 |   | Amino acid length cutoff for removal of very short sequences after column removal in Guidance.
+--similarity_filter | store_true |   | Run the similarity filter in pre-Guidance.
+--sim_cutoff | 1 |   | Sequences from the same taxon that are assigned to the same OG are removed if they are more similar than this cutoff.
+--guidance_iters | 5 |   | Number of Guidance iterations for sequence removal.
+--seq_cutoff | 0.3 |   | During Guidance, taxa are removed if their score is below this cutoff.
+--col_cutoff | 0.0 |   | During Guidance, columns are removed if their score is below this cutoff.
+--res_cutoff | 0.0 |   | During Guidance, residues are removed if their score is below this cutoff.
+--keep_temp | store_true |   | Use this to keep ALL Guidance intermediate files.
+--keep_iter / -z | store_true |   | Keep all Guidance iterations (beware: this will be very large).
+
+
 ## Contamination loop
 
 The contamination loop (CL) is implemented within PhyloToL to allow the removal of contaminants based on the topology of each tree (phylogeny-informed contamination removal). Three modes are available: sister-, subsister-, and clade-based contamination removal. All modes take a user-defined file of 'rules' used to identify the sequences to remove.
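
The defaults and choices in the parameter table map naturally onto a Python `argparse` declaration, which the `store_true` entries suggest is how the options are defined. The following is a minimal sketch under that assumption, covering only a handful of the arguments. The function name `build_parser`, the `type=` conversions, and the example dataset path are hypothetical, and this is not PhyloToL's actual source.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring a few rows of the parameter table."""
    parser = argparse.ArgumentParser(description="Sketch of a PhyloToL-style command line")
    # Stage selection: defaults and choices copied from the table.
    parser.add_argument("--start", default="raw",
                        choices=["raw", "unaligned", "aligned", "trees"])
    parser.add_argument("--end", default="trees",
                        choices=["unaligned", "aligned", "trees"])
    parser.add_argument("--data", help="Path to the input dataset")
    parser.add_argument("--tree_method", default="iqtree",
                        choices=["iqtree", "raxml", "all"])
    # Numeric cutoffs; the type conversions are an assumption of this sketch.
    parser.add_argument("--blast_cutoff", type=float, default=1e-20)
    parser.add_argument("--guidance_iters", type=int, default=5)
    parser.add_argument("--seq_cutoff", type=float, default=0.3)
    # Flags listed as "store_true" are off unless given on the command line.
    parser.add_argument("--force", action="store_true")
    return parser


# Parse an example argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["--start", "unaligned", "--data", "my_dataset", "--force"]
)
print(args.start, args.end, args.tree_method, args.force)
```

Arguments that are not supplied fall back to the table's defaults, so in this example `--end` stays at `trees` and `--tree_method` at `iqtree`, while `--force` becomes true only because it was passed explicitly.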