Updated QuickStart (markdown)

MCLeleu 2025-01-24 11:47:13 +01:00
parent ab5549c18b
commit 94c7bad9e5

@ -72,4 +72,44 @@ Here add detail of each options possible:
### Output:
ReadyToGo = AA, NTD
Sequences summary
# EukPhylo part 2 = MSA - Trees - Contamination Removal - Concatenation
## Set Up:
* A folder called “Scripts” containing all the [scripts from GitHub](https://github.com/Katzlab/EukPhylo/tree/main/PTL2/Scripts)
* Inside the Scripts folder, you also need to add the trimal-trimAl and guidance.v2.02 folders, downloaded from the [trimAl downloads page](http://trimal.cgenomics.org/downloads) and the [Guidance repository](https://github.com/anzaika/guidance)
* An empty output folder, named as you wish, to hold all output files (including trees and Guidance files once the run finishes), for example: Output_folder
* A folder called “OutgroupR2Gs” containing the amino acid (AA) ReadyToGo fasta files for your target and outgroup taxa listed in your taxon_list.txt
* A file called “taxon_list.txt” listing the ten-digit codes for your target taxa and all outgroup taxa
* A .txt file containing your list of OGs to build trees with, for example: listofOGs.txt
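
The checklist above can be verified before launching a run. The following is a minimal sketch, not part of EukPhylo itself: the function name `check_setup` is hypothetical, the file and folder names are the examples used in this section, and it assumes that a "ten-digit code" means a ten-character string.

```python
from pathlib import Path


def check_setup(root, taxon_list="taxon_list.txt", og_list="listofOGs.txt",
                data_dir="OutgroupR2Gs", output_dir="Output_folder"):
    """Return a list of problems found; an empty list means the layout looks right.

    Hypothetical helper: names default to the examples used in this QuickStart.
    """
    root = Path(root)
    problems = []

    # Folders named in the Set Up list.
    for folder in ("Scripts", data_dir, output_dir):
        if not (root / folder).is_dir():
            problems.append(f"missing folder: {folder}")

    # The output folder is expected to start out empty.
    out = root / output_dir
    if out.is_dir() and any(out.iterdir()):
        problems.append(f"output folder is not empty: {output_dir}")

    # The two plain-text lists.
    for name in (taxon_list, og_list):
        if not (root / name).is_file():
            problems.append(f"missing file: {name}")

    # Assumption: each taxon code is exactly ten characters long.
    taxa_path = root / taxon_list
    if taxa_path.is_file():
        for line in taxa_path.read_text().splitlines():
            code = line.strip()
            if code and len(code) != 10:
                problems.append(f"taxon code is not ten characters: {code}")

    return problems


if __name__ == "__main__":
    for problem in check_setup("."):
        print(problem)
```

Running the file from the directory that holds the Scripts folder prints one line per problem and nothing if the layout matches the list.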
## Running
python3 Scripts/phylotol.py --start raw --end trees --gf_list listofOGs.txt --taxon_list taxon_list.txt --data OutgroupR2Gs --output Output_folder > Output1.out
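
If you prefer to launch the run from Python, for example inside a larger workflow script, the same command can be assembled as an argument list. This is a sketch under assumptions: `build_command` and `run` are hypothetical helper names, and only the flags and example file names from the command above are used.

```python
import subprocess


def build_command(gf_list, taxon_list, data, output, start="raw", end="trees"):
    """Assemble the argument list for the command shown above (hypothetical helper)."""
    return [
        "python3", "Scripts/phylotol.py",
        "--start", start,
        "--end", end,
        "--gf_list", gf_list,
        "--taxon_list", taxon_list,
        "--data", data,
        "--output", output,
    ]


def run(cmd, log_path="Output1.out"):
    """Run the command, sending standard output to a log file as `> Output1.out` does."""
    with open(log_path, "w") as log:
        return subprocess.run(cmd, stdout=log, check=False).returncode


if __name__ == "__main__":
    command = build_command("listofOGs.txt", "taxon_list.txt",
                            "OutgroupR2Gs", "Output_folder")
    raise SystemExit(run(command))
```

Passing the arguments as a list avoids shell quoting problems if a folder name contains spaces.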
For details on each input parameter, see the list below or run “python3 Scripts/phylotol.py --help”:
* '--start', default = 'raw', choices = {'raw', 'unaligned', 'aligned', 'trees'}, help = 'Stage at which to start running PhyloToL.'
* '--end', default = 'trees', choices = {'unaligned', 'aligned', 'trees'}, help = 'Stage until which to run PhyloToL. Options are "unaligned" (which will run up to but not including guidance), "aligned" (which will run up to but not including RAxML), and "trees" which will run through RAxML'
* '--gf_list', default = None, help = 'Path to the file with the GFs of interest. Only required if starting from the raw dataset.'
* '--taxon_list', default = None, help = 'Path to the file with the taxa (10-digit codes) to include in the output.'
* '--data', help = 'Path to the input dataset. The format of this varies depending on your --start parameter. If you are running the contamination loop starting with trees, this folder must include both trees AND a fasta file for each tree (with identical file names other than the extension) that includes an amino-acid sequence for each tip of the tree (with the sequence names matching exactly the tip names).'
* '--output', default = '../', help = 'Directory where the output folder should be created. If not given, the folder will be created in the parent directory of the folder containing the scripts.'
* '--force', action = 'store_true', help = 'Overwrite all existing files in the "Output" folder.'
* '--tree_method', default = 'iqtree', choices = {'iqtree', 'raxml', 'all'}, help = 'Program to use for tree-building'
* '--blacklist', type = str, help = 'A text file with a list of sequence names not to consider'
* '--og_identifier', default = 'OG', choices = {'OG','OG6','OGA','OGG'}, help = 'Program to use for selecting seq by GC width'
* '--sim_taxa', default = None, help = 'Path to the file with the taxa (10-digit codes) to apply the similarity filter on.'
* '--blast_cutoff', default = 1e-20, type = float, help = 'Blast e-value cutoff'
* '--len_cutoff', default = 10, type = int, help = 'Amino acid length cutoff for removal of very short sequences after column removal in Guidance.'
* '--similarity_filter', action = 'store_true', help = 'Run the similarity filter in pre-Guidance'
* '--sim_cutoff', default = 1, type = float, help = 'Sequences from the same taxa that are assigned to the same OG are removed if they are more similar than this cutoff'
* '--guidance_iters', default = 5, type = int, help = 'Number of Guidance iterations for sequence removal'
* '--seq_cutoff', default = 0.3, type = float, help = 'During guidance, taxa are removed if their score is below this cutoff'
* '--col_cutoff', default = 0.0, type = float, help = 'During guidance, columns are removed if their score is below this cutoff'
* '--res_cutoff', default = 0.0, type = float, help = 'During guidance, residues are removed if their score is below this cutoff'
* '--guidance_threads', default = 20, type = int, help = 'Number of threads to allocate to Guidance'
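
The entries above are written as `argparse` keyword arguments. To illustrate how the defaults, `choices` and `store_true` flags behave, here is a small stand-alone sketch that declares a handful of them. It is not the actual phylotol.py source: `build_parser` is a hypothetical name, only a subset of the options is included, and the help strings are shortened.

```python
import argparse


def build_parser():
    """Illustrative parser covering a subset of the options listed above."""
    parser = argparse.ArgumentParser(description="Sketch of a few PhyloToL options")
    parser.add_argument("--start", default="raw",
                        choices=["raw", "unaligned", "aligned", "trees"],
                        help="Stage at which to start running")
    parser.add_argument("--end", default="trees",
                        choices=["unaligned", "aligned", "trees"],
                        help="Stage until which to run")
    parser.add_argument("--tree_method", default="iqtree",
                        choices=["iqtree", "raxml", "all"],
                        help="Program to use for tree-building")
    # store_true flags are False unless the flag is present on the command line.
    parser.add_argument("--force", action="store_true",
                        help="Overwrite existing output files")
    parser.add_argument("--blast_cutoff", default=1e-20, type=float,
                        help="Blast e-value cutoff")
    parser.add_argument("--guidance_iters", default=5, type=int,
                        help="Number of Guidance iterations")
    parser.add_argument("--seq_cutoff", default=0.3, type=float,
                        help="Guidance sequence score cutoff")
    return parser


if __name__ == "__main__":
    # With no arguments, every option falls back to its default.
    print(build_parser().parse_args([]))
```

A value outside the listed `choices` (for example `--start finished`) makes `argparse` print a usage error and exit, so a mistyped stage name is caught before any analysis starts.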