mirror of
http://43.156.76.180:8026/YuuMJ/EukPhylo.git
synced 2025-12-29 00:30:24 +08:00
Updated PhyloToL Part 2: MSAs, trees, and contamination loop (markdown)
parent 62aa5650a6
commit 66f7b5bf68
@@ -187,13 +187,7 @@ To run the CL, use a similar command structure as described for running PhyloToL
PhyloToL includes an optional step, which can be run after the tree-building stage (or by using `--start trees` and passing to the `--data` argument a folder of trees and corresponding alignments or unaligned sequence files), to select orthologs (at most one sequence per taxon from each GF) and build a concatenated alignment. PhyloToL first identifies, for each taxon, the monophyletic clade containing the greatest number of species from that taxon's minor clade, using the first five characters of that taxon's sample identifier (e.g., Op_me for Metazoa); alternatively, a user can select orthologs for only a target group of taxa using the `--concat_target_taxa` argument by inputting a file with a list of ten-digit codes, or just a single ten-digit code or clade prefix. If only one sequence from the taxon falls into this largest clade, that sequence is chosen for concatenation; otherwise, each sequence is given a score equal to its length times its k-mer coverage for transcriptomic data, or just its length for genomic data, and the sequence with the highest score is taken. If a GF is not present in a taxon, the corresponding region of the concatenated alignment is filled with gaps. This step produces a clearly labeled concatenated alignment, as well as a folder called "DataToConcatenate" containing all the selected orthologs for each GF, both aligned and unaligned.
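The scoring and gap-filling rules described above can be sketched in a few lines of Python. This is a minimal illustration, not PhyloToL's own code: the `Candidate` class and the `score`, `pick_ortholog`, and `concatenate_row` functions are hypothetical names, and the sketch assumes that "length" means the ungapped sequence length and that the candidates passed in are already restricted to the largest clade.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Candidate:
    """One sequence from a taxon that falls in the largest clade (hypothetical type)."""
    name: str
    sequence: str
    kmer_coverage: Optional[float] = None  # None stands for genomic data


def score(candidate: Candidate) -> float:
    # Assumption: length is counted without alignment gaps.
    length = len(candidate.sequence.replace("-", ""))
    if candidate.kmer_coverage is None:
        return float(length)                  # genomic: length only
    return length * candidate.kmer_coverage   # transcriptomic: length x coverage


def pick_ortholog(candidates: List[Candidate]) -> Optional[Candidate]:
    """Return the single sequence kept for concatenation, or None if the GF is absent."""
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]
    return max(candidates, key=score)


def concatenate_row(selected: Dict[str, str], gf_lengths: Dict[str, int]) -> str:
    """Build one taxon's concatenated row; GFs missing from the taxon become gaps."""
    return "".join(selected.get(gf, "-" * n) for gf, n in gf_lengths.items())
```

For example, a short transcript with high k-mer coverage can outscore a longer one with low coverage, because the score multiplies the two values.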
To run this step, add the `--concatenate` flag to your PhyloToL command. Parameters are:

| Parameter | Options | Description | Default |
|-|-|-|-|
| --concatenate | flag | Remove paralogs and generate an alignment for concatenation | False |
| --concat_target_taxa | _str_ | A taxonomic group (sequence prefix), several groups, or a file containing a list of groups (multiple prefixes) for which to select sequences to construct a concatenated alignment | None |

To run this step, add the `--concatenate` flag to your PhyloToL command. If you don't include this flag, concatenation is not run. If you want to run concatenation alone (i.e., you already have trees and alignments), you'll have to set up your input data in the style of PhyloToL's "Output" folder. Namely, create a folder called "Output" in the same folder that contains the "Scripts" folder. Inside the "Output" folder, create a folder called "Trees" and a folder called "Guidance". Put your input trees in the Trees folder and the input alignments in the Guidance folder. Each file in the Trees folder should have a corresponding file in the Guidance folder; the names of these files should match up to the last period (i.e., up to the file extension). For example, for gene family OG6_100206 you might have a tree file called OG6_100206.PostCL.tre and an alignment file called OG6_100206.PostCL.fasta. Here, the "OG6_100206.PostCL" part must match.
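Before running concatenation on your own trees and alignments, it can help to check that the file names pair up as described. The following is a small sketch of such a check and is not part of PhyloToL: the function names `find_unmatched` and `check_output_folder` are hypothetical, and the only layout assumed is the "Output/Trees" and "Output/Guidance" folders named above.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def find_unmatched(tree_names: Iterable[str],
                   alignment_names: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Compare file names up to the last period.

    Returns (tree stems with no alignment, alignment stems with no tree).
    """
    # Path.stem drops only the final extension, so
    # "OG6_100206.PostCL.tre" -> "OG6_100206.PostCL".
    tree_stems = {Path(name).stem for name in tree_names}
    aln_stems = {Path(name).stem for name in alignment_names}
    return sorted(tree_stems - aln_stems), sorted(aln_stems - tree_stems)


def check_output_folder(output_dir: str) -> Tuple[List[str], List[str]]:
    """Run the comparison on <output_dir>/Trees and <output_dir>/Guidance."""
    out = Path(output_dir)
    # Hidden files (e.g. .DS_Store) are skipped so they are not reported as mismatches.
    trees = [p.name for p in (out / "Trees").iterdir()
             if p.is_file() and not p.name.startswith(".")]
    alignments = [p.name for p in (out / "Guidance").iterdir()
                  if p.is_file() and not p.name.startswith(".")]
    return find_unmatched(trees, alignments)
```

Calling `check_output_folder("Output")` from the folder that contains "Scripts" returns two lists; both are empty when every tree has a matching alignment and vice versa.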