diff --git a/PhyloToL-Part-2:-MSAs,-trees,-and-contamination-loop.md b/PhyloToL-Part-2:-MSAs,-trees,-and-contamination-loop.md
index 8c7a5d7..d3c37a0 100644
--- a/PhyloToL-Part-2:-MSAs,-trees,-and-contamination-loop.md
+++ b/PhyloToL-Part-2:-MSAs,-trees,-and-contamination-loop.md
@@ -149,7 +149,10 @@ In the sisters- and subsisters-modes, your rules file should include three colum
 |Op_ch_Dgra | Sr_di | 0.1|
 |-|-|-|
 
-indicates that the a sequence from the choanoflagellate Op_ch_Dgra should be removed if it is sister to any dinoflagellate (any sequence beginning with the prefix Sr_di) on a branch that is less than one tenth the average branch length in the gene tree.
+indicates that the a sequence from the choanoflagellate Op_ch_Dgra should be removed if it is sister to any dinoflagellate (any sequence beginning with the prefix Sr_di) on a branch that is less than one tenth the average branch length in the gene tree. In some cases, a user might want to remove a sequence regardless of branch length, in which case the third column is left blank: so here we are removing a ciliate anytime it falls sister to a green alga regardless of branch length:
+
+|Sr_ci_Fsal | Pl_gr |
+|-|-|
 
 In clade-grabbing mode, each row again represents a rule. This time, there are five columns. The first column gives the target taxonomic group for which you are clade grabbing. Here you can give a ten-digit code, a subset of a code, or even the path to a text file containing a list of multiple codes if they don't all share a precise enough prefix. The third column gives the minimum number of target taxa that must be in a clade for it to be kept, and the second column gives the minimum proportion (or absolute number of >1) of taxa in that clade that are not in the target group. The fourth column allows you to give a list of 'special' taxa (or just a ten-digit code or a subset of a code), X of which must be present in a clade for it to be selected, where X is the number in the fifth column. 
 For example, the line