From 7dce3919df85f5e7611f2030aaed50b790239930 Mon Sep 17 00:00:00 2001
From: MCLeleu <123706003+MCLeleu@users.noreply.github.com>
Date: Mon, 12 Aug 2024 23:38:06 +0200
Subject: [PATCH] Updated PhyloToL Part 2: MSAs, trees, and contamination loop (markdown)

---
 ...oToL-Part-2:-MSAs,-trees,-and-contamination-loop.md | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/PhyloToL-Part-2:-MSAs,-trees,-and-contamination-loop.md b/PhyloToL-Part-2:-MSAs,-trees,-and-contamination-loop.md
index 44de1f4..0a03149 100644
--- a/PhyloToL-Part-2:-MSAs,-trees,-and-contamination-loop.md
+++ b/PhyloToL-Part-2:-MSAs,-trees,-and-contamination-loop.md
@@ -18,6 +18,16 @@ If you want to produce up to guidance files, you will change the default '--end
 If you want to start at a different point other than raw data, you will change the default '--start’ parameter to 'unaligned', 'aligned', or 'trees'. With these choices, this is the line you could run, with minimum requierements:
 > python Scripts/phylotol.py --start raw --end trees --gf_list listofOGs.txt --taxon_list taxon_list.txt --data Input_folder --output Output_folder > Output1.out
 
+**Provide the table listing the optional flags and parameters here.**
+
+Optional arguments can be added to the command line; they are described below.
+
+## Filtering on GC composition
+
+GC-content filtering is performed during pre-guidance and keeps only the sequences whose GC content falls within a user-defined range.
+Sequences are renamed by a utility script (GC_identifier.py), which labels each sequence OGG, OG6, or OGA depending on whether its GC content falls below or above the user-specified GC range.
+The corresponding pre-guidance parameter is '--og_identifier', which accepts 'OG', 'OG6', 'OGA', or 'OGG'. The default, 'OG', applies no filtering and passes all sequences to guidance.
+
 ## Overlap and similarity filters
 
 ## Guidance
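The GC-content filter described in the "Filtering on GC composition" section of this patch can be illustrated with a minimal sketch. It is not the code of GC_identifier.py: the function names (`gc_content`, `classify_gc`, `filter_by_gc`) are hypothetical, and it assumes nucleotide sequences, GC measured as a fraction of unambiguous bases (A, C, G, T/U), and inclusive range bounds. It only places each sequence below, within, or above the range; it deliberately does not map these outcomes to the OGG, OG6, or OGA labels, because the text does not say which label corresponds to which case.

```python
from typing import Dict


def gc_content(seq: str) -> float:
    """Return the fraction of G+C among unambiguous bases (A, C, G, T, U).

    Assumption: ambiguous symbols such as N or gaps are ignored.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T") + seq.count("U")
    total = gc + at
    if total == 0:
        raise ValueError("sequence has no unambiguous nucleotides")
    return gc / total


def classify_gc(seq: str, low: float, high: float) -> str:
    """Place a sequence 'below', 'within', or 'above' the GC range [low, high]."""
    if not 0.0 <= low <= high <= 1.0:
        raise ValueError("expected 0 <= low <= high <= 1")
    gc = gc_content(seq)
    if gc < low:
        return "below"
    if gc > high:
        return "above"
    return "within"


def filter_by_gc(records: Dict[str, str], low: float, high: float) -> Dict[str, str]:
    """Keep only the records whose GC content falls within [low, high]."""
    return {
        name: seq
        for name, seq in records.items()
        if classify_gc(seq, low, high) == "within"
    }


if __name__ == "__main__":
    # Hypothetical toy sequences with GC fractions of 0.2, 0.4, and 0.8.
    seqs = {
        "seq_at_rich": "ATATATATGC",
        "seq_balanced": "ATGCATGCAT",
        "seq_gc_rich": "GCGCGCGCAT",
    }
    for name, seq in seqs.items():
        print(name, round(gc_content(seq), 2), classify_gc(seq, 0.3, 0.6))
    print(sorted(filter_by_gc(seqs, 0.3, 0.6)))
```

With a range of 0.3 to 0.6, only `seq_balanced` is kept. Counting only unambiguous bases keeps long runs of N from pulling the GC fraction down; if the real script counts them differently, the thresholds would shift accordingly.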