diff --git a/PhyloToL-Part-2:-MSAs,-trees,-and-contamination-loop.md b/PhyloToL-Part-2:-MSAs,-trees,-and-contamination-loop.md
index 44de1f4..0a03149 100644
--- a/PhyloToL-Part-2:-MSAs,-trees,-and-contamination-loop.md
+++ b/PhyloToL-Part-2:-MSAs,-trees,-and-contamination-loop.md
@@ -18,6 +18,16 @@ If you want to produce up to guidance files, you will change the default '--end
 If you want to start at a different point other than raw data, you will change the default '--start’ parameter to 'unaligned', 'aligned', or 'trees'.
 With these choices, this is the line you could run, with minimum requierements:
 > python Scripts/phylotol.py --start raw --end trees --gf_list listofOGs.txt --taxon_list taxon_list.txt --data Input_folder --output Output_folder > Output1.out
+**provide the table with the list of options, flags, and parameters here**
+
+Optional arguments can then be added to the command line and are described below.
+
+## Filtering on GC composition
+
+GC-content filtering is done during pre-guidance and keeps only sequences whose GC content falls within a user-defined range.
+A utility script (GC_identifier.py) renames each sequence with OGG, OG6, or OGA depending on whether its GC content falls below or above the user-specified GC range.
+The pre-guidance parameter for this is '--og_identifier', with the options 'OG', 'OG6', 'OGA', and 'OGG'; the default, 'OG', passes all sequences to guidance without filtering.
+
 ## Overlap and similarity filters
 
 ## Guidance
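
A note on the GC filter added in this change: the check it describes can be illustrated with a minimal Python sketch. This does not reproduce GC_identifier.py. The function names `gc_fraction` and `classify_by_gc` are hypothetical, the thresholds are assumed to be fractions between 0 and 1, and because the text does not say which of OGG, OG6, and OGA corresponds to which side of the range, the sketch returns neutral labels instead of those identifiers.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of unambiguous bases (A, C, G, T, U) that are G or C.

    Ambiguous characters such as N and gap symbols are ignored.
    An empty or fully ambiguous sequence yields 0.0.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T") + seq.count("U")
    total = gc + at
    return gc / total if total else 0.0


def classify_by_gc(seq: str, low: float, high: float) -> str:
    """Classify a sequence relative to an assumed user-defined GC range [low, high].

    The labels "below", "within", and "above" are illustrative placeholders,
    not the identifiers written by the real pipeline.
    """
    frac = gc_fraction(seq)
    if frac < low:
        return "below"
    if frac > high:
        return "above"
    return "within"


if __name__ == "__main__":
    # Hypothetical sequences and range, for illustration only.
    examples = {"seq1": "ATATATGC", "seq2": "ATGCATGC", "seq3": "GCGCGCAT"}
    for name, seq in examples.items():
        print(name, round(gc_fraction(seq), 2), classify_by_gc(seq, 0.4, 0.6))
```

Under these assumptions, only sequences classified as "within" would be kept by a range filter, while the other two classes would be the ones given a different identifier.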