Updated PhyloToL Part 2: MSAs, trees, and contamination loop (markdown)

MCLeleu 2024-08-12 23:38:06 +02:00
parent 8b09b0de95
commit 7dce3919df

@ -18,6 +18,16 @@ If you want to produce up to guidance files, you will change the default '--end
If you want to start at a point other than raw data, you will change the default '--start' parameter to 'unaligned', 'aligned', or 'trees'. With these choices, this is the line you could run, with minimum requirements:
> python Scripts/phylotol.py --start raw --end trees --gf_list listofOGs.txt --taxon_list taxon_list.txt --data Input_folder --output Output_folder > Output1.out
**Provide the table listing options, flags, and parameters here**
Optional arguments can then be added to the command line, and are described below.
## Filtering on GC composition
Filtering by GC content is done during pre-guidance and selects only sequences that fall within a user-defined range.
Each sequence is renamed by a utility script (GC_identifier.py), which labels the sequences with OGG, OG6, or OGA depending on whether the sequence's GC content falls below or above the user-specified GC range.
The parameter for this when running pre-guidance is --og_identifier, and the options are 'OG', 'OG6', 'OGA', and 'OGG'. The default is 'OG', which passes all sequences to guidance without filtering.
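
To make the idea of a GC range concrete, here is a minimal Python sketch of computing GC content and keeping only the sequences inside a range. It is not taken from GC_identifier.py: the function names `gc_content` and `filter_by_gc`, the example sequences, and the 30-60% range are all hypothetical, and the sketch measures GC over the whole sequence, which may differ from how PhyloToL measures it.

```python
def gc_content(seq: str) -> float:
    """Return the GC percentage (0-100) of a nucleotide sequence.

    Gaps and ambiguous bases are ignored: only A, C, G, and T are counted.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


def filter_by_gc(records, low, high):
    """Keep (name, sequence) pairs whose GC percentage lies in [low, high]."""
    return [(name, seq) for name, seq in records if low <= gc_content(seq) <= high]


# Hypothetical example sequences, for illustration only.
records = [
    ("seq_low", "ATATATATGC"),   # 20% GC
    ("seq_mid", "ATGCATGCAT"),   # 40% GC
    ("seq_high", "GCGCGCGCAT"),  # 80% GC
]

# Hypothetical user-defined range of 30-60% GC.
kept = filter_by_gc(records, 30, 60)
print([name for name, _ in kept])  # prints ['seq_mid']
```

In this sketch, sequences outside the range are simply dropped. In the pipeline, the sequences are instead relabeled so that the chosen --og_identifier value determines which ones are passed on to guidance.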
## Overlap and similarity filters
## Guidance