Updating headers in 6_FilterPartials.py

Auden Cote-L'Heureux 2025-03-19 09:20:37 -04:00 committed by GitHub
parent d1b9a64e60
commit 8487f1d836

@@ -5,7 +5,8 @@
 # First, all sequences shorter than 33% or longer than 150% the average length of sequences
 # from the same OG in the Hook database are removed. Then, for each transcriptomic sample,
 # all sequences within an OG are compared at the nucleotide level to the sequence with the
-# highest “score” (defined as k-mer coverage multiplied by length). The script should be run
+# highest “score” (defined as k-mer coverage multiplied by length) using BLAST, and sequences that
+# are 98% identical to the master sequence are removed. The script should be run
 # as part of the EukPhylo Part 1 pipeline using the script wrapper.py. It requires that the
 # structure of the 'Output' folder be as output by script 5, and that the Databases/db_OG folder
 # contains a .fasta file containing all amino acid sequences in the OG reference database (Hook)
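
The filtering rules described in the header above can be illustrated with a minimal sketch. This is not the code from 6_FilterPartials.py: the function names, the dictionary layout, and the inclusive treatment of the 33% and 150% boundaries are assumptions made for illustration, and the BLAST comparison against the master sequence is left out.

```python
from statistics import mean


def length_filter(seq_lengths, hook_lengths, low=0.33, high=1.5):
    """Keep IDs whose length lies within [low, high] x the mean Hook length.

    seq_lengths: dict mapping sequence ID -> length (hypothetical layout)
    hook_lengths: lengths of the Hook sequences from the same OG
    """
    avg = mean(hook_lengths)
    return [sid for sid, n in seq_lengths.items() if low * avg <= n <= high * avg]


def pick_master(records):
    """Return the ID with the highest score (k-mer coverage x length).

    records: dict mapping sequence ID -> (length, kmer_coverage) (hypothetical layout)
    """
    return max(records, key=lambda sid: records[sid][0] * records[sid][1])


# Example with made-up values: the mean Hook length is 300, so lengths
# below 99 or above 450 are dropped.
kept = length_filter({"a": 90, "b": 120, "c": 450, "d": 500}, [300, 300])
master = pick_master({"b": (120, 10.0), "c": (450, 2.0)})
```

In the real script, the remaining sequences would then be compared to the master sequence with BLAST and those at 98% identity removed, as the updated header states.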