Updating headers in 6_FilterPartials.py

2026-02-10 23:10:25 +08:00 · 2024-01-26 10:43:04 -05:00 · 2024-01-26 10:43:04 -05:00 · 0167e3da16
commit 0167e3da16
parent a33941b4ed
1 changed file with 13 additions and 25 deletions
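The rewritten header in the diff below describes two filtering ideas: a length window relative to the Hook reference sequences of the same OG, and a per-OG "score" defined as k-mer coverage multiplied by length. The following is a minimal sketch of those two ideas only, not the code in 6_FilterPartials.py. The function names, the dictionary layout of a transcript record, the inclusive bounds, and the assumption that transcript and reference lengths are already in the same units are all hypothetical and chosen for illustration. The nucleotide-level comparison against the top-scoring sequence is not shown.

```python
from statistics import mean


def length_bounds(hook_lengths, low=0.33, high=1.5):
    """Return (min, max) allowed lengths for one OG.

    hook_lengths: lengths of the reference (Hook) sequences in that OG.
    The 33% and 150% thresholds come from the script header; treating
    the bounds as inclusive is an assumption of this sketch.
    """
    avg = mean(hook_lengths)
    return low * avg, high * avg


def filter_by_length(transcripts, hook_lengths_by_og):
    """Keep transcripts whose length falls inside the window for their OG.

    Each transcript is a dict with hypothetical keys 'id', 'og',
    'length' and 'coverage'; lengths are assumed to be comparable
    to the reference lengths (same units).
    """
    kept = []
    for t in transcripts:
        shortest, longest = length_bounds(hook_lengths_by_og[t["og"]])
        if shortest <= t["length"] <= longest:
            kept.append(t)
    return kept


def score(transcript):
    """Score as described in the header: k-mer coverage times length."""
    return transcript["coverage"] * transcript["length"]


def best_per_og(transcripts):
    """Return the highest-scoring transcript for each OG."""
    best = {}
    for t in transcripts:
        current = best.get(t["og"])
        if current is None or score(t) > score(current):
            best[t["og"]] = t
    return best
```

In this sketch the highest-scoring transcript per OG would serve as the reference against which the remaining sequences of that OG are compared; how the actual script performs that comparison is not visible in this diff.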
--- a/PTL1/Transcriptomes/Scripts/6_FilterPartials.py
+++ b/PTL1/Transcriptomes/Scripts/6_FilterPartials.py
@@ -1,31 +1,19 @@
-#!/usr/bin/env python3.5
+# Last updated Sept 2023
-
+# Authors: Xyrus Maurer-Alcala and Auden Cote-L'Heureux
 ##__Updated__: 2023-09-27 by Auden Cote-L'Heureux
 ##__Author__: Xyrus Maurer-Alcala; maurerax@gmail.com; xyrus.maurer-alcala@izb.unibe.ch
 ##__Usage__: python 6_FilterPartials.py --help
-##################################################################################################
+# This script is intended to remove incomplete transcripts that have a more complete mate.
-## This script is intended to remove incomplete transcripts that have a more complete mate		##
+# First, all sequences shorter than 33% or longer than 150% the average length of sequences 
-##																								##
+# from the same OG in the Hook database are removed. Then, for each transcriptomic sample, 
-## Prior to running this script, ensure the following:											##
+# all sequences within an OG are compared at the nucleotide level to the sequence with the 
-##																								##
+# highest “score” (defined as k-mer coverage multiplied by length). The script should be run
-## 1. You have assembled your transcriptome and COPIED the 'assembly' file 						##
+# as part of the PhyloToL 6 Part 1 pipeline using the script wrapper.py. It requires that the
-##    (contigs.fasta, or scaffolds.fasta) to the PostAssembly Folder							##
+# structure of the 'Output' folder be as output by script 5, and that the Databases/db_OG folder
-## 2. Removed small sequences (usually sequences < 200bp)			##
+# contains a .fasta file containing all amino acid sequences in the OG reference database (Hook)
-## 3. Removed SSU/LSU sequences from your Fasta File											##
+# with the same file name (until the extension) as the .dmnd file for the reference database used
-## 4. Classified your sequences as Strongly Prokaryotic/Eukaryotic or Undetermined				##
+# in script 3.
 ## 5. Classified sequences into OGs 								##
 ## 6. You either know (or have inferred) the genetic code of the organism						##
 ## 7. You have translated the sequences and checked for the data in the RemovePartials folder	##
 ##																								##
 ## 					E-mail Xyrus (author) for help if needed: maurerax@gmail.com				##
 ##																								##
 ##										Next Script(s) to Run: 									##
 ##						 	  			  7_FinalRename.py										##
 ##																								##
 ##################################################################################################
 #Dependencies
 from Bio import SeqIO
 from Bio.Seq import Seq
 from statistics import mean