mirror of http://43.156.76.180:8026/YuuMJ/EukPhylo.git
synced 2025-12-27 18:00:25 +08:00
Updating headers in 6_FilterPartials.py
parent a33941b4ed
commit 0167e3da16
@@ -1,31 +1,19 @@
-#!/usr/bin/env python3.5
-
-##__Updated__: 2023-09-27 by Auden Cote-L'Heureux
-##__Author__: Xyrus Maurer-Alcala; maurerax@gmail.com; xyrus.maurer-alcala@izb.unibe.ch
-##__Usage__: python 6_FilterPartials.py --help
+# Last updated Sept 2023
+# Authors: Xyrus Maurer-Alcala and Auden Cote-L'Heureux
 
 
-##################################################################################################
-## This script is intended to remove incomplete transcripts that have a more complete mate ##
-## ##
-## Prior to running this script, ensure the following: ##
-## ##
-## 1. You have assembled your transcriptome and COPIED the 'assembly' file ##
-## (contigs.fasta, or scaffolds.fasta) to the PostAssembly Folder ##
-## 2. Removed small sequences (usually sequences < 200bp) ##
-## 3. Removed SSU/LSU sequences from your Fasta File ##
-## 4. Classified your sequences as Strongly Prokaryotic/Eukaryotic or Undetermined ##
-## 5. Classified sequences into OGs ##
-## 6. You either know (or have inferred) the genetic code of the organism ##
-## 7. You have translated the sequences and checked for the data in the RemovePartials folder ##
-## ##
-## E-mail Xyrus (author) for help if needed: maurerax@gmail.com ##
-## ##
-## Next Script(s) to Run: ##
-## 7_FinalRename.py ##
-## ##
-##################################################################################################
+# This script is intended to remove incomplete transcripts that have a more complete mate.
+# First, all sequences shorter than 33% or longer than 150% the average length of sequences
+# from the same OG in the Hook database are removed. Then, for each transcriptomic sample,
+# all sequences within an OG are compared at the nucleotide level to the sequence with the
+# highest “score” (defined as k-mer coverage multiplied by length). The script should be run
+# as part of the PhyloToL 6 Part 1 pipeline using the script wrapper.py. It requires that the
+# structure of the 'Output' folder be as output by script 5, and that the Databases/db_OG folder
+# contains a .fasta file containing all amino acid sequences in the OG reference database (Hook)
+# with the same file name (until the extension) as the .dmnd file for the reference database used
+# in script 3.
 
+#Dependencies
 from Bio import SeqIO
 from Bio.Seq import Seq
 from statistics import mean
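The two filtering steps described in the new header comment can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code in 6_FilterPartials.py: the function names (filter_by_length, best_scoring), the dict-based inputs, and the assumption that sequence and reference lengths are already in the same units are all hypothetical. Only the 33% and 150% thresholds and the definition of the score (k-mer coverage multiplied by length) come from the header.

```python
from statistics import mean


def filter_by_length(seq_lengths, ref_lengths, low=0.33, high=1.5):
    """Keep sequences whose length lies within [low, high] times the
    average length of the reference sequences for the same OG.

    seq_lengths: dict mapping sequence name -> length (hypothetical input shape)
    ref_lengths: list of reference sequence lengths for that OG
    """
    avg = mean(ref_lengths)
    return {
        name: length
        for name, length in seq_lengths.items()
        if low * avg <= length <= high * avg
    }


def best_scoring(seq_lengths, coverages):
    """Return the name of the sequence with the highest score, where
    score = k-mer coverage * length (as defined in the header comment)."""
    return max(seq_lengths, key=lambda name: coverages[name] * seq_lengths[name])


# Example: the reference OG averages 300, so the allowed range is 99 to 450.
kept = filter_by_length({"a": 90, "b": 120, "c": 300, "d": 500}, [300, 300, 300])
# kept == {"b": 120, "c": 300}

# Scores: b = 10.0 * 120 = 1200, c = 3.0 * 300 = 900
reference = best_scoring(kept, {"b": 10.0, "c": 3.0})
# reference == "b"
```

In this sketch the highest-scoring sequence is only selected; the nucleotide-level comparison of the remaining sequences against it, which the header also mentions, is left out because the commit shown here does not include that part of the script.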