Updated EukPhylo Part 1: GF assignment (markdown)
parent f91a42a632, commit 5b73cd220a

Inside the db_BvsE folder are two Diamond-formatted reference databases, one of diverse eukaryotic sequences (eukout.dmnd) and one of prokaryotic sequences (micout.dmnd), used to identify putative contamination (sequences are ultimately labeled _P for putative prokaryotic or _E for likely eukaryotic). These labels are only preliminary assignments that help users interpret data on trees, and should be treated as such. The folder also contains a BLAST+ formatted database of rRNA sequences, used to remove putative rRNA; these sequences are sequestered in a separate file with the suffix `_rRNAseqs.fasta`.

The db_StopFreq folder contains one Diamond-formatted reference database of diverse eukaryotic protein sequences, used to identify putative reading frames when calculating in-frame stop codon frequencies for genetic code assignment (e.g. for studies of ciliates and other lineages with non-canonical genetic codes).

The db_OG folder contains the Hook Database, which MUST be provided as BOTH a FASTA file and a Diamond-formatted database. These files should have the same name up to the extension (e.g. Hook-6.6.fasta and Hook-6.6.dmnd).

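
The stop codon frequency calculation can be illustrated with a minimal Python sketch. It assumes the reading frame is already known (in EukPhylo it is inferred from hits against the db_StopFreq database), counts only the three standard stop codons, and uses hypothetical function names (`inframe_stop_counts`, `stop_frequencies`) that are not part of EukPhylo.

```python
"""Minimal sketch: in-frame stop codon frequencies for one transcript.

Assumptions (not EukPhylo code): the reading frame (0, 1 or 2) is already
known, the sequence is given 5'->3' on the coding strand, and only the
standard stop codons TAA, TAG and TGA are counted.
"""

STOP_CODONS = ("TAA", "TAG", "TGA")


def inframe_stop_counts(seq: str, frame: int) -> dict[str, int]:
    """Count each standard stop codon read in the given frame."""
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1 or 2")
    seq = seq.upper().replace("U", "T")  # accept RNA as well as DNA
    counts = {codon: 0 for codon in STOP_CODONS}
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in counts:
            counts[codon] += 1
    return counts


def stop_frequencies(seq: str, frame: int) -> dict[str, float]:
    """Occurrences of each stop codon per complete in-frame codon."""
    counts = inframe_stop_counts(seq, frame)
    n_codons = (len(seq) - frame) // 3
    if n_codons <= 0:
        return {codon: 0.0 for codon in counts}
    return {codon: n / n_codons for codon, n in counts.items()}
```

For example, `stop_frequencies("ATGTAACAGTGA", 0)` reads the codons ATG, TAA, CAG and TGA, giving 0.25 for TAA and TGA and 0.0 for TAG. Consistently elevated in-frame TAA/TAG frequencies across many transcripts are the kind of signal that points to a reassigned code, as in ciliates such as Tetrahymena, where TAA and TAG encode glutamine.
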
You can download these databases from the [EukPhylo Figshare page](https://figshare.com/projects/EukPhylo_Supplemental_Files/196552). You will have to add the Hook Database to the db_OG folder manually; it is available from the same [Figshare page](https://figshare.com/projects/EukPhylo_Supplemental_Files/196552). Convert it to a Diamond database using the same Diamond version that will run EukPhylo (databases built with one Diamond version may not be readable by another), and proceed. Alternatively, you can create your own reference database for gene family assignment (described below).

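
The conversion step can be sketched in Python as follows, assuming the `diamond` executable is on your PATH and the Hook file ends in `.fasta`; the helper names are hypothetical and not part of EukPhylo. `diamond makedb --in <fasta> -d <name>` writes `<name>.dmnd`, so passing the FASTA path without its extension keeps the two files paired, and `diamond version` prints the version in use so you can match it to the one that will run the pipeline.

```python
# Minimal sketch (hypothetical helpers, not EukPhylo code). Assumes the
# `diamond` executable is on PATH and the Hook FASTA ends in ".fasta".
import shutil
import subprocess
from collections.abc import Iterable
from pathlib import Path


def makedb_command(fasta: Path) -> list[str]:
    """Build the `diamond makedb` call for a FASTA file.

    Diamond appends ".dmnd" to the -d name, so passing the FASTA path without
    its extension yields e.g. Hook-6.6.fasta -> Hook-6.6.dmnd.
    """
    return ["diamond", "makedb", "--in", str(fasta), "-d", str(fasta.with_suffix(""))]


def unpaired_fastas(filenames: Iterable[str]) -> list[str]:
    """Return FASTA base names that have no .dmnd file of the same name."""
    paths = [Path(name) for name in filenames]
    fasta_stems = {p.stem for p in paths if p.suffix == ".fasta"}
    dmnd_stems = {p.stem for p in paths if p.suffix == ".dmnd"}
    return sorted(fasta_stems - dmnd_stems)


def build(fasta: Path) -> None:
    """Print the Diamond version, then build <stem>.dmnd beside the FASTA."""
    if shutil.which("diamond") is None:
        raise RuntimeError("diamond executable not found on PATH")
    # Build with the same Diamond version that will run the pipeline.
    subprocess.run(["diamond", "version"], check=True)
    subprocess.run(makedb_command(fasta), check=True)
```

For example, `build(Path("db_OG/Hook-6.6.fasta"))` should produce `db_OG/Hook-6.6.dmnd`, and `unpaired_fastas(p.name for p in Path("db_OG").iterdir())` returns an empty list once every FASTA in the folder has its Diamond partner.
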
### The Hook database
Users can either use the EukPhylo Hook Database or a set of gene families (GFs) of interest (e.g. targeting a specific function or taxon). The EukPhylo Hook Database comprises 1,453,081 sequences across 15,414 GFs and serves as the reference database against which assembled transcripts are similarity-searched for GF assignment. It captures a broad diversity of eukaryotic gene families and was built using sequence data from OrthoMCL version 6.13, which we sampled to select orthologous groups (OGs) that are present across the eukaryotic tree and/or present in under-sampled lineages of eukaryotes (Fig. S1, Figure 2). We also provide functional annotations for each OG in the Hook (Dataset S11; see methods in SI Appendix). Alternatively, users can replace the Hook as described below.
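
To illustrate what GF assignment against the Hook involves, here is a minimal Python sketch that assigns each transcript to the OG of its highest-scoring Diamond hit. This is not EukPhylo's documented procedure: the best-bitscore rule, the e-value cutoff of `1e-5`, the assumption that each Hook sequence ID ends in an OrthoMCL-style identifier such as `OG6_100001`, and the function name `assign_gfs` are all illustrative. The input is Diamond's default tabular output (`--outfmt 6`), whose 11th and 12th columns are the e-value and bitscore.

```python
"""Minimal sketch: best-hit GF assignment from Diamond tabular output.

Assumptions (not EukPhylo code): lines follow Diamond's default --outfmt 6
columns (qseqid sseqid pident length mismatch gapopen qstart qend sstart
send evalue bitscore), and Hook sequence IDs end in an OG identifier such
as "..._OG6_100001" (hypothetical naming pattern).
"""
import re

OG_PATTERN = re.compile(r"(OG\d+_\d+)$")


def assign_gfs(tabular_lines, max_evalue=1e-5):
    """Map each query ID to the OG of its highest-bitscore hit."""
    best = {}  # qseqid -> (bitscore, og)
    for line in tabular_lines:
        if not line.strip():
            continue  # skip blank lines
        cols = line.rstrip("\n").split("\t")
        qseqid, sseqid = cols[0], cols[1]
        evalue, bitscore = float(cols[10]), float(cols[11])
        if evalue > max_evalue:
            continue  # hit too weak under the illustrative cutoff
        match = OG_PATTERN.search(sseqid)
        if match is None:
            continue  # Hook ID without a recognizable OG suffix
        if qseqid not in best or bitscore > best[qseqid][0]:
            best[qseqid] = (bitscore, match.group(1))
    return {query: og for query, (_, og) in best.items()}
```

Taking the single best hit is the simplest possible rule; a production pipeline may also weigh query coverage, percent identity, or near-ties between OGs before committing to an assignment.
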