Updated PhyloToL Part 1 (markdown)

Katzlab 2024-08-09 17:35:21 -04:00
parent 0517d7c1f2
commit 39c5d5bbcd

@ -54,9 +54,10 @@ And all of the CDS fasta files should be in a folder alongside the [Scripts](htt
## The Hook Database
PhyloToL6 is designed with an interchangeable hook. Because the Hook database is replaceable and customizable, this step offers an opportunity to filter data down to a set of gene families or functional groups of interest.
Users can either use the PhyloToL Hook Database or supply their own set of gene families (GFs) of interest (e.g., targeting a specific function or taxon). The PhyloToL Hook Database is composed of 1,453,081 sequences across 15,414 GFs and serves as a reference database against which assembled transcripts are similarity-searched for GF assignment. It captures a broad diversity of eukaryotic gene families and was built using sequence data from OrthoMCL version 6.13, which we sampled to select ortholog groups (OGs) that are present across the eukaryotic tree and/or present in under-sampled lineages of eukaryotes (Fig. S1, Fig. 2). To add value for users, we also include functional annotations for each OG in the Hook (Dataset S11; see methods in SI Appendix). Alternatively, users can replace the hook as described below.
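To make the GF-assignment step concrete, here is a minimal Python sketch of a generic best-hit rule: each transcript is assigned the GF of its highest-bitscore hit in the hook. This is an illustration only, not PhyloToL's actual assignment logic. It assumes DIAMOND's default 12-column tabular output (`--outfmt 6`) and a hypothetical `gf_of` dictionary mapping hook sequence IDs to GF IDs; the function name and the e-value cutoff are likewise example choices.

```python
import csv
import io

# Default column order of DIAMOND tabular output (--outfmt 6 with no extra fields).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def assign_gene_families(hits_tsv, gf_of, max_evalue=1e-5):
    """Map each query transcript to the GF of its best-scoring hook hit.

    hits_tsv   -- text of a DIAMOND --outfmt 6 results file
    gf_of      -- hypothetical dict: hook sequence ID -> GF ID
    max_evalue -- example cutoff; hits with a larger e-value are ignored
    """
    best = {}  # query ID -> (bitscore, hook sequence ID)
    for row in csv.reader(io.StringIO(hits_tsv), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        rec = dict(zip(COLUMNS, row))
        if float(rec["evalue"]) > max_evalue:
            continue
        score = float(rec["bitscore"])
        query = rec["qseqid"]
        if query not in best or score > best[query][0]:
            best[query] = (score, rec["sseqid"])
    # Queries whose best hit has no known GF are left unassigned.
    return {q: gf_of[s] for q, (_, s) in best.items() if s in gf_of}


hits = (
    "tx1\thookA\t80\t100\t20\t0\t1\t100\t1\t100\t1e-30\t150\n"
    "tx1\thookB\t60\t100\t40\t0\t1\t100\t1\t100\t1e-10\t90\n"
    "tx2\thookB\t50\t80\t40\t0\t1\t80\t1\t80\t0.01\t40\n"
)
print(assign_gene_families(hits, {"hookA": "GF_A", "hookB": "GF_B"}))
# {'tx1': 'GF_A'}
```

Here `tx1` takes the GF of its higher-scoring hit (`hookA`), while `tx2` stays unassigned because its only hit exceeds the example e-value cutoff.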
### Swapping out the hook
What to know about using your own hook:
* You will need two files: a FASTA file with your target sequences, and the same file formatted as a DIAMOND database (`diamond makedb --in <path to fasta file> -d <name you want to give it>.dmnd`).
* The hook lives in the `db_OG` database folder.
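The first bullet can be sketched in Python as below, assuming DIAMOND is installed and on `PATH`. The helper names `count_fasta_records`, `makedb_command`, and `build_hook` are hypothetical and not part of PhyloToL; the only external interface used is the `diamond makedb --in ... -d ...` command given above. Copying the resulting files into `db_OG` is left out, because the expected file names there are not specified in this section.

```python
import shutil
import subprocess
from pathlib import Path


def count_fasta_records(fasta_path):
    """Count sequences in a FASTA file by counting '>' header lines."""
    with Path(fasta_path).open() as handle:
        return sum(1 for line in handle if line.startswith(">"))


def makedb_command(fasta_path, db_name):
    """Return the `diamond makedb` argument list for indexing the hook FASTA."""
    return ["diamond", "makedb", "--in", str(fasta_path), "-d", str(db_name)]


def build_hook(fasta_path, db_name):
    """Sanity-check the hook FASTA, then index it with DIAMOND."""
    if count_fasta_records(fasta_path) == 0:
        raise ValueError(f"no FASTA records found in {fasta_path}")
    if shutil.which("diamond") is None:
        raise RuntimeError("diamond executable not found on PATH")
    # check=True raises CalledProcessError if diamond exits with an error.
    subprocess.run(makedb_command(fasta_path, db_name), check=True)


# Example (requires DIAMOND; file names are placeholders):
# build_hook("my_hook.fasta", "my_hook.dmnd")
```

Keeping the command builder separate from the `subprocess.run` call makes it easy to inspect or log the exact command before running it.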