Updated PhyloToL Part 1 (markdown)

Katzlab 2024-08-09 16:39:12 -04:00
parent d6ab88cf13
commit 84d3e4cb6a

#### Genomes
PhyloToL part 1 for genomes takes as input genomic coding sequences (CDS), such as those available to download for many genome assemblies on GenBank. As with the transcriptome setup above, each input file must be named in the format
>Op_me_Hsap_GenBankCDS.fasta
with the first ten characters (here, `Op_me_Hsap`) serving as a unique sample identifier. Each sequence in the CDS FASTA file should be formatted as downloaded from GenBank, for example:
>\>lcl|NC_000001.11_cds_NP_001005484.2_1 [gene=OR4F5] [db_xref=CCDS:CCDS30547.1,Ensembl:ENSP00000493376.2,GeneID:79501] [protein=olfactory receptor 4F5] [protein_id=NP_001005484.2] [location=join(65565..65573,69037..70008)] [gbkey=CDS]
ATGAAGAAGGTAACTGCAGAGGCTATTTCCTGGAATGAATCAACGAGTGAAACGAATAACTCTATGGTGACTGAATTCAT
TTTTCTGGGTCTCTCTGATTCTCAGGAACTCCAGACCTTCCTATTTATGTTGTTTTTTGTATTCTATGGAGGAATCGTGT
TTGGAAACCTTCTTATTGTCATAACAGTGGTATCTGACTCCCACCTTCACTCTCCCATGTACTTCCTGCTAGCCAACCTC...
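
Everything after the first space in such a header is a series of bracketed `[key=value]` fields. As a minimal sketch, assuming headers follow exactly the layout shown above, the hypothetical helper `parse_cds_header` below (not part of PhyloToL, whose own scripts may read headers differently) splits one header into its `lcl|` identifier and a dictionary of those fields:

```python
import re

# Matches bracketed "[key=value]" fields in a GenBank CDS FASTA header.
FIELD_RE = re.compile(r"\[(\w+)=([^\]]*)\]")


def parse_cds_header(header: str) -> dict:
    """Split a GenBank CDS header into its identifier and bracketed fields.

    Hypothetical helper for illustration; not part of PhyloToL.
    """
    header = header.lstrip(">")
    # The identifier is everything before the first space.
    seq_id, _, rest = header.partition(" ")
    fields = dict(FIELD_RE.findall(rest))
    fields["id"] = seq_id
    return fields


if __name__ == "__main__":
    example = (
        ">lcl|NC_000001.11_cds_NP_001005484.2_1 [gene=OR4F5] "
        "[db_xref=CCDS:CCDS30547.1,Ensembl:ENSP00000493376.2,GeneID:79501] "
        "[protein=olfactory receptor 4F5] [protein_id=NP_001005484.2] "
        "[location=join(65565..65573,69037..70008)] [gbkey=CDS]"
    )
    info = parse_cds_header(example)
    print(info["id"], info["gene"], info["protein_id"])
```

For the example header above, this prints `lcl|NC_000001.11_cds_NP_001005484.2_1 OR4F5 NP_001005484.2`.
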
All of the CDS FASTA files should be placed together in one folder alongside the [Scripts](https://github.com/Katzlab/PhyloToL-6/blob/main/PTL1/Genomes/Scripts) and [Databases](https://github.com/Katzlab/PhyloToL-6/blob/main/PTL1/Genomes/Databases) folders, as above.
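
Misnamed or misplaced input files are easy to miss before a long run, so a quick pre-flight check can help. The sketch below is hypothetical (the function `check_genome_inputs` and the folder name `CDS` in the usage note are illustrative, not part of PhyloToL). It assumes the layout described above, with the input folder as a sibling of `Scripts` and `Databases`, and only checks that every `.fasta` file has a distinct ten-character sample identifier:

```python
import sys
from pathlib import Path


def sample_id(filename: str) -> str:
    """Return the ten-character sample identifier at the start of a file name."""
    return filename[:10]


def check_genome_inputs(input_dir: Path) -> list[str]:
    """Return a list of problems found in a genome input folder.

    Hypothetical pre-flight check for illustration; not part of PhyloToL.
    Assumes the Scripts and Databases folders sit next to input_dir.
    """
    problems: list[str] = []

    # The input folder is expected to share a parent with Scripts and Databases.
    for needed in ("Scripts", "Databases"):
        if not (input_dir.parent / needed).is_dir():
            problems.append(f"missing {needed} folder next to {input_dir.name}")

    fasta_files = sorted(input_dir.glob("*.fasta"))
    if not fasta_files:
        problems.append(f"no .fasta files found in {input_dir}")

    # Map each ten-character sample ID to the first file that used it.
    seen: dict[str, str] = {}
    for path in fasta_files:
        if len(path.stem) < 10:
            problems.append(f"{path.name}: name is shorter than ten characters")
            continue
        sid = sample_id(path.name)
        if sid in seen:
            problems.append(f"{path.name}: sample ID {sid} already used by {seen[sid]}")
        else:
            seen[sid] = path.name
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    for problem in check_genome_inputs(Path(sys.argv[1])):
        print(problem)
```

Saved as, say, `check_inputs.py` and run from the folder that holds `Scripts` and `Databases` with `python check_inputs.py CDS`, it prints one line per problem and nothing if the folder looks ready. Only the shared ten-character prefix is checked, since that is the naming rule stated above.
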
## The Hook Database