Updated PhyloToL Part 1: GF assignment (markdown)

Godwin Ani 2024-08-13 10:55:47 -04:00
parent 245f244e3a
commit aa95684e83

@@ -65,9 +65,9 @@ Replacing the PhyloToL Hook DB with a user-defined set of gene families is strai
## Running PhyloToL part 1
# Running PhyloToL part 1
### Processing transcriptomes
## Processing transcriptomes
Role of each script
<img src="https://github.com/Katzlab/PhyloToL-6/blob/main/Other/PTL1_Processing_Transcriptomes_scripts.png" width="100%">
@@ -91,7 +91,7 @@ Available parameters are:
| --seq_count |int|-| minimum number of sequences after assigning OGs |
#### Index Switching (Cross plate contamination)
### Index Switching (Cross plate contamination)
As you run PhyloToL part 1 on transcriptomes, you might want to remove sequences from your assembled transcripts that result from index switching. This is done by (**LAK and ACL on XPC removal process with conspecific file**). To enable this step in your PhyloToL part 1 run, add the `--xplate_contam --conspecific_names Conspecific.txt` flags to the command line as follows:
`python Scripts/wrapper.py --first_script 1 --last_script 7 --assembled_transcripts AssembledTranscripts --output . --genetic_code Gcode.txt --databases Databases --xplate_contam --conspecific_names Conspecific.txt > log.txt`
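If you launch the wrapper from your own Python code, the optional cross-plate flags can be appended conditionally. The following is a minimal sketch, not part of PhyloToL: the helper name `build_transcriptome_command` is hypothetical, and the folder and file names are simply those used in the command above.

```python
import shlex


def build_transcriptome_command(conspecific_names=None):
    """Assemble the wrapper.py call shown above as an argument list.

    Hypothetical helper; only flags that appear in the command above are used.
    """
    cmd = [
        "python", "Scripts/wrapper.py",
        "--first_script", "1",
        "--last_script", "7",
        "--assembled_transcripts", "AssembledTranscripts",
        "--output", ".",
        "--genetic_code", "Gcode.txt",
        "--databases", "Databases",
    ]
    # Add the cross-plate contamination flags only when a conspecific file is given.
    if conspecific_names is not None:
        cmd += ["--xplate_contam", "--conspecific_names", conspecific_names]
    return cmd


if __name__ == "__main__":
    # Print the command line so it can be inspected before running it.
    print(shlex.join(build_transcriptome_command("Conspecific.txt")))
```

The returned list can be passed to `subprocess.run`, with standard output redirected to `log.txt` if you want to keep the log as in the command above.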
@@ -106,14 +106,14 @@ Example of a Conspecific.txt file
### Processing genomes
## Processing genomes
Role of each script
<img src="https://github.com/Katzlab/PhyloToL-6/blob/main/Other/PTL1_Processing_Genomes_scripts.png" width="100%">
Running PhyloToL part 1 on genomes requires at least 3 items in your main directory: 1) a folder named Scripts containing all **[Scripts](https://github.com/Katzlab/PhyloToL-6/tree/main/PTL1/Genomes/Scripts)** from the PhyloToL part 1 GitHub, 2) a folder containing your **[CDS](https://github.com/Katzlab/PhyloToL-6/tree/main/PTL1/Genomes/TestData)** (as described above), and 3) a folder containing the **Databases** with three subfolders: db_BvsE (how we ID likely-bacterial sequences), db_StopFreq (for stop codon assignment), and db_OG (the hook database as described above). By default, the pipeline starts with your **CDS** and produces **ReadyToGo files** (nucleotide and amino acid sequences) for each taxon, and **summary information** on the sequences processed for those taxa.
#### To run the PhyloToL part 1 for processing genomes, run:
* To run the PhyloToL part 1 for processing genomes, run:
`python Scripts/wrapper.py --first_script 1 --last_script 5 --cds CDS --output . --genetic_code Gcode.txt --databases Databases > log.txt`
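The directory layout required for genomes can be checked before launching the run. The following is a minimal sketch, not part of PhyloToL: the function name `missing_items` is hypothetical, and the folder names `CDS` and `Databases` are assumed from the command above rather than enforced by the pipeline.

```python
from pathlib import Path

# Folder names assumed from the command above; adjust them to your own layout.
REQUIRED_DIRS = ["Scripts", "CDS", "Databases"]
REQUIRED_DB_SUBDIRS = ["db_BvsE", "db_StopFreq", "db_OG"]


def missing_items(main_dir="."):
    """Return the expected folders that are absent from main_dir (hypothetical helper)."""
    root = Path(main_dir)
    expected = [root / name for name in REQUIRED_DIRS]
    expected += [root / "Databases" / name for name in REQUIRED_DB_SUBDIRS]
    return [str(path) for path in expected if not path.is_dir()]


if __name__ == "__main__":
    missing = missing_items(".")
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("Layout looks complete.")
```

An empty list means that the three folders and the three database subfolders are all present. The sketch does not check the contents of the folders or the genetic code file.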