diff --git a/PhyloToL-Part-1:-GF-assignment.md b/PhyloToL-Part-1:-GF-assignment.md
index caad48f..5e13d9b 100644
--- a/PhyloToL-Part-1:-GF-assignment.md
+++ b/PhyloToL-Part-1:-GF-assignment.md
@@ -65,9 +65,9 @@ Replacing the PhyloToL Hook DB with a user-defined set of gene families is strai
-## Running PhyloToL part 1
+# Running PhyloToL part 1
-### Processing transcriptomes
+## Processing transcriptomes
 Role of each script
@@ -91,7 +91,7 @@ Available parameters are:
 | --seq_count |int|-| minimum number of sequences after assigning OGs |
-#### Index Switching (Cross plate contamination)
+### Index Switching (Cross plate contamination)
 As you run PhyloToL part 1 on transcriptomes, you might want to remove sequences from your assembled transcripts that are a result of index switching. This is done by (**LAK and ACL on XPC removal process with conspecific file**). To include this parameter to your PhyloToL part 1 run, you will need to add the '--xplate_contam --conspecific_names Conspecific.txt' flag to the command line as follow:
 `python Scripts/wrapper.py --first_script 1 --last_script 7 --assembled_transcripts AssembledTranscripts --output . --genetic_code Gcode.txt --databases Databases --xplate_contam --conspecific_names Conspecific.txt > log.txt`
@@ -106,14 +106,14 @@ Example of a Conspecific.txt file
-### Processing genomes
+## Processing genomes
 Role of each script
 Running PhyloToL Part 1 on genomes requires at least 3 items in your main directory: 1) A folder named Scripts and containing all **[Scripts](https://github.com/Katzlab/PhyloToL-6/tree/main/PTL1/Genomes/Scripts)** from PhyloToL part 1 github, 2) a folder containing your **[CDS](https://github.com/Katzlab/PhyloToL-6/tree/main/PTL1/Genomes/TestData)** (as described above), and 3) a folder containing the **Databases** with three subfolders(db_BvsE (how we ID likely-bacterial sequences), db_StopFreq (for stop codon assignment), and db_OG (The hook database as described above)).
 Default script starts with your **CDS** and produces **ReadyToGo files** (nucleotide and amino acid sequences) of each taxa, and **summary information** of the sequences processed for those taxa.
-#### To run the PhyloToL part 1 for processing genomes, run:
+* To run the PhyloToL part 1 for processing genomes, run:
 `python Scripts/wrapper.py --first_script 1 --last_script 5 --cds CDS --output . --genetic_code Gcode.txt --databases Databases > log.txt`
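
The genome section touched by this diff lists the inputs that `wrapper.py` expects in the main directory: a `Scripts` folder, a `CDS` folder, and a `Databases` folder with the subfolders `db_BvsE`, `db_StopFreq`, and `db_OG`. A minimal pre-flight sketch of that checklist could look like the following. `missing_genome_inputs` is a hypothetical helper and not part of PhyloToL, and treating `Gcode.txt` as a required file is an assumption based only on the example command.

```python
from pathlib import Path

# Folder names are taken from the wiki text in this diff. The check itself is
# a hypothetical helper and is not shipped with PhyloToL.
REQUIRED_DIRS = [
    "Scripts",
    "CDS",
    "Databases/db_BvsE",
    "Databases/db_StopFreq",
    "Databases/db_OG",
]

# Assumption: these files sit in the main directory because the example
# command refers to Scripts/wrapper.py and Gcode.txt by relative path.
REQUIRED_FILES = ["Scripts/wrapper.py", "Gcode.txt"]


def missing_genome_inputs(root="."):
    """Return the required paths that are absent under root, in a fixed order."""
    base = Path(root)
    missing = [d for d in REQUIRED_DIRS if not (base / d).is_dir()]
    missing += [f for f in REQUIRED_FILES if not (base / f).is_file()]
    return missing


if __name__ == "__main__":
    problems = missing_genome_inputs(".")
    if problems:
        print("Missing before running wrapper.py:", ", ".join(problems))
    else:
        print("Layout looks complete for the genome run.")
```

Running the script from the main directory before launching `wrapper.py` reports any missing folder or file by name, which is cheaper than discovering it partway through a run in `log.txt`.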