From ad191f03301309719a80ed13dece58a7ba992768 Mon Sep 17 00:00:00 2001
From: Katzlab
Date: Tue, 15 Oct 2024 15:54:40 +0100
Subject: [PATCH] Updated PhyloToL Part 1: GF assignment (markdown)

---
 PhyloToL-Part-1:-GF-assignment.md | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/PhyloToL-Part-1:-GF-assignment.md b/PhyloToL-Part-1:-GF-assignment.md
index 6e94c8a..298cc41 100644
--- a/PhyloToL-Part-1:-GF-assignment.md
+++ b/PhyloToL-Part-1:-GF-assignment.md
@@ -84,10 +84,14 @@ Running PhyloToL Part 1 on transcriptomes requires three items in your main dire
 2. A folder containing your **assembled [transcripts](https://github.com/Katzlab/PhyloToL-6/tree/main/PTL1/Transcriptomes/TestData)** (as described above)
 3. The **Databases** folder described above
 
-PhyloToL part 1 starts with your **assembled transcripts** and produces **ReadyToGo files** (R2G; nucleotide coding regions and inferred amino acid sequences with gene families assigned) for each input sample, and **summary statistics** (e.g. composition, length, coverage) for each sequence processed, as well as aggregated across all sequences for each taxon. This part of the pipeline includes seven scripts which must be run in order. Script 1b (removal of contamination from index switching, 'XPC') is optional (see below), and users may choose to stop after script 4 if they are unsure of correct genetic code assignment. Otherwise, users are recommended to run their transcripts through scripts 1 to 7 in a single run. The simplest way to run PhyloToL part 1 is with the following command:
+PhyloToL part 1 starts with your **assembled transcripts** and produces **ReadyToGo files** (R2G; nucleotide coding regions and inferred amino acid sequences with gene families assigned) for each input sample, and **summary statistics** (e.g. composition, length, coverage) for each sequence processed, as well as aggregated across all sequences for each taxon. This part of the pipeline includes seven scripts which must be run in order. Script 1b (removal of contamination from index switching, 'XPC') is optional (see below), and users may choose to stop after script 4 if they are unsure of correct genetic code assignment. Otherwise, users are recommended to run their transcripts through scripts 1 to 7 in a single run. The simplest way to run PhyloToL part 1 is with one of the following commands:
+On a grid
 
 `python Scripts/wrapper.py --first_script 1 --last_script 7 --assembled_transcripts AssembledTranscripts --genetic_code Gcode.txt --databases Databases > log.txt`
 
+On a local computer: navigate to within the Scripts folder
+`python3 wrapper.py --first_script 1 --last_script 7 --assembled_transcripts [full_path]/PhyloToL-6/PTL1/Transcriptomes/AssembledTranscripts --genetic_code universal --databases [full_path]/PhyloToL-6/PTL1/Transcriptomes/Databases --output [full_path]/PhyloToL-6/PTL1/Transcriptomes/ > log.txt`
+
 In this case, the file `Gcode.txt` is a text file designating genetic code assignments for each taxon. The file should contain two tab-separated columns; the first column gives a ten-digit sample identifier, and the second column the genetic code assignment to be used in translation (script 5). The genetic code options are: universal, blepharisma, chilodonella, condylostoma, euplotes, peritrich, vorticella, ciliate, mesodinium, taa, tag, tga, and none. If you are not working with ciliates, you should probably choose "universal" for each taxon, or just use the argument `--genetic_code universal` instead of creating a text file.
 
 Other available parameters are: