Updated PhyloToL Part 1: GF assignment (markdown)

Katzlab 2024-10-15 15:54:40 +01:00
parent ba9128a074
commit ad191f0330

@ -84,10 +84,14 @@ Running PhyloToL Part 1 on transcriptomes requires three items in your main dire
2. A folder containing your **assembled [transcripts](https://github.com/Katzlab/PhyloToL-6/tree/main/PTL1/Transcriptomes/TestData)** (as described above)
3. The **Databases** folder described above
PhyloToL part 1 starts with your **assembled transcripts** and produces **ReadyToGo files** (R2G; nucleotide coding regions and inferred amino acid sequences with gene families assigned) for each input sample, along with **summary statistics** (e.g. composition, length, coverage) for each sequence processed and aggregated across all sequences for each taxon. This part of the pipeline includes seven scripts, which must be run in order. Script 1b (removal of contamination from index switching, 'XPC') is optional (see below), and users may choose to stop after script 4 if they are unsure of the correct genetic code assignment. Otherwise, we recommend running your transcripts through scripts 1 to 7 in a single run. The simplest way to run PhyloToL part 1 is with one of the following commands:
On a grid:
`python Scripts/wrapper.py --first_script 1 --last_script 7 --assembled_transcripts AssembledTranscripts --genetic_code Gcode.txt --databases Databases > log.txt`
On a local computer (first navigate into the Scripts folder):
`python3 wrapper.py --first_script 1 --last_script 7 --assembled_transcripts [full_path]/PhyloToL-6/PTL1/Transcriptomes/AssembledTranscripts --genetic_code universal --databases [full_path]/PhyloToL-6/PTL1/Transcriptomes/Databases --output [full_path]/PhyloToL-6/PTL1/Transcriptomes/ > log.txt`
In the grid command, `Gcode.txt` is a text file designating the genetic code assignment for each taxon. The file should contain two tab-separated columns: the first column gives a ten-digit sample identifier, and the second column gives the genetic code to be used in translation (script 5). The genetic code options are: universal, blepharisma, chilodonella, condylostoma, euplotes, peritrich, vorticella, ciliate, mesodinium, taa, tag, tga, and none. If you are not working with ciliates, you should probably choose "universal" for each taxon, or simply pass `--genetic_code universal` (as in the local command) instead of creating a text file.
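
As an illustration of the two-column format described above, the following is a minimal Python sketch for generating a genetic code file. It is not part of PhyloToL: the helper name `write_gcode_file` and the sample identifiers in the usage note are hypothetical, and the length check assumes that "ten-digit" means a ten-character identifier. The list of accepted codes is copied from the options above.

```python
import csv
from pathlib import Path

# Genetic code options as listed in this section of the documentation.
VALID_CODES = {
    "universal", "blepharisma", "chilodonella", "condylostoma", "euplotes",
    "peritrich", "vorticella", "ciliate", "mesodinium", "taa", "tag", "tga",
    "none",
}


def write_gcode_file(assignments, path="Gcode.txt"):
    """Write a two-column, tab-separated genetic code file.

    `assignments` maps each sample identifier to a genetic code name.
    This helper is a hypothetical convenience, not a PhyloToL script.
    """
    for sample_id, code in assignments.items():
        # Assumption: a "ten-digit" identifier is ten characters long.
        if len(sample_id) != 10:
            raise ValueError(f"identifier {sample_id!r} is not 10 characters")
        if code not in VALID_CODES:
            raise ValueError(f"unknown genetic code {code!r} for {sample_id}")

    out_path = Path(path)
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for sample_id, code in assignments.items():
            # One row per sample: identifier, then genetic code.
            writer.writerow([sample_id, code])
    return out_path
```

For example, `write_gcode_file({"Xx_yy_Zzzz": "universal", "Aa_bb_Cccc": "ciliate"})` (placeholder identifiers) writes a `Gcode.txt` with one tab-separated row per sample, which can then be passed to `--genetic_code`. Validating the code names before running the pipeline catches typos early, since the assignment is only used at translation (script 5).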
Other available parameters are: