mirror of http://43.156.76.180:8026/YuuMJ/EukPhylo.git, synced 2025-12-28 10:00:27 +08:00
Updated PhyloToL Part 1: GF assignment (markdown)
parent af2e0ae915
commit cd8152bdc2
@@ -96,19 +96,19 @@ On a local computer: First navigate to within the Scripts folder and then run:

In this case, the file `Gcode.txt` is a text file designating the genetic code assignment for each taxon. The file should contain two tab-separated columns: the first column gives the ten-digit sample identifier, and the second column gives the genetic code to be used in translation (script 5). The genetic code options are: universal, blepharisma, chilodonella, condylostoma, euplotes, peritrich, vorticella, ciliate, mesodinium, taa, tag, tga, and none. If you are not working with ciliates, you should probably choose "universal" for each taxon, or simply pass the argument `--genetic_code universal` instead of creating a text file.

Other available parameters are:
-| Parameter | Type | Options | Description |
+| Parameter | Input type | Options | Description |
| ----------- | ----------------- | ----------- | ----------------- |
-| --first_script | int | 1, 2, 3, 4, 5, 6 | First script to run |
-| --last_script | int | 1, 2, 3, 4, 5, 6, 7 | Last script to run |
-| --assembled_transcripts | str | Path to a folder of assembled transcripts, assembled by rnaSPAdes. | Each assembled transcript file name should start with a unique 10-digit code and end in "_assembledTranscripts.fasta", e.g. Op_me_hsap_assembledTranscripts.fasta |
-| --databases | str | Path to databases folder | The folder should contain all 3 databases |
-| --output | str | Path for the output files | An "Output" folder will be created at this directory to contain all output files. By default this folder will be created at the parent directory of the Scripts folder |
-| --xplate_contam | - | - | Run cross-plate contamination removal (includes all files) |
-| --genetic_code | str | A .txt or .tsv with two tab-separated columns, the first with the ten-digit codes and the second with the corresponding genetic codes | If all of your taxa use the same genetic code, you may enter it here. Alternatively, if you need to use a variety of genetic codes but know which codes to use, you may give the path to a file here. |
-| --conspecific_names | str | A .txt or .tsv file with two tab-separated columns; the first should have 10-digit codes, the second species or other identifying names | This is used to determine which sequences to remove (only between "species") in index switching (cross-plate contamination) assessment. |
-| --minlen | int | - | Minimum transcript length |
-| --maxlen | int | - | Maximum transcript length |
-| --seq_count | int | - | Minimum number of sequences after assigning GFs |
+| --first_script | integer | 1, 2, 3, 4, 5, 6 | First script to run |
+| --last_script | integer | 1, 2, 3, 4, 5, 6, 7 | Last script to run |
+| --assembled_transcripts | string | Path to a folder of assembled transcripts, assembled by rnaSPAdes. | Each assembled transcript file name should start with a unique 10-digit code and end in "_assembledTranscripts.fasta", e.g. Op_me_hsap_assembledTranscripts.fasta |
+| --databases | string | Path to databases folder | The folder should contain all 3 databases |
+| --output | string | Path for the output files | An "Output" folder will be created at this directory to contain all output files. By default this folder will be created at the parent directory of the Scripts folder |
+| --xplate_contam | flag (true/false) | include or exclude | Run cross-plate contamination removal (includes all files) |
+| --genetic_code | string | A .txt or .tsv with two tab-separated columns, the first with the ten-digit codes and the second with the corresponding genetic codes | If all of your taxa use the same genetic code, you may enter it here. Alternatively, if you need to use a variety of genetic codes but know which codes to use, you may give the path to a file here. |
+| --conspecific_names | string | A .txt or .tsv file with two tab-separated columns; the first should have 10-digit codes, the second species or other identifying names | This is used to determine which sequences to remove (only between "species") in index switching (cross-plate contamination) assessment. |
+| --minlen | integer | any positive integer | Minimum transcript length |
+| --maxlen | integer | any positive integer | Maximum transcript length |
+| --seq_count | integer | any positive integer | Minimum number of sequences after assigning GFs |
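As an illustration of the two-column genetic code file described above, the following minimal sketch builds the tab-separated text for a `Gcode.txt` file and checks each entry against the listed genetic code options. The helper name `format_gcode_table` and the second sample identifier are hypothetical and are not part of PhyloToL. The sketch also assumes that a "ten-digit" identifier means exactly ten characters, as in `Op_me_hsap`, and that code names are compared case-insensitively.

```python
import csv
import io

# Genetic code names as listed in the documentation above.
GENETIC_CODES = {
    "universal", "blepharisma", "chilodonella", "condylostoma", "euplotes",
    "peritrich", "vorticella", "ciliate", "mesodinium", "taa", "tag", "tga",
    "none",
}


def format_gcode_table(assignments):
    """Return tab-separated text with one 'identifier<TAB>code' row per taxon.

    Hypothetical helper: `assignments` maps sample identifiers to code names.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for sample_id, code in assignments.items():
        # Assumption: identifiers are exactly ten characters long.
        if len(sample_id) != 10:
            raise ValueError(f"{sample_id!r} is not a ten-character identifier")
        # Assumption: code names are matched case-insensitively.
        if code.lower() not in GENETIC_CODES:
            raise ValueError(f"{code!r} is not one of the listed genetic codes")
        writer.writerow([sample_id, code])
    return buffer.getvalue()


# 'Sr_ci_xxxx' is a made-up identifier used only for illustration.
example = {"Op_me_hsap": "universal", "Sr_ci_xxxx": "ciliate"}
print(format_gcode_table(example), end="")
```

Saving the returned text as `Gcode.txt` should give a file in the layout that the `--genetic_code` argument is described as accepting.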
### Index Switching (Cross-plate contamination)

As you run PhyloToL part 1 on transcriptomes, you might want to remove sequences from your assembled transcripts that are a result of index switching. This is done by clustering all of your input assembled transcripts with Vsearch at a nucleotide identity of 99%. Sequences with less than one-tenth the k-mer coverage of the highest-covered sequence in the cluster are removed, unless the two sequences are 'conspecific' (usually, this means from the same species or genus). You can tell PhyloToL which of your taxa are conspecific by passing a text file with two tab-separated columns to the `--conspecific_names` argument; the first column should be a ten-digit sample identifier and the second column a group (e.g., species, genus) identifier. Samples with the same group identifier are taken to be conspecific.
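The removal rule can be sketched in a few lines of Python. This is not PhyloToL's own code: the names `Seq`, `read_conspecific_groups`, and `filter_cluster` are hypothetical, the clustering step itself (Vsearch at 99% identity) is assumed to have been done already, and samples missing from the conspecific file are assumed to form their own group.

```python
from collections import namedtuple

# Hypothetical record: sequence name, ten-character sample code, k-mer coverage.
Seq = namedtuple("Seq", ["name", "sample", "coverage"])


def read_conspecific_groups(lines):
    """Parse 'sample<TAB>group' lines into a dict of sample -> group."""
    groups = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        sample_id, group = line.split("\t")[:2]
        groups[sample_id] = group
    return groups


def filter_cluster(cluster, groups):
    """Drop low-coverage sequences that are not conspecific with the top one."""
    top = max(cluster, key=lambda seq: seq.coverage)
    # Assumption: a sample absent from the file is its own group.
    top_group = groups.get(top.sample, top.sample)
    kept = []
    for seq in cluster:
        conspecific = groups.get(seq.sample, seq.sample) == top_group
        # Remove only if coverage is below one-tenth of the top sequence
        # and the two sequences are not conspecific.
        if seq.coverage < top.coverage / 10 and not conspecific:
            continue
        kept.append(seq)
    return kept


# Made-up sample codes and coverages, for illustration only.
groups = read_conspecific_groups([
    "Op_me_hsap\tHomo sapiens\n",
    "Op_me_hsa2\tHomo sapiens\n",
    "Sr_ci_xxxx\tOther species\n",
])
cluster = [
    Seq("a", "Op_me_hsap", 200.0),
    Seq("b", "Op_me_hsa2", 5.0),   # low coverage but conspecific: kept
    Seq("c", "Sr_ci_xxxx", 5.0),   # low coverage, different group: removed
    Seq("d", "Sr_ci_xxxx", 50.0),  # at least one-tenth of the top: kept
]
print([seq.name for seq in filter_cluster(cluster, groups)])
```

In this sketch the comparison is always made against the single highest-covered sequence in the cluster, which is how the paragraph above reads; the actual script may handle ties or missing coverage values differently.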
@@ -130,12 +130,12 @@ The parameter options are:

| Parameter | Type | Options | Description |
| ----------- | ----------------- | ----------- | ----------------- |
-| --first_script | int | 1, 2, 3, 4 | First script to run |
-| --last_script | int | 2, 3, 4, 5 | Last script to run |
-| --cds | str | Path to a folder of nucleotide CDS | Each file name should start with a unique 10-digit code and end in "_GenBankCDS.fasta", e.g. Op_me_hsap_GenBankCDS.fasta |
-| --output | str | Path for the output files | An "Output" folder will be created at this directory to contain all output files. By default this folder will be created at the parent directory of the Scripts folder |
-| --genetic_code | str | Path to a file, Universal | If all of your taxa use the same genetic code, you may enter it here |
-| --databases | str | Path to databases folder | The folder should contain all 3 databases |
+| --first_script | integer | 1, 2, 3, 4 | First script to run |
+| --last_script | integer | 2, 3, 4, 5 | Last script to run |
+| --cds | string | Path to a folder of nucleotide CDS | Each file name should start with a unique 10-digit code and end in "_GenBankCDS.fasta", e.g. Op_me_hsap_GenBankCDS.fasta |
+| --output | string | Path for the output files | An "Output" folder will be created at this directory to contain all output files. By default this folder will be created at the parent directory of the Scripts folder |
+| --genetic_code | string | Path to a file, Universal | If all of your taxa use the same genetic code, you may enter it here |
+| --databases | string | Path to databases folder | The folder should contain all 3 databases |
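Because the input file names must follow a fixed pattern, it can help to check a folder listing before starting a run. The sketch below is a hypothetical pre-flight check, not part of PhyloToL; it assumes the "10-digit code" is the first ten characters of the file name (as in `Op_me_hsap`) and that each code may appear only once.

```python
from pathlib import Path


def check_input_names(filenames, suffix="_GenBankCDS.fasta"):
    """Return a list of problems found in the given input file names."""
    seen = {}
    problems = []
    for name in filenames:
        base = Path(name).name
        # The name needs a ten-character code in front of the required suffix.
        if not base.endswith(suffix) or len(base) < 10 + len(suffix):
            problems.append(f"{base}: expected a ten-character code followed by {suffix}")
            continue
        code = base[:10]  # assumption: the code is the first ten characters
        if code in seen:
            problems.append(f"{base}: code {code} is already used by {seen[code]}")
        else:
            seen[code] = base
    return problems


print(check_input_names(["Op_me_hsap_GenBankCDS.fasta", "short_GenBankCDS.fasta"]))
```

For transcriptome input, the same check could be called with `suffix="_assembledTranscripts.fasta"`.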
## Output