From 2e507ecf46a6ae84527c2c0620a52b013492e4b0 Mon Sep 17 00:00:00 2001
From: Godwin Ani
Date: Mon, 12 Aug 2024 15:00:14 -0400
Subject: [PATCH] Updated PhyloToL Part 1: GF assignment (markdown)

---
 PhyloToL-Part-1:-GF-assignment.md | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/PhyloToL-Part-1:-GF-assignment.md b/PhyloToL-Part-1:-GF-assignment.md
index 9e9b15f..528f550 100644
--- a/PhyloToL-Part-1:-GF-assignment.md
+++ b/PhyloToL-Part-1:-GF-assignment.md
@@ -96,12 +96,23 @@ Role of each script
 To process transcriptomes, run:
 `python Scripts/wrapper.py -1 1 -2 7 --assembled_transcripts AssembledTranscripts --output . --genetic_code Universal -d Databases > log.txt`
-* -1 = start script
-* -2 = end script
-* --assembled_transcripts = Folder with Assembled transcripts in fasta format
-* --output = path to output folder
-* --genetic_code = specified genetic code, name of .txt file with Genetic codes
-* -d = path to Databases folder
+
+| Parameter | Description |
+| ----------- | ----------------- |
+| -1, --first_script | First script to run |
+| -2, --last_script | Last script to run |
+| -a, --assembled_transcripts | Path to a folder of transcripts assembled by rnaSPAdes. Each file name should start with a unique ten-digit code and end in "_assembledTranscripts.fasta", e.g. Op_me_hsap_assembledTranscripts.fasta |
+| -d, --databases | Path to the databases folder |
+| -o, --output | An "Output" folder will be created in this directory to contain all output files. By default this folder is created in the parent directory of the Scripts folder |
+| -x, --xplate_contam | Run cross-plate contamination removal (includes all files) |
+| -g, --genetic_code | If all of your taxa use the same genetic code, you may enter it here (to be used in script 5). Alternatively, if you need a variety of genetic codes and know which to use for each taxon, give the path to a .txt or .tsv file with two tab-separated columns, the first with the ten-digit codes and the second with the corresponding genetic codes |
+| -n, --conspecific_names | A .txt or .tsv file with two tab-separated columns; the first should have the ten-digit codes, the second species or other identifying names. This is used to determine which sequences to remove (only between "species") in cross-plate contamination assessment. |
+| -min, --minlen | Minimum transcript length |
+| -max, --maxlen | Maximum transcript length |
+| -c, --seq_count | Minimum number of sequences after assigning OGs |
+
+
+
 * \>log.txt = if added to the end of the command, it will output a log file with progress, warning, or error messages
 * *For running with cross plate contamination removal, add `-x -n Conspecific.txt` to the line of code.
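
The naming rule for `--assembled_transcripts` is easy to get wrong, so it can help to check a folder before launching the wrapper. The sketch below is not part of PhyloToL; the helper names (`ten_digit_code`, `check_names`) are hypothetical, and it assumes the ten-digit code is simply the first ten characters of the file name, as in `Op_me_hsap_assembledTranscripts.fasta`.

```python
from collections import Counter
from pathlib import Path

# Suffix required by the parameter table; the code length is taken from the
# "ten-digit code" wording and the Op_me_hsap example (an assumption).
SUFFIX = "_assembledTranscripts.fasta"
CODE_LENGTH = 10


def ten_digit_code(file_name):
    """Return the leading ten-character code, or None if the name does not fit."""
    if not file_name.endswith(SUFFIX):
        return None
    stem = file_name[: -len(SUFFIX)]
    if len(stem) < CODE_LENGTH:
        return None
    return stem[:CODE_LENGTH]


def check_names(file_names):
    """Return (code for each well-formed name, list of problem messages)."""
    codes = {}
    problems = []
    for name in file_names:
        code = ten_digit_code(name)
        if code is None:
            problems.append(f"{name}: expected a ten-digit code followed by {SUFFIX}")
        else:
            codes[name] = code
    # Every file must carry its own code.
    counts = Counter(codes.values())
    for name, code in codes.items():
        if counts[code] > 1:
            problems.append(f"{name}: code {code} is not unique")
    return codes, problems


if __name__ == "__main__":
    # "AssembledTranscripts" mirrors the folder name used in the example command.
    folder = Path("AssembledTranscripts")
    if folder.is_dir():
        names = sorted(p.name for p in folder.iterdir() if p.is_file())
        _, problems = check_names(names)
        print("\n".join(problems) or "All file names look fine")
```

Run it from the directory that contains `AssembledTranscripts`; any file it flags should be renamed before `Scripts/wrapper.py` is started.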