Shore consensus
shore consensus is being replaced by shore qvar, but it is still required for SHOREmap analysis.
The typical output of a whole-genome re-sequencing project is a list of all identified polymorphisms (e.g. SNPs, indels, CNVs) together with reference-like positions. In addition, a consensus sequence or contigs can be generated by combining all high-quality predictions. shore consensus provides this functionality by sequentially scanning an alignment to gather all read information available at each locus (i.e. called bases, base qualities, coverage, repetitiveness and alignment quality). This information is then used to predict differences from the reference sequence.
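The per-locus scan described above can be illustrated with a minimal Python sketch. It is not SHORE's implementation: the `AlignedRead` record, the restriction to gapless alignments, the quality cutoff (borrowed from the `-q` default of 5) and the simple majority-vote call are all assumptions made for illustration.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass
class AlignedRead:
    """Hypothetical read record: a gapless alignment to the reference."""
    start: int        # 0-based reference position of the first aligned base
    bases: str        # called bases, one per aligned position
    quals: list[int]  # Sanger-scaled base qualities, one per base


def scan_pileup(reference: str, reads: list[AlignedRead], min_qual: int = 5) -> list[Counter]:
    """Collect the counts of called bases at every reference position.

    Bases below min_qual are masked (ignored), loosely mirroring the -q cutoff.
    """
    pileup = [Counter() for _ in reference]
    for read in reads:
        for offset, (base, qual) in enumerate(zip(read.bases, read.quals)):
            pos = read.start + offset
            if 0 <= pos < len(reference) and qual >= min_qual:
                pileup[pos][base] += 1
    return pileup


def call_differences(reference: str, pileup: list[Counter]) -> list[tuple]:
    """Report (pos, ref, alt, support, coverage) where the majority base differs."""
    calls = []
    for pos, counts in enumerate(pileup):
        if not counts:
            continue  # no usable coverage at this locus
        base, support = counts.most_common(1)[0]
        if base != reference[pos]:
            calls.append((pos, reference[pos], base, support, sum(counts.values())))
    return calls


reference = "ACGTACGT"
reads = [AlignedRead(0, "ACGAAC", [30] * 6), AlignedRead(2, "GAACGT", [30] * 6)]
print(call_differences(reference, scan_pileup(reference, reads)))
# [(3, 'T', 'A', 2, 2)]
```

Both toy reads carry an A where the reference has a T at position 3, so that is the only reported difference.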
shore consensus can also be used to identify minor alleles (SNPs or short indels) in pooled samples. In addition, shore consensus estimates several characteristics of a run before the actual consensus calling, including the minimum and maximum read length, the minimum and maximum number of mismatches, the sequencing depth, the observed local repetitiveness and the GC content bias. It also reports several project statistics on the sequencing error rate, the correlation of quality values with observed errors, and coverage biases caused by local GC content. These statistics can be used to optimize further analyses (e.g. deletions should not be called in low-GC regions if a strong GC bias is observed).
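One way to quantify the GC-dependent coverage bias mentioned above is to bin fixed-size reference windows by GC content and compare their mean coverage with the overall mean. The Python sketch below does this under stated assumptions: the window size, the number of bins and the function names are hypothetical, and SHORE's own statistic may be computed differently.

```python
from __future__ import annotations

from statistics import mean


def gc_fraction(seq: str) -> float:
    """Fraction of G/C bases in a sequence window."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def gc_coverage_bias(reference: str, coverage: list[int],
                     window: int = 100, n_bins: int = 10) -> dict[float, float]:
    """Mean coverage per GC bin, relative to the mean over all full windows.

    Keys are the lower bounds of the GC bins. A value near 1.0 means no bias;
    values well below 1.0 mean reads are under-represented in that GC range.
    """
    usable = (len(reference) // window) * window  # ignore a trailing partial window
    if usable == 0:
        raise ValueError("reference is shorter than one window")
    overall = mean(coverage[:usable])
    if overall == 0:
        raise ValueError("no coverage in the analysed windows")
    per_bin: dict[int, list[float]] = {}
    for start in range(0, usable, window):
        gc = gc_fraction(reference[start:start + window])
        b = min(int(gc * n_bins), n_bins - 1)  # GC = 1.0 goes into the last bin
        per_bin.setdefault(b, []).append(mean(coverage[start:start + window]))
    return {b / n_bins: mean(vals) / overall for b, vals in sorted(per_bin.items())}


reference = "A" * 10 + "G" * 10
coverage = [5] * 10 + [15] * 10
print(gc_coverage_bias(reference, coverage, window=10, n_bins=2))
# {0.0: 0.5, 0.5: 1.5}
```

In this toy input the AT-rich window receives half the average coverage, which is the kind of pattern that would argue against calling deletions in low-GC regions.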
Note: shore consensus can also be applied to sRNA-seq, mRNA-seq and ChIP-seq data. However, SHORE provides more appropriate tools for these purposes (shore coverage and shore peak).
The output generated by shore consensus is described in SHORE consensus result files.
Usage: shore consensus [OPTIONS]
Option | Default | Description
Mandatory | |
-n STRING | | Name (any of species, strain, accession, project or any other ID)
-f STRING | | Reference genome sequence (*.shore file) from the IndexFolder
-o STRING | | AnalysisFolder (will be created)
-i STRING[,...] | | SHORE directories or map.list file(s)
-g INT | Maximum number of mismatches | Core offset: do not trust the first and last -g positions of an alignment
Quality threshold | |
-q INT | 5 | Cutoff for base masking using Sanger-calibrated qualities
-c INT | | Cutoff for base masking using chastity values
Basecalling (scoring matrix approach) | |
-a STRING | | Scoring matrix file (recommended; activates the new basecalling approach)
-b FLOAT | 0.2 | Minimum allele frequency of an alternative base call
Basecalling (decision tree approach) | |
-x INT | 3 | Minimum coverage
-m INT | 3 | Maximum ratio of observed to expected coverage
-e FLOAT | 0.1 | Minimum ratio of observed to expected coverage
-y FLOAT | 0.8 | Minimum concordance for homozygous SNPs (0 to 1)
-d FLOAT | 0.67 | Minimum concordance for homozygous indels
-t FLOAT | 0.25 | Minimum frequency for heterozygous positions (0 to 1)
-u FLOAT | 0.02 | Minimum frequency for minor-allele positions (0 to 1)
-z INT | 10 | Quality threshold (maximum base quality)
Optional | |
-R INT | | Allow base calling in highly repetitive regions
-s INT | | Consensus analysis of transcriptome (mRNA-seq) reads; turns off CNV analysis
-S INT | 0 | Ignore positions whose transcriptome coverage does not exceed this threshold
-w INT | | Use the graph-based map.list format (GenomeMapper only)
-v INT | | Create additional output files containing all intermediate data (required for subsequent SHOREmap analysis)
-r INT | | Plot statistics graphically using R
-N INT | | Turn off the calculation of long deletions, duplications and any other CNVs
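To make the decision-tree thresholds more concrete, the following Python sketch classifies a single locus from its base counts. It is a toy model and not SHORE's actual decision tree: the function name, the order of the tests and the returned labels are assumptions, and only the default values are taken from the -x, -m, -e, -y, -t and -u defaults listed above (indels, -d and -z are not modelled). `expected_cov` stands for the expected sequencing depth, which shore consensus estimates before calling.

```python
from collections import Counter


def classify_position(ref_base: str, counts: Counter, expected_cov: float,
                      min_cov: int = 3,             # cf. -x
                      max_obs_exp: float = 3.0,     # cf. -m
                      min_obs_exp: float = 0.1,     # cf. -e
                      min_hom_conc: float = 0.8,    # cf. -y
                      min_het_freq: float = 0.25,   # cf. -t
                      min_minor_freq: float = 0.02, # cf. -u
                      ) -> str:
    """Toy decision tree for one locus (hypothetical, NOT SHORE's actual rules)."""
    cov = sum(counts.values())
    if cov < min_cov:
        return "low_coverage"
    ratio = cov / expected_cov
    if not (min_obs_exp <= ratio <= max_obs_exp):
        return "unexpected_coverage"  # far more or far fewer reads than expected
    top, top_n = counts.most_common(1)[0]
    if top != ref_base and top_n / cov >= min_hom_conc:
        return "homozygous_snp"
    # Frequency of the most common non-reference base at this locus.
    alt_freq = max((n for base, n in counts.items() if base != ref_base), default=0) / cov
    if alt_freq >= min_het_freq:
        return "heterozygous"
    if alt_freq >= min_minor_freq:
        return "minor_allele"
    if top == ref_base and top_n / cov >= min_hom_conc:
        return "reference"
    return "ambiguous"


print(classify_position("A", Counter({"G": 9, "A": 1}), expected_cov=10))    # homozygous_snp
print(classify_position("A", Counter({"A": 97, "T": 3}), expected_cov=100))  # minor_allele
```

The minor-allele branch is what makes pooled samples interesting: a 3% alternative base would be discarded as noise in a single diploid individual but is reported here because it exceeds the -u default.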