Shore consensus

shore consensus is being replaced by shore qvar, but it is still required for SHOREmap analysis.

The typical output of whole-genome re-sequencing projects is a list of all identified polymorphisms (e.g. SNPs, indels, CNVs) together with reference-like positions. In addition, a consensus sequence or contigs can be generated by combining all high-quality predictions. shore consensus provides this functionality by sequentially scanning an alignment and gathering all read information available at each locus (i.e. called bases, base qualities, coverage, repetitiveness, alignment quality). This information is then used to predict differences from the reference sequence.
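To make the scan-then-predict idea concrete, here is a toy Python sketch of the general pileup approach. It is not SHORE's implementation: the AlignedRead record, build_pileup and call_differences are hypothetical names, alignments are assumed to be gap-free, and repetitiveness and alignment quality are ignored. The min_qual and core_offset parameters only loosely mirror the -q and -g options listed further down.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class AlignedRead:
    """Hypothetical read record: 0-based reference start, gap-free bases, Phred qualities."""
    start: int
    bases: str
    quals: list[int]


def build_pileup(reads, min_qual=5, core_offset=0):
    """Count base calls per reference position, skipping masked bases."""
    pileup = defaultdict(Counter)
    for read in reads:
        n = len(read.bases)
        for i, (base, q) in enumerate(zip(read.bases, read.quals)):
            if q < min_qual:
                continue  # masked by the quality cutoff
            if i < core_offset or i >= n - core_offset:
                continue  # untrusted read end
            pileup[read.start + i][base] += 1
    return pileup


def call_differences(reference, pileup, min_cov=3, min_freq=0.8):
    """Report positions whose majority base differs from the reference."""
    calls = {}
    for pos, counts in sorted(pileup.items()):
        cov = sum(counts.values())
        if cov < min_cov:
            continue
        base, n = counts.most_common(1)[0]
        if base != reference[pos] and n / cov >= min_freq:
            calls[pos] = (reference[pos], base, cov)
    return calls


reference = "ACGTACGTAC"
reads = [
    AlignedRead(0, "ACGTTCGT", [30] * 8),
    AlignedRead(2, "GTTCGTAC", [30] * 8),
    AlignedRead(1, "CGTTCG", [30] * 6),
]
print(call_differences(reference, build_pileup(reads)))  # → {4: ('A', 'T', 3)}
```

All three reads carry a T where the reference has an A at position 4, so that position is reported with a coverage of 3. Masking that base in one read (quality below min_qual) would drop the coverage under min_cov and suppress the call.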

shore consensus can also be used to identify minor alleles (SNPs or short indels) in pooled samples. In addition, before the actual consensus calling it estimates several characteristics of the run: minimum and maximum read length, minimum and maximum number of mismatches, sequencing depth, observed local repetitiveness and GC content bias.

shore consensus also reports project statistics on the sequencing error rate, the correlation of quality values with observed errors, and coverage biases due to local GC content. These can be used to optimize further analysis (e.g. deletions should not be called in low-GC regions if a strong GC bias is observed).
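How SHORE computes its GC bias statistic is not described on this page. As an illustration of the general idea only, the Python sketch below bins fixed-size reference windows by GC fraction and averages coverage per bin; the function names, window size, bin count and the 0.5 ratio are hypothetical choices.

```python
from statistics import mean


def gc_fraction(seq):
    """Fraction of G/C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def coverage_by_gc(reference, coverage, window=100, n_bins=10):
    """Mean window coverage grouped by window GC fraction.

    coverage holds per-position read depth and has the same length as reference.
    Bin i collects windows with GC in [i/n_bins, (i+1)/n_bins); GC = 1.0 goes to the last bin.
    """
    bins = {}
    for start in range(0, len(reference) - window + 1, window):
        gc = gc_fraction(reference[start:start + window])
        b = min(int(gc * n_bins), n_bins - 1)
        bins.setdefault(b, []).append(mean(coverage[start:start + window]))
    return {b: mean(values) for b, values in sorted(bins.items())}


def low_coverage_gc_bins(profile, overall_mean, ratio=0.5):
    """GC bins whose mean coverage falls below ratio * overall_mean."""
    return [b for b, cov in profile.items() if cov < ratio * overall_mean]
```

If a bin such as the lowest-GC one is systematically under-covered, low coverage there is weak evidence for a deletion, which is the kind of adjustment the statistics above are meant to support.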

Note: shore consensus can also be applied to sRNA-seq, mRNA-seq and ChIP-seq data. However, SHORE provides more appropriate tools for those purposes (coverage and peak).

The output generated by shore consensus is described on the SHORE consensus result files page.

Usage: shore consensus [OPTIONS]

Mandatory
-n STRING		Name (any of species, strain, accession, project or any other ID)
-f STRING		Reference genome sequence from the IndexFolder, *.shore file
-o STRING		AnalysisFolder, will be created
-i STRING[,...]		Shore directories or map.list file(s)
-g INT	(Default: max. mismatches)	Core offset: do not trust the first and last -g positions of each alignment
Quality threshold
-q INT	(Default: 5)	Cutoff for base masking using Sanger calibrated qualities
-c INT		Cutoff for base masking using chastity values
Basecalling (scoring matrix approach)
-a STRING		Scoring matrix file (recommended, activates new basecalling approach)
-b FLOAT	(Default: 0.2)	Minimum allele frequency of alternative base call
Basecalling (decision tree approach)
-x INT	(Default: 3)	Minimum coverage threshold
-m INT	(Default: 3)	Maximum observed to expected coverage
-e FLOAT	(Default: 0.1)	Minimum observed to expected coverage
-y FLOAT	(Default: 0.8)	Minimum concordance of homozygous SNPs (0 to 1)
-d FLOAT	(Default: 0.67)	Minimum concordance of homozygous indels
-t FLOAT	(Default: 0.25)	Minimum frequency for heterozygous positions (0 to 1)
-u FLOAT	(Default: 0.02)	Minimum frequency for minor allele positions (0 to 1)
-z INT	(Default: 10)	Quality threshold, max base quality
Optional
-R INT		Allow base calling in highly repetitive regions
-s INT		Consensus analysis using transcriptome (mRNA-seq) reads. Turns off CNV analysis
-S INT	(Default: 0)	Ignore positions with transcriptome coverage not above this threshold
-w INT		Use graph-based map.list format (genomemapper only)
-v INT		Create additional output files containing all intermediate data (required for subsequent SHOREmap analysis)
-r INT		Graphical output of statistics using R
-N INT		Turn off calculation of long deletions, duplications and any other CNVs
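The option list gives the thresholds of the decision tree approach but not how they are combined. The Python sketch below shows one plausible ordering for classifying a single position from its base counts. The defaults are copied from the list above, while DecisionTreeParams, classify_position and the returned labels are hypothetical and do not reproduce SHORE's actual tree.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionTreeParams:
    """Defaults copied from the option list; field names are hypothetical."""
    min_coverage: int = 3             # -x
    max_obs_exp: float = 3.0          # -m
    min_obs_exp: float = 0.1          # -e
    min_hom_concordance: float = 0.8  # -y (homozygous SNPs)
    min_het_freq: float = 0.25        # -t
    min_minor_freq: float = 0.02      # -u


def classify_position(ref_base, counts, expected_cov, p=DecisionTreeParams()):
    """Classify one position from its base counts; expected_cov must be > 0."""
    counts = Counter(counts)
    cov = sum(counts.values())
    if cov < p.min_coverage:
        return "low_coverage"
    # Observed-to-expected coverage outside the -e/-m window.
    ratio = cov / expected_cov
    if ratio > p.max_obs_exp or ratio < p.min_obs_exp:
        return "coverage_outlier"
    top_base, top_n = counts.most_common(1)[0]
    if top_base != ref_base and top_n / cov >= p.min_hom_concordance:
        return "homozygous_snp"
    # Frequency of the most common non-reference base.
    alt_n = max((n for b, n in counts.items() if b != ref_base), default=0)
    alt_freq = alt_n / cov
    if alt_freq >= p.min_het_freq:
        return "heterozygous"
    if alt_freq >= p.min_minor_freq:
        return "minor_allele"
    return "reference"
```

For example, with a reference A and an expected coverage of 100, counts of 95 A and 5 C yield "minor_allele", the kind of call that matters for pooled samples.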
