Difference between revisions of "Shore srna"

Revision as of 16:43, 26 September 2011

The purpose of shore srna is to facilitate the analysis of small RNA sequencing data. It scans the genome for regions where significant amounts of small RNAs are expressed and annotates these loci with read counts and the predominant sRNA size.

Note: The following applies to SHORE v0.7

Command line options

Usage: shore srna [OPTIONS] [SAMPLE_PATHS]

Input
-s, --samples=STRING[:...][,...]		Shore folders (comma-separated; colon-separated items will be treated as a single assay)
Output
-o, --outdir=STRING	(Default: SrnaAnalysis)	Output directory (will be created)
--rpkm		Report counts normalized as 'reads per kilobase per million' instead of 'reads per million'
Coverage
-W, --weight-repetitive=STRING	(Default: divide)	How to weight repetitive hits (divide, multiply or const):
		divide: each alignment has a score of 1/number_of_hits
		multiply: each alignment has a score of number_of_hits (only useful for repeat analysis, don't use)
		const: each alignment is counted as 1
Segmentation
-j, --joint-seg		Apply segmentation threshold to the joint coverage instead of per-sample coverage
-C, --static-threshold=FLOAT	(Default: 10)	Coverage threshold [>]
-J, --minsize=INT	(Default: 15)	Segment size threshold [>=]
-V, --probation=INT	(Default: 0)	Allow a mitigated threshold for at most <arg> base pairs inside a segment
-Q, --mitigator=FLOAT	(Default: 1)	Modifier for calculation of the mitigated threshold, value in [0,1]
-v, --overlap=INT	(Default: 1)	Required overlap for merging segments (may be negative to allow gaps)
Alignment filter
-H, --hits-range=INT,INT		Set the allowed range of repetitiveness ('1,1' = nonrep reads)
-M, --mm-range=INT,INT		Set the allowed range of mismatches
-R, --region=STRING		Only use reads that overlap with the range [chr1:pos1..[chr2:]pos2]
--assume-length=INT	(Default: 400)	Assume maximal alignment length <arg>, enables fast range queries
-N, --read-lengths=INT[,...]		Use only reads of the given length(s)
-B, --duplicates=FLOAT		Report at maximum <arg> reads with the 5' end at the same position on the same strand
--sam-ref=STRING		Reference sequence for SAM file parsing
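
To make the --weight-repetitive modes and the basic segmentation thresholds more concrete, here is a minimal Python sketch. It is not SHORE's implementation: the function names alignment_weight and segments_above are hypothetical, positions are 0-based for simplicity, and the --probation, --mitigator and --overlap options are ignored. It only assumes what the option descriptions state, namely that coverage must be strictly greater than the threshold (-C) and that a segment must span at least the minimum size (-J).

```python
def alignment_weight(number_of_hits, mode="divide"):
    """Score of one alignment of a read that has number_of_hits alignments.

    Mirrors the three modes described for --weight-repetitive.
    """
    if number_of_hits < 1:
        raise ValueError("number_of_hits must be >= 1")
    if mode == "divide":
        return 1.0 / number_of_hits
    if mode == "multiply":
        return float(number_of_hits)
    if mode == "const":
        return 1.0
    raise ValueError(f"unknown mode: {mode}")


def segments_above(coverage, threshold=10.0, minsize=15):
    """Return (start, size) for each run with coverage > threshold and size >= minsize.

    coverage is a per-base list of (weighted) coverage values for one strand;
    start is a 0-based index into that list (an assumption of this sketch).
    """
    segments = []
    start = None
    for i, value in enumerate(coverage):
        if value > threshold:
            if start is None:
                start = i  # a new run begins here
        elif start is not None:
            if i - start >= minsize:
                segments.append((start, i - start))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(coverage) - start >= minsize:
        segments.append((start, len(coverage) - start))
    return segments
```

With the default values, a run of 15 bases with coverage 11 is reported, while a run at exactly coverage 10 or a run of only 14 bases is not.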

SHORE srna result files

The main result file produced by shore srna is named seg.txt:

chr	sequence / chromosome ID
pos	left-most position of the expressed locus on the reference sequence
size	size of the expressed locus
strand	strand of the expressed locus; each strand is processed completely independently
kmer_maxofs	offset into the locus where the kmer with size lmax is most strongly expressed; useful for locating the exact position of mature miRNA.
agree	fraction of samples where lmax is the size of the most frequent kmer at the locus
disagree	fraction of samples where lmax is not the size of the most frequent kmer at the locus
lmax	size of the most frequent kmer (calculated from the RPM-normalized read counts) at the locus across all samples
cmax	RP[K]M-normalized count of the most frequent kmer at this locus across all samples
ctotal	RP[K]M-normalized total read count across all samples at this locus
cpure	kmer "purity": cmax/ctotal
cchas	kmer "chastity": cmax/(cmax+cmax2), where cmax2 is the normalized read count of the 2nd-most frequent kmer at the locus; cchas is always >=0.5
kmers:<sample 1>	raw read count for the kmer with length lmax for each sample; calculated according to option --weight-repetitive
...
kmers:<sample N>
<sample 1>	total raw read count for each sample; calculated according to option --weight-repetitive
...
<sample N>
[RPM]kmers:<sample 1>	normalized read count for the kmer with length lmax for each sample; calculated according to options --weight-repetitive and --rpkm
...
[RPM]kmers:<sample N>
[RPM]<sample 1>	total normalized read count for each sample; calculated according to options --weight-repetitive and --rpkm
...
[RPM]<sample N>
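
The relationship between the derived columns can be sketched in Python as follows. This is an illustration under stated assumptions, not SHORE's code: the function names normalize and kmer_stats are hypothetical, the library size used for normalization is assumed to be the total number of reads counted for a sample, and RPKM is assumed to divide the RPM value by the locus size in kilobases. The formulas for cpure and cchas are taken from the column descriptions above.

```python
def normalize(count, library_size, locus_size=None):
    """Reads per million, or reads per kilobase per million if locus_size is given.

    library_size (total reads of the sample) is an assumption of this sketch.
    """
    rpm = count * 1_000_000 / library_size
    if locus_size is None:
        return rpm
    return rpm * 1000 / locus_size


def kmer_stats(counts_by_size):
    """Derive lmax, cmax, ctotal, cpure and cchas for one locus.

    counts_by_size maps sRNA size (e.g. 21, 24) to the normalized read count
    of that size at the locus, summed across all samples.
    """
    ctotal = sum(counts_by_size.values())
    if not counts_by_size or ctotal <= 0:
        raise ValueError("locus has no reads")
    ranked = sorted(counts_by_size.items(), key=lambda item: item[1], reverse=True)
    lmax, cmax = ranked[0]
    # cmax2 is the count of the 2nd-most frequent size (0 if there is none).
    cmax2 = ranked[1][1] if len(ranked) > 1 else 0.0
    return {
        "lmax": lmax,
        "cmax": cmax,
        "ctotal": ctotal,
        "cpure": cmax / ctotal,
        "cchas": cmax / (cmax + cmax2),
    }
```

For a locus with normalized counts of 60 for 21-mers, 30 for 24-mers and 10 for 22-mers, this gives lmax = 21, cpure = 0.6 and cchas = 60/90. Because cmax is never smaller than cmax2, cchas cannot fall below 0.5.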
