Shore srna
From SHORE wiki
The purpose of shore srna is to facilitate the analysis of small RNA (sRNA) sequencing data. It scans the genome for regions where significant amounts of small RNAs are expressed and annotates these loci with read counts and with the predominant sRNA size.
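The scanning step can be pictured as threshold segmentation of a per-strand coverage track. The Python sketch below is not SHORE's implementation. It assumes the semantics suggested by the segmentation options described under Command line options (-C: coverage must exceed the threshold, -J: minimum segment size, -v: minimum overlap for merging segments, e.g. segments found in different samples), uses 0-based half-open coordinates, and does not model the probation/mitigator options (-V, -Q). The function names are hypothetical.

```python
from typing import List, Optional, Tuple

Segment = Tuple[int, int]  # (start, end), 0-based, end exclusive


def threshold_segments(coverage: List[float], threshold: float = 10.0,
                       min_size: int = 15) -> List[Segment]:
    """Maximal runs with coverage > threshold and length >= min_size (assumed -C/-J semantics)."""
    segments: List[Segment] = []
    start: Optional[int] = None
    for i, c in enumerate(coverage):
        if c > threshold:
            if start is None:
                start = i  # a run above the threshold begins
        elif start is not None:
            if i - start >= min_size:
                segments.append((start, i))
            start = None  # run ended; short runs are discarded
    # close a run that reaches the end of the track
    if start is not None and len(coverage) - start >= min_size:
        segments.append((start, len(coverage)))
    return segments


def merge_segments(segments: List[Segment], min_overlap: int = 1) -> List[Segment]:
    """Merge segments whose overlap is >= min_overlap (assumed -v semantics).

    The overlap of [a, b) and [c, d) is min(b, d) - max(a, c); a negative
    min_overlap therefore also merges segments separated by a small gap.
    """
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged:
            last_start, last_end = merged[-1]
            if min(last_end, end) - max(last_start, start) >= min_overlap:
                merged[-1] = (last_start, max(last_end, end))
                continue
        merged.append((start, end))
    return merged
```

With the defaults, a 20 bp run at coverage 20 becomes a segment, while a 10 bp run at coverage 12 is discarded because it is shorter than 15 bp.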
Note: The following applies to SHORE v0.7
Command line options
Usage: shore srna [OPTIONS] [SAMPLE_PATHS]
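The grouping syntax of -s/--samples (commas separate assays, colons join several SHORE directories into a single assay) can be illustrated with a tiny parser. This is a hypothetical helper for illustration, not part of SHORE, and the directory names in the example are made up.

```python
from typing import List


def parse_samples(arg: str) -> List[List[str]]:
    """Split a --samples value into assays; each assay is a list of SHORE directories."""
    # commas separate assays; colons join directories that form one assay
    return [group.split(":") for group in arg.split(",") if group]


print(parse_samples("wt_rep1:wt_rep2,mut_rep1"))
# prints [['wt_rep1', 'wt_rep2'], ['mut_rep1']]
```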
Input | ||
-s, --samples=STRING[:...][,...] | | SHORE directories (comma-separated; colon-separated items are treated as a single assay) |
Output | ||
-o, --outdir=STRING | (Default: SrnaAnalysis) | Output directory (will be created) |
--rpkm | | Report counts normalized as reads per kilobase per million (RPKM) instead of the default reads per million (RPM) |
Coverage | ||
-W, --weight-repetitive=STRING | (Default: divide) | How to weight repetitive hits (divide, multiply or const) |
Segmentation | ||
-j, --joint-seg | | Apply the segmentation threshold to the joint coverage instead of the per-sample coverage |
-C, --static-threshold=FLOAT | (Default: 10) | Coverage threshold [>] |
-J, --minsize=INT | (Default: 15) | Segment size threshold [>=] |
-V, --probation=INT | (Default: 0) | Allow a mitigated threshold for at most <arg> base pairs inside a segment |
-Q, --mitigator=FLOAT | (Default: 1) | Modifier for calculation of the mitigated threshold, value in [0,1] |
-v, --overlap=INT | (Default: 1) | Required overlap for merging segments (may be negative to allow gaps) |
Alignment filter | ||
-H, --hits-range=INT,INT | | Set the allowed range of repetitiveness ('1,1' = non-repetitive reads only) |
-M, --mm-range=INT,INT | | Set the allowed range of mismatches |
-R, --region=STRING | | Only use reads that overlap with the range [chr1:pos1..[chr2:]pos2] |
--assume-length=INT | (Default: 400) | Assume a maximal alignment length of <arg>; enables fast range queries |
-N, --read-lengths=INT[,...] | | Use only reads of the given length(s) |
-B, --duplicates=FLOAT | | Report at most <arg> reads with the 5' end at the same position on the same strand |
--sam-ref=STRING | | Reference sequence for SAM file parsing |
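The -W and --rpkm options together determine the counts reported in seg.txt. As a minimal sketch, assuming that "divide" splits a read's weight evenly over its hits, "const" counts every hit fully, and the normalization denominator N is the sample's total (weighted) read count, the arithmetic is RPM = c * 10^6 / N and RPKM = RPM * 1000 / L for a locus of L bp. The "multiply" mode is not modelled because its exact semantics are not documented here, and all function names are hypothetical.

```python
def read_weight(num_hits: int, mode: str = "divide") -> float:
    """Weight of one alignment of a read with num_hits genomic hits (assumed -W semantics)."""
    if num_hits < 1:
        raise ValueError("num_hits must be >= 1")
    if mode == "divide":
        return 1.0 / num_hits  # the read's total weight of 1 is split across its hits
    if mode == "const":
        return 1.0  # every hit counts fully
    raise ValueError(f"mode not modelled in this sketch: {mode}")


def rpm(count: float, total_reads: float) -> float:
    """Reads per million (assumed denominator: total weighted reads of the sample)."""
    return count * 1e6 / total_reads


def rpkm(count: float, total_reads: float, locus_size: int) -> float:
    """Reads per kilobase of locus per million."""
    return rpm(count, total_reads) * 1000.0 / locus_size


# four alignments at a locus, from reads with 1, 1, 2 and 4 genomic hits
hits = [1, 1, 2, 4]
count = sum(read_weight(h) for h in hits)  # 1 + 1 + 0.5 + 0.25 = 2.75
print(rpm(count, 1_000_000), rpkm(count, 1_000_000, 50))  # 2.75 55.0
```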
SHORE srna result files
The main result file produced by shore srna is named seg.txt:
chr | sequence / chromosome ID |
pos | left-most position of the expressed locus on the reference sequence |
size | size of the expressed locus |
strand | strand of the expressed locus; each strand is processed completely independently |
kmer_maxofs | offset into the locus where the kmer with size lmax is most strongly expressed; useful for locating the exact position of the mature miRNA |
agree | fraction of samples where lmax is the size of the most frequent kmer at the locus |
disagree | fraction of samples where lmax is not the size of the most frequent kmer at the locus |
lmax | size of the most frequent kmer at the locus across all samples (calculated from the RPM-normalized read counts) |
cmax | RP[K]M-normalized count of the most frequent kmer at this locus across all samples |
ctotal | RP[K]M-normalized total read count across all samples at this locus |
cpure | kmer "purity": cmax/ctotal |
cchas | kmer "chastity": cmax/(cmax+cmax2), where cmax2 is the normalized read count of the 2nd-most frequent kmer at the locus; cchas is always >=0.5 |
kmers:<sample 1> ... kmers:<sample N> | raw read count for the kmer with length lmax for each sample; calculated according to option --weight-repetitive |
<sample 1> ... <sample N> | total raw read count for each sample; calculated according to option --weight-repetitive |
[RPM]kmers:<sample 1> ... [RPM]kmers:<sample N> | normalized read count for the kmer with length lmax for each sample; calculated according to options --weight-repetitive and --rpkm |
[RPM]<sample 1> ... [RPM]<sample N> | total normalized read count for each sample; calculated according to options --weight-repetitive and --rpkm |
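The summary columns follow directly from the per-length counts at a locus. The sketch below assumes one {read length: normalized count} map per sample, pools the maps to obtain lmax, cmax, ctotal, cpure and cchas, and counts a sample as agreeing when its own most frequent length equals lmax. How SHORE breaks ties, and how it treats samples without reads at the locus, are assumptions here (the shorter length wins a tie, and empty samples count towards neither agree nor disagree); the function names are hypothetical.

```python
from typing import Dict, List, Union


def top_length(counts: Dict[int, float]) -> int:
    """Most frequent read length; on ties the shorter length wins (assumption)."""
    return min(counts, key=lambda length: (-counts[length], length))


def locus_summary(samples: List[Dict[int, float]]) -> Dict[str, Union[int, float]]:
    """Summarize one locus from per-sample {read_length: normalized_count} maps."""
    pooled: Dict[int, float] = {}
    for counts in samples:
        for length, count in counts.items():
            pooled[length] = pooled.get(length, 0.0) + count
    if not pooled or sum(pooled.values()) <= 0:
        raise ValueError("locus has no reads")
    # rank lengths by pooled count, highest first
    ranked = sorted(pooled, key=lambda length: (-pooled[length], length))
    lmax = ranked[0]
    cmax = pooled[lmax]
    cmax2 = pooled[ranked[1]] if len(ranked) > 1 else 0.0
    ctotal = sum(pooled.values())
    agree = disagree = 0
    for counts in samples:
        if sum(counts.values()) <= 0:
            continue  # assumption: samples without reads count towards neither column
        if top_length(counts) == lmax:
            agree += 1
        else:
            disagree += 1
    n = len(samples)
    return {
        "lmax": lmax,
        "cmax": cmax,
        "ctotal": ctotal,
        "cpure": cmax / ctotal,
        "cchas": cmax / (cmax + cmax2),  # >= 0.5 because cmax >= cmax2
        "agree": agree / n,
        "disagree": disagree / n,
    }
```

For example, with samples {21: 30, 24: 10, 22: 2}, {21: 5, 24: 15} and one empty sample, lmax is 21, cpure is 35/62, cchas is 35/60, and agree and disagree are both 1/3.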