Shore count

From SHORE wiki
Revision as of 10:51, 28 September 2011 by Felo80

shore count

shore count calculates read counts, mean coverage and other quantitative properties for genomic regions that have already been defined by some other means. It can analyze either fixed-size jumping windows over the genome or regions listed in an input file, for example annotated coding regions, or regions produced by the segmentation algorithms of shore coverage, shore peak or shore srna that are to be re-analyzed manually.
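The notion of fixed-size jumping windows can be sketched in a few lines of Python. This is only an illustration of the idea, not the shore implementation: the function name, the 1-based positions and the truncation of the last window at the chromosome end are all assumptions. The only behaviour taken from this page is that the distance between windows defaults to the window size.

```python
def jumping_windows(chrom_length, segment_size, segment_distance=None):
    """Yield (pos, size) tuples for fixed-size windows along one chromosome.

    Assumptions (not confirmed by the shore documentation): positions are
    1-based and a final partial window is truncated at the chromosome end.
    """
    if segment_distance is None:
        # Documented default: the distance equals the segment size,
        # which gives adjacent, non-overlapping windows.
        segment_distance = segment_size
    if segment_size <= 0 or segment_distance <= 0:
        raise ValueError("segment size and distance must be positive")
    pos = 1
    while pos <= chrom_length:
        size = min(segment_size, chrom_length - pos + 1)
        yield pos, size
        pos += segment_distance


# A hypothetical 1000 bp chromosome cut into 400 bp windows.
windows = list(jumping_windows(1000, 400))
# A distance smaller than the size produces overlapping windows.
overlapping = list(jumping_windows(1000, 400, 200))
```

With a distance smaller than the window size the windows overlap; with a larger distance, gaps are left between them.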

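Accepted input files are tab-delimited plain text files with a header line naming the columns chr, pos, size and, optionally, strand. According to the description of the --segment-file option, the file should be sorted, and a GFF file is accepted as an alternative.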
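A segment file of this kind could be produced with a short Python script such as the one below. It is a minimal sketch: the segment values are invented, the helper name is hypothetical, and sorting by chromosome name and then by position is an assumption about what "sorted" means for shore. The optional strand column is left out because its encoding is not specified here.

```python
import csv
import io

# Hypothetical segments; chromosome names and coordinates are made up.
segments = [
    {"chr": "2", "pos": 300, "size": 1200},
    {"chr": "1", "pos": 5000, "size": 250},
    {"chr": "1", "pos": 1000, "size": 500},
]


def write_segment_file(handle, rows):
    """Write rows as tab-delimited text with a chr/pos/size header line."""
    writer = csv.DictWriter(
        handle,
        fieldnames=["chr", "pos", "size"],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    # Assumed sort order: chromosome name first, then start position.
    for row in sorted(rows, key=lambda r: (r["chr"], r["pos"])):
        writer.writerow(row)


# Written to an in-memory buffer here; a real file handle works the same way.
buffer = io.StringIO()
write_segment_file(buffer, segments)
segment_text = buffer.getvalue()
```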


Command line options

Usage: shore count [OPTIONS] [MAPFILES]
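As an illustration of the usage line, the following Python sketch assembles one possible invocation from options listed on this page. The directory and file names (laneA, laneB, CountResults, segments.txt) are placeholders, and the command is only built as a string, not executed.

```python
import shlex

# Hypothetical inputs; only the option names are taken from the help text.
mapfiles = ["laneA", "laneB"]      # comma-separated, i.e. separate assays
argv = [
    "shore", "count",
    "-m", ",".join(mapfiles),      # alignment files or shore directories
    "-o", "CountResults",          # output directory, will be created
    "-f", "segments.txt",          # segment file (chr, pos, size columns)
    "-k",                          # also report RPKM values
    "-H", "1,1",                   # keep only non-repetitive reads
]

# Join into a shell-safe command line for logging or manual execution.
command = shlex.join(argv)
```

Joining the entries of mapfiles with colons instead of commas would, according to the --mapfiles description, treat them as a single assay.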

Mandatory
-m, --mapfiles=STRING[:...][,...] Alignment files or shore directories (flowcell, lane, pe or barcode; comma-separated; colon-separated items will be treated as a single assay)
-o, --outdir=STRING (Default: SegmentAnalysis) Output directory, will be created
Variable size
-f, --segment-file=STRING Set file with segment information (either a sorted file with columns chr, pos, size, strand, or a GFF file)
Fixed size
-s, --segment-size=INT Use segments of fixed size <arg> instead of a file
-j, --segment-distance=INT Distance of fixed size segments (defaults to segment size)
-t, --strand-specific Count both strands separately
Output
-k, --rpkm Also calculate reads per kilobase & million (RPKM) values (totals calculated without applying the alignment filter)
-a, --fasta-file=STRING If a fasta file is provided, the sequence will be reported for each segment
Counting
-O, --overlap=FLOAT[%] (Default: 50%) Required amount of overlap between read and feature (percentage or absolute)
-W, --weight-repetitive=STRING (Default: divide) How to weight repetitive hits (divide, multiply or const)
Alignment filter
-H, --hits-range=INT,INT Set the allowed range of repetitiveness ('1,1' = nonrep reads)
-M, --mm-range=INT,INT Set the allowed range of mismatches
-R, --region=STRING Only use reads that overlap with the range [chr1:pos1..[chr2:]pos2]
--assume-length=INT (Default: 400) Assume maximal alignment length <arg>, enables fast range queries
-X, --p3fix=INT Set the 3' end to a fixed distance from the 5' end
-N, --read-lengths=INT[,...] Use only reads of the given length(s)
-T, --strand=STRING Use only reads from the given strand
-B, --duplicates=FLOAT Report at most <arg> reads with the 5' end at the same position on the same strand
--wpoiss=INT Window size for adaptive duplicate read filtering
--sam-ref=STRING Reference sequence for SAM file parsing
--peflags=INT[,...] Use only reads with the given PE flag(s)
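The RPKM values requested with --rpkm follow the general definition of reads per kilobase per million reads. The Python sketch below shows that general formula only; the function name is hypothetical and the exact computation inside shore is not documented on this page. In line with the option description, the total passed in would be the read total before the alignment filter is applied.

```python
def rpkm(read_count, segment_size_bp, total_reads):
    """Reads per kilobase of segment per million reads (general formula).

    read_count      -- reads counted for the segment
    segment_size_bp -- segment length in base pairs
    total_reads     -- total number of reads (assumed here to be the
                       unfiltered total, as the --rpkm help text suggests)
    """
    if segment_size_bp <= 0 or total_reads <= 0:
        raise ValueError("segment size and total reads must be positive")
    # Scale by 1e3 (per kilobase) and 1e6 (per million reads).
    return read_count * 1e9 / (segment_size_bp * total_reads)


# Hypothetical numbers: 50 reads on a 2 kb segment, 10 million reads in total.
value = rpkm(50, 2000, 10_000_000)
```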