Shore peak


shore peak provides enriched region prediction for ChIP-Seq experiments. Significance of the predicted regions is assessed by comparison to the specified control samples.

Replicate experiments may be processed simultaneously by specifying multiple experiment and control paths. The significance of each peak region is then tested independently for each replicate, while the region prediction itself is performed jointly for all experiments, so that the results are directly comparable across replicates.

The output generated by shore peak is described below.

Command line options

Usage: shore peak [OPTIONS]

Mandatory
-o, --outdir=STRING (Default: PeakAnalysis) Output directory (will be created)
-i, --chip-paths=STRING[:...][,...] ChIP experiment alignment files or shore directories (replicates)
-c, --ctrl-paths=STRING[:...][,...] Control experiment alignment files or shore directories
Segmentation
-S, --window-size=INT (Default: 2000) Sliding window size for dynamic segmentation
-P, --poisson-threshold=FLOAT (Default: 0.05) Poisson probability threshold [<=] for dynamic segmentation
-V, --probation=INT (Default: 0) Allow a mitigated threshold for at most <arg> base pairs inside a segment
-Q, --mitigator=FLOAT (Default: 1) Modifier for calculation of the mitigated threshold, value in [0,1]
-J, --minsize=INT (Default: 131) Segment size threshold [>=]
Normalization
-b, --binsize=INT (Default: 4000) Size of read bins for normalization
-q, --rankmaxquant-ubound=FLOAT (Default: 1) Quantile upper bound for the rank maxima of the bins used
Read filter
-H, --hits-range=INT,INT Set the allowed range of repetitiveness ('1,1' = nonrep reads)
-M, --mm-range=INT,INT Set the allowed range of mismatches
-R, --region=STRING Only use reads that overlap with the range [chr1:pos1..[chr2:]pos2]

Range indexing the alignment files beforehand using shore 2dex is recommended

--assume-length=INT (Default: 400) Assume maximal alignment length <arg>, enables fast range queries
-X, --p3fix=INT (Default: 130) Set the 3' end to a fixed distance from the 5' end (set to 0 to disable)
-N, --read-lengths=INT[,...] Use only reads of the given length(s)
-B, --duplicates=FLOAT Report at most <arg> reads with the 5' end at the same position on the same strand
--sam-ref=STRING Reference sequence for SAM file parsing
--peflags=INT[,...] Use only reads with the given PE flag(s)
-F, --poissonifier-width=INT (Default: 13) Set the window size for the adaptive duplicate read filter (set to zero to disable)
Peak filtering
-n, --nsigma=FLOAT (Default: 6) Allow the mean segment coverage of any control sample to be at most <arg> std. deviations higher than the median before discarding the segment
--min-xshift=FLOAT (Default: 10) Require a certain shift for the reverse strand peak in at least one experiment
--min-foldchange=FLOAT (Default: 2) Require a minimum normalized fold change of <arg> for experiment vs. control for at least one experiment
Other
--non-directional Assume that any clone may be sequenced from both ends (calculates a more conservative FDR)
-d, --rankproduct=INT (Default: 10000) Number of simulations for rankproduct PFP estimation (set to zero to disable PFP estimation)
--rplot=INT (Default: 100) Plot the first <arg> peaks using R; alignment files must be indexed using shore 2dex
-r, --index-file=STRING Extract sequence information for each segment from *.shore index file
-a, --annotation-file=STRING Annotation file (sequenceontology.org GFF3 format; numerical sequence IDs required)
--so-filter=STRING[,...] (Default: gene,transposable_element_gene) Only parse toplevel annotation features of the given SO types
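
As a rough illustration of how the options above combine, the following Python sketch assembles a shore peak command line for two replicates. The helper name build_shore_peak_command and the paths chip_rep1, chip_rep2, ctrl_rep1 and ctrl_rep2 are hypothetical. Separating the replicate paths with commas is an assumption about the STRING[:...][,...] syntax listed above. The sketch only builds and prints the command; it does not run shore.

```python
import shlex


def build_shore_peak_command(chip_paths, ctrl_paths, outdir="PeakAnalysis",
                             window_size=None, min_foldchange=None):
    """Assemble an argument list for `shore peak` (nothing is executed).

    Only options listed on this page are used. Joining replicate paths
    with commas is an ASSUMPTION about the STRING[:...][,...] syntax.
    """
    argv = [
        "shore", "peak",
        "--outdir=" + outdir,
        "--chip-paths=" + ",".join(chip_paths),
        "--ctrl-paths=" + ",".join(ctrl_paths),
    ]
    # Optional settings are only appended when explicitly requested,
    # so that the documented defaults apply otherwise.
    if window_size is not None:
        argv.append("--window-size=%d" % window_size)
    if min_foldchange is not None:
        argv.append("--min-foldchange=%g" % min_foldchange)
    return argv


# Hypothetical replicate directories, used purely for illustration.
cmd = build_shore_peak_command(
    ["chip_rep1", "chip_rep2"],
    ["ctrl_rep1", "ctrl_rep2"],
    min_foldchange=2,
)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Building the arguments as a list keeps each option separate, so the list could later be handed to subprocess.run without additional shell quoting.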

SHORE peak result files

The main result file produced by shore peak is named SUMMARY.txt:

id An arbitrary numerical ID for the peak region
chr Sequence / chromosome ID
pos Left-most position of the peak region on the reference sequence
size Size of the peak region
log2_orp Observed rank product, base 2 logarithm (only present for multiple replicates)
orp_rank Rank of the observed rank product (only present for multiple replicates)
p_rank1 Replicate 1 rank of the P-value of the peak
fdr_bh_q1 Replicate 1 Benjamini-Hochberg adjusted FDR of the peak
rc_chip1 Replicate 1 number of reads contributing to the peak in the sample
rc_ctrl1 Replicate 1 number of reads in the same region of the control
pbexcess1 Replicate 1 per-base-excess: mean_coverage_sample(peak) - (mean_coverage_control(peak) * normalization_constant)
fc_score1 Replicate 1 fold change score: 4 * atan(mean_coverage_sample(peak) / mean_coverage_control(peak) * normalization_constant) / PI - 1.0
height_excess1 Replicate 1 peak height excess:
frfc_score1 Replicate 1 forward-reverse fold change score: Calculated like fc_score, but compares the sample forward strand and reverse strand coverage
cog_xshift1 Replicate 1 forward-reverse peak shift
overlap_names Identifiers of the genes overlapping with the center of the peak region (only present when the option -a was specified)
overlap_types Parts of the genes that overlap (exon, 5' UTR etc.) (only present when the option -a was specified)
up_names Identifiers of the closest genes 'to the left' from the center of the peak (only present when the option -a was specified)
up_dist Distance of the closest genes 'to the left' from the center of the peak (only present when the option -a was specified)
up_strands Strands of the closest genes 'to the left' from the center of the peak (only present when the option -a was specified)
down_names Identifiers of the closest genes 'to the right' from the center of the peak (only present when the option -a was specified)
down_dist Distance of the closest genes 'to the right' from the center of the peak (only present when the option -a was specified)
down_strands Strands of the closest genes 'to the right' from the center of the peak (only present when the option -a was specified)
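
The pbexcess and fc_score formulas given above can be sketched in Python as follows. This is an illustration, not the SHORE implementation. The function and parameter names are hypothetical. The sketch assumes that, as in pbexcess, the normalization constant scales the control coverage, so that the ratio inside atan is sample / (control * normalization_constant). It also uses atan2 in place of atan of the ratio, which gives the same value for positive control coverage and avoids a division by zero when the control coverage is zero.

```python
import math


def per_base_excess(mean_cov_chip, mean_cov_ctrl, norm_const):
    """pbexcess: sample coverage minus normalized control coverage."""
    return mean_cov_chip - (mean_cov_ctrl * norm_const)


def fold_change_score(mean_cov_chip, mean_cov_ctrl, norm_const):
    """fc_score: 4 * atan(ratio) / PI - 1.0, ranging from -1 to 1.

    ASSUMPTION: ratio = sample / (control * norm_const), mirroring pbexcess.
    atan2(y, x) equals atan(y / x) for x > 0 and stays defined for x == 0.
    Coverages are expected to be non-negative.
    """
    scaled_ctrl = mean_cov_ctrl * norm_const
    return 4.0 * math.atan2(mean_cov_chip, scaled_ctrl) / math.pi - 1.0


# Equal normalized coverage gives a score close to 0;
# a sample-only signal approaches the upper bound of 1.
print(per_base_excess(10.0, 4.0, 0.5))    # 8.0
print(fold_change_score(5.0, 5.0, 1.0))
print(fold_change_score(3.0, 0.0, 1.0))
```

Under these assumptions the score is bounded, so a very large fold change cannot dominate a ranking the way a raw ratio would.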