Difference between revisions of "Shore peak"
From SHORE wiki
Revision as of 16:39, 30 November 2011
shore peak provides enriched region prediction for ChIP-Seq experiments. Significance of the predicted regions is assessed by comparison to the specified control samples.
Replicate experiments may be processed simultaneously by specifying multiple experiment and control paths. While the significance of each peak region is then tested independently for each replicate, the region prediction itself is performed jointly for all experiments to obtain results that are immediately comparable.
The output generated by shore peak is described below.
Command line options
Usage: shore peak [OPTIONS]
Mandatory | ||
-o, --outdir=STRING | (Default: PeakAnalysis) | Output directory (will be created) |
-i, --chip-paths=STRING[:...][,...] | ChIP experiment alignment files or shore directories (replicates) | |
-c, --ctrl-paths=STRING[:...][,...] | Control experiment alignment files or shore directories | |
Segmentation | ||
-S, --window-size=INT | (Default: 2000) | Sliding window size for dynamic segmentation |
-P, --poisson-threshold=FLOAT | (Default: 0.05) | Poisson probability threshold [<=] for dynamic segmentation |
-V, --probation=INT | (Default: 0) | Allow a mitigated threshold for at most <arg> base pairs inside a segment |
-Q, --mitigator=FLOAT | (Default: 1) | Modifier for calculation of the mitigated threshold, value in [0,1] |
-J, --minsize=INT | (Default: 131) | Segment size threshold [>=] |
Normalization | ||
-b, --binsize=INT | (Default: 4000) | Size of read bins for normalization |
-q, --rankmaxquant-ubound=FLOAT | (Default: 1) | Quantile upper bound for the rank maxima of the bins used |
Read filter | ||
-H, --hits-range=INT,INT | Set the allowed range of repetitiveness ('1,1' = nonrep reads) | |
-M, --mm-range=INT,INT | Set the allowed range of mismatches | |
-R, --region=STRING | Only use reads that overlap with the range [chr1:pos1..[chr2:]pos2]; prior range indexing of the alignment files using shore 2dex is recommended | |
--assume-length=INT | (Default: 400) | Assume maximal alignment length <arg>, enables fast range queries |
-X, --p3fix=INT | (Default: 130) | Set the 3' end to a fixed distance from the 5' end (set to 0 to disable) |
-N, --read-lengths=INT[,...] | Use only reads of the given length(s) | |
-B, --duplicates=FLOAT | Report at most <arg> reads with the 5' end at the same position on the same strand | |
--sam-ref=STRING | Reference sequence for SAM file parsing | |
--peflags=INT[,...] | Use only reads with the given PE flag(s) | |
-F, --poissonifier-width=INT | (Default: 13) | Set the window size for the adaptive duplicate read filter (set to zero to disable) |
Peak filtering | ||
-n, --nsigma=FLOAT | (Default: 6) | Allow the mean segment coverage of any control sample to be at most <arg> std. deviations higher than the median before discarding the segment |
--min-xshift=FLOAT | (Default: 10) | Require a shift of at least <arg> for the reverse strand peak in at least one experiment |
--min-foldchange=FLOAT | (Default: 2) | Require a minimum normalized fold change of <arg> for experiment vs. control for at least one experiment |
Other | ||
--non-directional | Assume that any clone may be sequenced from both ends (calculates a more conservative FDR) | |
-d, --rankproduct=INT | (Default: 10000) | Number of simulations for rankproduct PFP estimation (set to zero to disable PFP estimation) |
--rplot=INT | (Default: 100) | Plot the first <arg> peaks using R; alignment files must be indexed using shore 2dex |
-r, --index-file=STRING | Extract sequence information for each segment from *.shore index file | |
-a, --annotation-file=STRING | Annotation file (sequenceontology.org GFF3 format; numerical sequence IDs required) | |
--so-filter=STRING[,...] | (Default: gene,transposable_element_gene) | Only parse toplevel annotation features of the given SO types |
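As an illustration of how the options above combine, the following Python sketch assembles an argument list for a single ChIP experiment with a single control. The helper name build_shore_peak_command and the paths chip_rep1 and ctrl_rep1 are hypothetical; only the long option names and default values are taken from the table above. Replicates are deliberately left out, because the separator semantics of STRING[:...][,...] are not spelled out here.

```python
import shlex


def build_shore_peak_command(chip_path, ctrl_path, outdir="PeakAnalysis",
                             window_size=2000, poisson_threshold=0.05,
                             min_foldchange=2.0):
    """Assemble an argument list for a single-experiment `shore peak` run.

    Hypothetical helper: only the option names and defaults come from the
    option table; the function itself is not part of SHORE.
    """
    return [
        "shore", "peak",
        f"--outdir={outdir}",                        # output directory (will be created)
        f"--chip-paths={chip_path}",                 # ChIP experiment alignments
        f"--ctrl-paths={ctrl_path}",                 # control experiment alignments
        f"--window-size={window_size}",              # segmentation window
        f"--poisson-threshold={poisson_threshold}",  # segmentation threshold
        f"--min-foldchange={min_foldchange}",        # peak filter
    ]


# Placeholder paths, for illustration only.
cmd = build_shore_peak_command("chip_rep1", "ctrl_rep1")
print(shlex.join(cmd))
```

The list can be passed to subprocess.run(cmd, check=True) on a system where shore is installed; printing it first makes it easy to check the command before running it.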
SHORE peak result files
The main result file produced by shore peak is named SUMMARY.txt:
id | An arbitrary numerical ID for the peak region |
chr | Sequence / chromosome ID |
pos | Left-most position of the peak region on the reference sequence |
size | Size of the peak region |
orp | Observed rank product (only present for multiple replicates) |
orp_rank | Rank of the observed rank product (only present for multiple replicates) |
p_rank1 | Replicate 1 rank of the P-value of the peak |
fdr_bh_q1 | Replicate 1 Benjamini-Hochberg adjusted FDR of the peak |
rc_chip1 | Replicate 1 number of reads contributing to the peak in the sample |
rc_ctrl1 | Replicate 1 number of reads in the same region of the control |
pbexcess1 | Replicate 1 per-base-excess: mean_coverage_sample(peak) - (mean_coverage_control(peak) * normalization_constant) |
fc_score1 | Replicate 1 fold change score: 4 * atan(mean_coverage_sample(peak) / (mean_coverage_control(peak) * normalization_constant)) / PI - 1.0 |
height_excess1 | Replicate 1 peak height excess: |
frfc_score1 | Replicate 1 forward-reverse fold change score: Calculated like fc_score, but compares the sample forward strand and reverse strand coverage |
cog_xshift1 | Replicate 1 forward-reverse peak shift |
overlap_names | Identifiers of the genes overlapping with the center of the peak region (only present when the option -a was specified) |
overlap_types | Parts of the genes that overlap (exon, 5' UTR etc.) (only present when the option -a was specified) |
up_names | Identifiers of the closest genes 'to the left' from the center of the peak (only present when the option -a was specified) |
up_dist | Distance of the closest genes 'to the left' from the center of the peak (only present when the option -a was specified) |
up_strands | Strands of the closest genes 'to the left' from the center of the peak (only present when the option -a was specified) |
down_names | Identifiers of the closest genes 'to the right' from the center of the peak (only present when the option -a was specified) |
down_dist | Distance of the closest genes 'to the right' from the center of the peak (only present when the option -a was specified) |
down_strands | Strands of the closest genes 'to the right' from the center of the peak (only present when the option -a was specified) |
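The pbexcess and fc_score columns can be reproduced from mean coverages with a few lines of Python. This is a minimal sketch, not the SHORE implementation. It assumes that the normalization constant scales the control coverage in both formulas, as it does in pbexcess, and it uses atan2 so that a control coverage of zero yields the limiting score instead of a division error. The function and parameter names are illustrative.

```python
import math


def pbexcess(mean_cov_sample, mean_cov_control, normalization_constant):
    """Per-base excess: sample coverage minus normalized control coverage."""
    return mean_cov_sample - mean_cov_control * normalization_constant


def fc_score(mean_cov_sample, mean_cov_control, normalization_constant):
    """Fold change score, 4 * atan(fold_change) / PI - 1.0, in [-1, 1].

    Assumption: the fold change is sample / (control * normalization_constant).
    For non-negative coverages, atan2(y, x) equals atan(y / x) and also
    covers x == 0 (the choice of atan2 is ours, not taken from SHORE).
    """
    normalized_control = mean_cov_control * normalization_constant
    return 4.0 * math.atan2(mean_cov_sample, normalized_control) / math.pi - 1.0


# Equal normalized coverage gives a score near 0; stronger enrichment
# approaches 1, stronger depletion approaches -1.
print(pbexcess(10.0, 4.0, 0.5))
print(fc_score(10.0, 10.0, 1.0))
print(fc_score(40.0, 10.0, 1.0))
```

The arctangent maps the unbounded fold change onto a bounded, symmetric scale, so a peak with twice the control coverage and a peak with half the control coverage receive scores of equal magnitude and opposite sign.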