Difference between revisions of "Shore peak"
From SHORE wiki
Revision as of 15:06, 23 September 2011
shore peak provides enriched region prediction for ChIP-Seq experiments. Significance of the predicted regions is assessed by comparison to the specified control samples.
Replicate experiments may be processed simultaneously by specifying multiple experiment and control paths. The significance of each peak region is then tested independently for each replicate, but the region prediction itself is performed jointly for all experiments, so that the results are immediately comparable.
The output generated by shore peak is described in SHORE peak result files.
Command line options
Usage: shore peak [OPTIONS]
Mandatory | ||
-o, --outfolder=<arg> | (Default: PeakAnalysis) | Output directory (will be created) |
-i, --chip-paths=<arg[:...][,...]> | ChIP experiment alignment files or shore directories (replicates) | |
-c, --ctrl-paths=<arg[:...][,...]> | Control experiment alignment files or shore directories | |
Segmentation | ||
-S, --window-size=<arg> | (Default: 2000) | Sliding window size for dynamic segmentation. Note that this value is an upper bound on the size of the peaks that can be detected. |
-P, --poisson-threshold=<arg> | (Default: 0.05) | Poisson probability threshold [<=] for dynamic segmentation |
-V, --probation=<arg> | (Default: 0) | Allow a mitigated threshold for at most <arg> base pairs inside a segment |
-Q, --mitigator=<arg> | (Default: 1) | Modifier for calculation of the mitigated threshold, value in [0,1] |
-J, --minsize=<arg> | (Default: 131) | Segment size threshold [>=] |
Normalization | ||
-b, --binsize=<arg> | (Default: 4000) | Size of read bins for normalization |
-q, --rankmaxquant-ubound=<arg> | (Default: 1) | Quantile upper bound for the rank maxima of the bins used |
Read filter | ||
-H, --hits-range=<arg,arg> | Set the allowed range of repetitiveness ('1,1' = nonrep reads) | |
-M, --mm-range=<arg,arg> | Set the allowed range of mismatches | |
-R, --region=<arg> | Only use reads that overlap with the range [chr1:pos1..[chr2:]pos2] | |
--assume-length=<arg> | (Default: 400) | Assume maximal alignment length <arg>, enables fast range queries |
-X, --p3fix=<arg> | (Default: 130) | Set the 3' end to a fixed distance from the 5' end (set to 0 to disable) |
-N, --read-lengths=<arg[,...]> | Use only reads of the given length(s) | |
-B, --duplicates=<arg> | Report at most <arg> reads with the 5' end at the same position on the same strand | |
--sam-ref=<arg> | Reference sequence for SAM file parsing | |
--peflags=<arg[,...]> | Use only reads with the given PE flag(s) | |
-F, --poissonifier-width=<arg> | (Default: 13) | Set the window size for the adaptive duplicate read filter (set to zero to disable) |
Peak filtering | ||
-n, --nsigma=<arg> | (Default: 6) | Allow the mean segment coverage of any control sample to be at most <arg> std. deviations higher than the median before discarding the segment |
--min-xshift=<arg> | (Default: 10) | Require a certain shift for the reverse strand peak in at least one experiment |
--min-foldchange=<arg> | (Default: 2) | Require a minimum normalized fold change of <arg> for experiment vs. control for at least one experiment |
Other | ||
--non-directional | Assume that any clone may be sequenced from both ends (calculates a more conservative FDR) | |
-d, --rankproduct=<arg> | (Default: 10000) | Number of simulations for rankproduct PFP estimation (set to zero to disable PFP estimation) |
--rplot=<arg> | (Default: 100) | Plot the first <arg> peaks using R |
-r, --index-file=<arg> | Extract sequence information for each segment from *.shore index file | |
-a, --annotation-file=<arg> | Annotation file, GFF3 file in sequence ontology compliant format | |
-O, --chr-ordering=<arg[,...]> | Specifies the order of chromosome entries in the annotation file | |
--so-filter=<arg[,...]> | (Default: gene,transposable_element_gene) | Only parse toplevel annotation features of the given SO types |
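As an illustration of how the options above combine, the following Python sketch assembles an argument list for one ChIP sample and one control. It is only a sketch: the paths `chip_run` and `ctrl_run` and the helper name `build_shore_peak_args` are hypothetical, and nothing is executed. Only the long option names and their defaults are taken from the table above. Multiple replicate paths are deliberately not joined here, because the separator semantics of `<arg[:...][,...]>` are not spelled out on this page.

```python
def build_shore_peak_args(chip_path, ctrl_path, outfolder="PeakAnalysis",
                          window_size=2000, poisson_threshold=0.05,
                          min_foldchange=2):
    """Return an argv-style list for a `shore peak` call (hypothetical helper).

    Defaults mirror the documented defaults; the call itself is not run.
    """
    return [
        "shore", "peak",
        "--outfolder=" + outfolder,           # output directory (will be created)
        "--chip-paths=" + chip_path,          # ChIP experiment alignments
        "--ctrl-paths=" + ctrl_path,          # control experiment alignments
        "--window-size=" + str(window_size),  # upper bound on detectable peak size
        "--poisson-threshold=" + str(poisson_threshold),
        "--min-foldchange=" + str(min_foldchange),
    ]


# Hypothetical input paths, used for illustration only.
args = build_shore_peak_args("chip_run", "ctrl_run")
print(" ".join(args))
```

The resulting list could be passed to `subprocess.run` on a system where shore is installed; keeping it as a list avoids shell quoting issues with paths that contain spaces.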
SHORE peak result files
The main result file produced by shore peak is named SUMMARY.txt:
id | An arbitrary numerical ID for the peak region |
chr | Sequence / chromosome ID |
pos | Left-most position of the peak region on the reference sequence |
size | Size of the peak region |
p_rank1 | Replicate 1 rank of the P-value of the peak |
fdr_bh_q1 | Replicate 1 Benjamini-Hochberg adjusted FDR of the peak |
rc_chip1 | Replicate 1 number of reads contributing to the peak in the sample |
rc_ctrl1 | Replicate 1 number of reads in the same region of the control |
pbexcess1 | Replicate 1 per-base-excess: mean_coverage_sample(peak) - (mean_coverage_control(peak) * normalization_constant) |
fc_score1 | Replicate 1 fold change score: 4 * atan(mean_coverage_sample(peak) / (mean_coverage_control(peak) * normalization_constant)) / PI - 1.0 |
height_excess1 | Replicate 1 peak height excess: |
frfc_score1 | Replicate 1 forward-reverse fold change score: Calculated like fc_score, but compares the sample forward strand and reverse strand coverage |
cog_xshift1 | Replicate 1 forward-reverse peak shift |
overlap_names | Identifiers of the genes overlapping with the center of the peak region (only present when the option -a was specified) |
overlap_types | Parts of the genes that overlap (exon, 5' UTR etc.) (only present when the option -a was specified) |
up_names | Identifiers of the closest genes 'to the left' from the center of the peak (only present when the option -a was specified) |
up_dist | Distance of the closest genes 'to the left' from the center of the peak (only present when the option -a was specified) |
up_strands | Strands of the closest genes 'to the left' from the center of the peak (only present when the option -a was specified) |
down_names | Identifiers of the closest genes 'to the right' from the center of the peak (only present when the option -a was specified) |
down_dist | Distance of the closest genes 'to the right' from the center of the peak (only present when the option -a was specified) |
down_strands | Strands of the closest genes 'to the right' from the center of the peak (only present when the option -a was specified) |
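The pbexcess and fc_score columns described above can be reproduced from mean coverages with a few lines of Python. This is a minimal sketch, not the SHORE implementation. It assumes that the control coverage is scaled by the normalization constant before the ratio is taken, in the same way as for pbexcess. The function names are hypothetical, and the use of `atan2` to handle a control coverage of zero is an assumption of this sketch, since the behaviour in that case is not described here.

```python
import math


def pbexcess(mean_cov_sample, mean_cov_control, normalization_constant):
    """Per-base excess: sample coverage minus the scaled control coverage."""
    return mean_cov_sample - mean_cov_control * normalization_constant


def fc_score(mean_cov_sample, mean_cov_control, normalization_constant):
    """Fold change score mapped onto the interval [-1, 1].

    Equals 4 * atan(sample / (control * normalization_constant)) / PI - 1
    whenever the scaled control coverage is positive.
    """
    scaled_control = mean_cov_control * normalization_constant
    # Assumption: atan2 is used so that a zero control coverage does not
    # raise a division error; for positive inputs it equals atan(y / x).
    angle = math.atan2(mean_cov_sample, scaled_control)
    return 4.0 * angle / math.pi - 1.0


# Illustrative values only: mean coverage 12 in the sample, 4 in the
# control, normalization constant 1.5 (scaled control coverage 6).
excess = pbexcess(12.0, 4.0, 1.5)
score = fc_score(12.0, 4.0, 1.5)
print(excess, round(score, 4))
```

With this mapping a score of 0 means that sample and scaled control coverage are equal, values approaching 1 indicate strong enrichment in the sample, and values approaching -1 indicate depletion. Because the score is bounded, very large fold changes caused by a near-empty control do not dominate a ranking.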