shore srna

shore srna facilitates the analysis of small RNA sequencing data. It scans the genome for regions where significant amounts of small RNAs are expressed and annotates these loci with their read counts and the predominant sRNA size.
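As a rough illustration of the locus annotation step, the sketch below annotates one locus with its read count and the most frequent read length. It assumes that reads are given as (start, length) pairs in 1-based inclusive coordinates and that ties are broken in favour of the shorter length; the function name annotate_locus and the tie rule are hypothetical and not taken from SHORE.

```python
from collections import Counter


def annotate_locus(reads, locus_start, locus_end):
    """Return (read_count, predominant_length) for reads overlapping a locus.

    reads: iterable of (start, length) pairs, 1-based inclusive coordinates.
    """
    lengths = Counter()
    for start, length in reads:
        end = start + length - 1
        # Keep the read if it overlaps the locus by at least one base.
        if start <= locus_end and end >= locus_start:
            lengths[length] += 1
    if not lengths:
        return 0, None
    # Most frequent read length; ties go to the shorter length (assumption).
    predominant = min(lengths, key=lambda n: (-lengths[n], n))
    return sum(lengths.values()), predominant


reads = [(100, 21), (105, 24), (110, 24), (500, 21)]
print(annotate_locus(reads, 100, 140))  # (3, 24)
```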

Usage: shore srna [OPTIONS] [SAMPLE_PATHS]

Mandatory
-s, --samples=<arg[:...][,...]>		Shore directories (comma-separated; colon-separated items will be treated as a single assay)
-o, --outfolder=<arg>	(Default: SrnaAnalysis)	Output directory
Segmentation
-j, --joint-seg		Apply segmentation threshold to the joint coverage instead of per-sample coverage
-C, --static-threshold=<arg>	(Default: 10)	Coverage threshold [>]
-J, --minsize=<arg>	(Default: 15)	Segment size threshold [>=]
-V, --probation=<arg>	(Default: 0)	Allow a mitigated threshold for at most <arg> base pairs inside a segment
-Q, --mitigator=<arg>	(Default: 1)	Modifier for calculation of the mitigated threshold, value in [0,1]
-v, --overlap=<arg>	(Default: 1)	Required overlap for merging segments (may be negative to allow gaps)
Alignment filter
-H, --hits-range=<arg,arg>		Set the allowed range of repetitiveness ('1,1' = nonrep reads)
-M, --mm-range=<arg,arg>		Set the allowed range of mismatches
-R, --region=<arg>		Only use reads that overlap with the range [chr1:pos1..[chr2:]pos2]
--assume-length=<arg>	(Default: 400)	Assume maximal alignment length <arg>, enables fast range queries
-B, --duplicates=<arg>		Report at maximum <arg> reads with the 5' end at the same position on the same strand
--sam-ref=<arg>		Reference sequence for SAM file parsing
--peflags=<arg[,...]>		Use only reads with the given PE flag(s)

shore coverage

Analyzing the expression levels of mRNAs and small RNAs, or detecting unknown transcripts, typically requires generating a coverage graph and defining expressed segments based on consecutive coverage.

shore coverage generates a coverage graph by sequentially scanning the alignments and counting, for each position, the reads that cover it.
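A minimal sketch of building such a coverage graph for one chromosome with a difference array is shown below. It assumes alignments are given as (start, end, hits) tuples in 1-based inclusive coordinates, that the 'divide' weighting (-W) gives each alignment a weight of 1/hits and that 'const' gives every alignment a weight of 1; 'multiply' is not modeled because its exact meaning is not described here. The function name coverage_graph is hypothetical.

```python
def coverage_graph(alignments, chrom_length, weight="divide"):
    """Per-base coverage for one chromosome (hypothetical helper, not SHORE code).

    alignments: (start, end, hits) tuples with 1-based inclusive coordinates,
    where hits is the number of places the read aligns to (its repetitiveness).
    """
    # Difference array: +w where a read starts, -w one base after it ends.
    diff = [0.0] * (chrom_length + 2)
    for start, end, hits in alignments:
        if weight == "divide":
            w = 1.0 / hits  # assumed: a repetitive read is split across its hits
        elif weight == "const":
            w = 1.0  # assumed: every alignment counts fully
        else:
            raise ValueError(f"weighting scheme not modeled: {weight}")
        diff[start] += w
        diff[end + 1] -= w
    # A running sum over the difference array yields the coverage at each base.
    coverage, running = [], 0.0
    for pos in range(1, chrom_length + 1):
        running += diff[pos]
        coverage.append(running)
    return coverage
```

The difference array keeps the scan linear in the number of reads plus the chromosome length, rather than touching every covered base once per read.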

Usage: shore coverage [OPTIONS] [MAPFILES]

Input
-m, --mapfiles=<arg[:...][,...]>		Alignment files or shore directories (flowcell, lane, pe or barcode; comma-separated; colon-separated items will be treated as a single assay)
-n, --merge-input		Merge all input files
Output
-o, --output-directory=<arg>	(Default: CoverageAnalysis)	Output directory (will be created)
-s, --segmentation		Write segmentation files
-t, --merge-segments=<arg>		Overlap in base pairs for merging segment files (may be negative to allow gaps); if unspecified, segments will not be merged
-q, --no-coverage		Do not write coverage files
-z, --compress		Compress output files
--rplot		Plot the specified range using R
--ylim=<arg>		Set y axis limit for plots (default: auto)
--phasing=<arg>		Visualize <arg>-mer phasing
Alignment filter
-H, --hits-range=<arg,arg>		Set the allowed range of repetitiveness ('1,1' = nonrep reads)
-M, --mm-range=<arg,arg>		Set the allowed range of mismatches
-R, --region=<arg>		Only use reads that overlap with the range [chr1:pos1..[chr2:]pos2]
--assume-length=<arg>	(Default: 400)	Assume maximal alignment length <arg>, enables fast range queries
-X, --p3fix=<arg>		Set the 3' end to a fixed distance from the 5' end
-N, --read-lengths=<arg[,...]>		Use only reads of the given length(s)
-T, --strand=<arg>		Use only reads from the given strand
-B, --duplicates=<arg>		Report at maximum <arg> reads with the 5' end at the same position on the same strand
--wpoiss=<arg>		Window size for adaptive duplicate read filtering
--sam-ref=<arg>		Reference sequence for SAM file parsing
--peflags=<arg[,...]>		Use only reads with the given PE flag(s)
Coverage
-W, --weight-repetitive=<arg>	(Default: divide)	How to weight repetitive hits (divide or multiply or const)
Segmentation
-C, --static-threshold=<arg>	(Default: 10)	Coverage threshold [>] for static segmentation
-J, --minsize=<arg>	(Default: 20)	Segment size threshold [>=] for static or dynamic segmentation
-V, --probation=<arg>	(Default: 0)	Allow a mitigated threshold for at most <arg> base pairs inside a segment
-Q, --mitigator=<arg>	(Default: 1)	Modifier for calculation of the mitigated threshold, value in [0,1]
-D, --dynamic		Switches to dynamic segmentation
-S, --window-size=<arg>	(Default: 2000)	Sliding window size for dynamic segmentation
-P, --poisson-threshold=<arg>	(Default: 0.05)	Poisson probability threshold [<=] for dynamic segmentation
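Static segmentation as described by -C (coverage must be greater than the threshold) and -J (segment size must be at least the minimum), together with merging of neighbouring segments by a required overlap, can be sketched as follows. Assumptions: the input is a per-base coverage list, segments are 1-based inclusive, and the overlap of two consecutive segments is computed as end1 - start2 + 1, so a negative value corresponds to a gap. Probation (-V), the mitigator (-Q) and dynamic segmentation (-D) are not modeled; the function names are hypothetical.

```python
def static_segments(coverage, threshold=10, minsize=20):
    """Return (start, end) runs where coverage > threshold and length >= minsize."""
    segments = []
    start = None
    for pos, value in enumerate(coverage, start=1):
        if value > threshold:
            if start is None:
                start = pos  # a new run begins
        elif start is not None:
            if pos - start >= minsize:  # run covers start..pos-1
                segments.append((start, pos - 1))
            start = None
    # Close a run that extends to the last base.
    if start is not None and len(coverage) - start + 1 >= minsize:
        segments.append((start, len(coverage)))
    return segments


def merge_segments(segments, min_overlap=1):
    """Merge segments whose overlap is >= min_overlap (negative allows gaps)."""
    merged = []
    for start, end in sorted(segments):
        if merged and merged[-1][1] - start + 1 >= min_overlap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

For example, merge_segments([(1, 10), (12, 20)], min_overlap=-1) joins the two segments across the one-base gap, while the default min_overlap=1 keeps them separate.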

shore mg

Primitive metagenomic analysis

Usage: shore mg [OPTIONS] [MAP_PATHS]

Allowed options
-f, --mappaths=<arg[,...]>		Input directories or files
-o, --outfolder=<arg>	(Default: Mg)	Output directory, will be created
--collapse=<arg[:...][,...]>		Collapse any sequence ID not listed here to the next smaller one in the list
--power		Initialize the ID combinations for collapse with the power set of all IDs
--autocollapse=<arg>		Specify *.trans or ref.txt file to automatically collapse to the species level; preprocess must have been run with the --fullheader option; 2nd & 3rd word of fasta headers are taken to be the species name
--make-unique		Make alignments unique before processing
Read filter
-H, --hits-range=<arg,arg>		Set the allowed range of repetitiveness ('1,1' = nonrep reads)
-M, --mm-range=<arg,arg>		Set the allowed range of mismatches
-N, --read-lengths=<arg[,...]>		Use only reads of the given length(s)
-T, --strand=<arg>		Use only reads from the given strand
--sam-ref=<arg>		Reference sequence for SAM file parsing
--peflags=<arg[,...]>		Use only reads with the given PE flag(s)
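The --collapse rule ("collapse any sequence ID not listed here to the next smaller one in the list") can be illustrated with a binary search over the sorted list of numeric IDs. This sketch handles a single comma-separated list only (the colon-grouped syntax is not modeled), and returning None when no listed ID is smaller is an assumption; collapse_id is a hypothetical name.

```python
from bisect import bisect_right


def collapse_id(seq_id, listed_ids):
    """Map seq_id to itself if listed, otherwise to the next smaller listed ID."""
    ids = sorted(listed_ids)
    # Index of the first listed ID strictly greater than seq_id.
    i = bisect_right(ids, seq_id)
    if i == 0:
        return None  # no listed ID is <= seq_id (hypothetical handling)
    return ids[i - 1]
```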

shore count

shore count calculates the read count as well as other properties for regions in the genome that have already been defined by some other means. It may be used to analyze either fixed-size jumping windows over the genome or regions defined in an input file, e.g. to analyze annotated coding regions or to manually re-analyze regions defined by the segmentation algorithms of shore coverage, shore peak or shore srna.

Accepted input files are tab-delimited plain text files with a header specifying the columns chr, pos, size and optionally strand.
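A minimal sketch of counting reads for regions read from such a file is shown below. It assumes that the required overlap (-O, default 50%) is measured relative to the read length and that reads are given as (chr, start, length) tuples in 1-based inclusive coordinates; the function name count_reads is hypothetical, and the quadratic loop is for illustration only.

```python
import csv
import io


def count_reads(feature_text, reads, min_overlap_fraction=0.5):
    """Count reads per feature.

    feature_text: tab-delimited text with header columns chr, pos, size
    (strand optional and ignored here).
    reads: (chr, start, length) tuples, 1-based inclusive.
    """
    results = []
    for row in csv.DictReader(io.StringIO(feature_text), delimiter="\t"):
        chrom, f_start, size = row["chr"], int(row["pos"]), int(row["size"])
        f_end = f_start + size - 1
        n = 0
        for r_chr, r_start, r_len in reads:
            if r_chr != chrom:
                continue
            r_end = r_start + r_len - 1
            # Overlapping bases; zero or negative if the read misses the feature.
            overlap = min(f_end, r_end) - max(f_start, r_start) + 1
            if overlap >= min_overlap_fraction * r_len:
                n += 1
        results.append((chrom, f_start, size, n))
    return results
```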

Usage: shore count [OPTIONS] [MAPFILES]

Mandatory
-m, --mapfiles=<arg[:...][,...]>		Alignment files or shore directories (flowcell, lane, pe or barcode; comma-separated; colon-separated items will be treated as a single assay)
-o, --output-folder=<arg>	(Default: SegmentAnalysis)	Output directory, will be created
Variable size
-f, --segment-file=<arg>		Set file with segment information (expects a sorted file with columns chr, pos, size, strand)
Fixed size
-s, --segment-size=<arg>		Use segments of fixed size <arg> instead of a file
-j, --segment-distance=<arg>		Distance of fixed size segments (defaults to segment size)
-t, --strand-specific		Count both strands separately
Output
-k, --rpkm		Also calculate reads per kilobase per million mapped reads (RPKM) values (totals are calculated without applying the alignment filter)
--totals-file=<arg>		Read totals for RPKM calculation from a file
-a, --fasta-file=<arg>		If a fasta file is provided, the sequence will be reported for each segment
Counting
-O, --overlap=<arg>	(Default: 50%)	Required amount of overlap between read and feature (percentage or absolute)
-W, --weight-repetitive=<arg>	(Default: divide)	How to weight repetitive hits (divide or multiply or const)
Alignment filter
-H, --hits-range=<arg,arg>		Set the allowed range of repetitiveness ('1,1' = nonrep reads)
-M, --mm-range=<arg,arg>		Set the allowed range of mismatches
-R, --region=<arg>		Only use reads that overlap with the range [chr1:pos1..[chr2:]pos2]
--assume-length=<arg>	(Default: 400)	Assume maximal alignment length <arg>, enables fast range queries
-X, --p3fix=<arg>		Set the 3' end to a fixed distance from the 5' end
-N, --read-lengths=<arg[,...]>		Use only reads of the given length(s)
-T, --strand=<arg>		Use only reads from the given strand
-B, --duplicates=<arg>		Report at maximum <arg> reads with the 5' end at the same position on the same strand
--wpoiss=<arg>		Window size for adaptive duplicate read filtering
--sam-ref=<arg>		Reference sequence for SAM file parsing
--peflags=<arg[,...]>		Use only reads with the given PE flag(s)
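The RPKM value reported with -k follows the usual definition: the read count divided by the feature length in kilobases and by the total read count in millions. Following the option description, the total would be counted before the alignment filter is applied, or taken from --totals-file. The function below is a hypothetical illustration of the formula, not SHORE's code.

```python
def rpkm(read_count, feature_length_bp, total_reads):
    """Reads per kilobase of feature per million total reads."""
    # count / (length / 1e3) / (total / 1e6) == count * 1e9 / (length * total)
    return read_count * 1e9 / (feature_length_bp * total_reads)
```

For example, 500 reads on a 2,000 bp feature with 10 million total reads give 500 / 2 / 10 = 25 RPKM.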

shore tagstats

Gather read statistics for multiple samples. This is mainly intended for small RNA sequencing when no reference is available.

Usage: shore tagstats [OPTIONS] [PATHS]

Allowed options
-i, --readpaths=<arg[:...][,...]>		SHORE directories or read file paths
-o, --outdir=<arg>	(Default: ReadAnalysis)	Output directory
-r, --report=<arg>	(Default: 1)	Only report a sequence if it's represented at least <arg> times
-p, --pseudo=<arg>	(Default: 0)	Add a pseudocount of <arg> to each read count
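The table produced for multiple samples can be sketched as a per-sample count of identical read sequences. Assumptions for this sketch: the --report threshold is applied to the raw count summed over all samples, and the pseudocount (--pseudo) is added to every count after filtering. The function name tag_table is hypothetical.

```python
from collections import Counter


def tag_table(samples, report=1, pseudo=0):
    """Build a sequence -> per-sample count table.

    samples: dict mapping sample name -> list of read sequences.
    """
    counts = {name: Counter(reads) for name, reads in samples.items()}
    all_seqs = set().union(*counts.values())
    table = {}
    for seq in sorted(all_seqs):
        raw = [counts[name][seq] for name in samples]  # Counter returns 0 if absent
        if sum(raw) >= report:
            table[seq] = [c + pseudo for c in raw]
    return table
```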

shore binom_test

shore binom_test can be used to evaluate two sets of count data against each other using a binomial test.

Usage: shore binom_test [OPTIONS]

Allowed options
-i, --input-file=<arg>	(Default: stdin)	Read count input file
-o, --output-file=<arg>	(Default: stdout)	Output file
-p, --distribution-p=<arg>	(Default: 0.5)	Parameter p of the binomial distribution
-n, --normalization-file=<arg>		File with scaling factors for each column (overrides '-p')
-a, --alternative=<arg>	(Default: less)	Specifies the alternative hypothesis for the test (less or greater or twosided)
-1, --first-column=<arg>		Name of the first read count column
-2, --2nd-column=<arg>		Name of the second read count column (tested vs. column 1)
--global-scaling=<arg>	(Default: 1)	Scaling constant with which all read counts are multiplied
-j, --input-header=<arg[,...]>		Specify header for input file, if not available
-f, --fold-change		Report fold enrichment values
--fdr-bh		Calculate Benjamini-Hochberg FDR
--sort		Sort output
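An exact binomial test of this kind can be sketched with the binomial probability mass function. This sketch assumes that, under the null hypothesis, the count in the second column follows Binomial(n, p) with n the sum of both counts, and it computes the two-sided p-value by summing all outcomes that are at most as likely as the observed one (the rule used, for example, by R's binom.test). The function names are hypothetical; normalization files and global scaling are not modeled.

```python
from math import comb


def _pmf(i, n, p):
    """Binomial probability of exactly i successes in n trials."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)


def binom_test(count1, count2, p=0.5, alternative="less"):
    """Exact binomial test of count2 against count1 (hypothetical helper)."""
    n = count1 + count2
    if alternative == "less":
        return sum(_pmf(i, n, p) for i in range(count2 + 1))
    if alternative == "greater":
        return sum(_pmf(i, n, p) for i in range(count2, n + 1))
    if alternative == "twosided":
        # Sum probabilities of all outcomes at most as likely as the observed one,
        # with a small relative tolerance for floating-point ties.
        observed = _pmf(count2, n, p)
        total = sum(
            _pmf(i, n, p) for i in range(n + 1) if _pmf(i, n, p) <= observed * (1 + 1e-7)
        )
        return min(1.0, total)
    raise ValueError(f"unknown alternative: {alternative}")
```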

shore mtc

The subprogram shore mtc implements various multiple testing correction methods. The expected input is a tab-delimited text file with a header, and the column containing the p-values to be adjusted must be named raw_p.

Implemented methods include:

Benjamini-Hochberg false discovery rate control (fdr_bh)
Bonferroni familywise error rate control (fwer_bonferroni)
Holm familywise error rate control (fwer_holm)
Hochberg familywise error rate control (fwer_hochberg)
Sidak singlestep familywise error rate control (fwer_sidak_ss)
Sidak stepdown familywise error rate control (fwer_sidak_sd)
Benjamini-Yekutieli false discovery rate control (fdr_by).
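Three of these corrections can be sketched with their standard textbook definitions, applied to the values of a raw_p column given as a Python list. These functions are illustrations of the methods, not SHORE's implementation; their names are hypothetical, and reading or writing the tab-delimited file is not modeled.

```python
def bonferroni(pvals):
    """Bonferroni: multiply each p-value by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals):
    """Holm step-down: (m - rank) * p in ascending order, made monotone."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):  # rank is 0-based
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg: p * m / rank, made monotone from the largest p down."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # 1-based rank in ascending order
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```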

Usage: shore mtc [OPTIONS]

Mandatory
-m, --method=<arg[,...]>		Select correction method(s), out of: fdr_bh, fwer_bonferroni, fwer_holm, fwer_hochberg, fwer_sidak_ss, fwer_sidak_sd, fdr_by
-i, --input-file=<arg>	(Default: stdin)	The file the raw p-values are read from (expects a column 'raw_p')
-o, --output-file=<arg>	(Default: stdout)	Output file
Output
-u, --fdr-max=<arg>	(Default: 1)	Maximum q-value to report
-e, --echo-comments		Echo all comments read from input files to stdout
-q, --quiet		Do not print input, only report the q-values
Other
-j, --input-header=<arg[,...]>		Use arg as input file header

shore annotate_region

shore annotate_region can be used to annotate previously defined genomic regions with the overlapping or nearest genes present in an annotation file. Only the central base of each region will be annotated. The annotation file must be in standard GFF format.
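Annotating the central base of a region with the overlapping or nearest gene can be sketched as below. Assumptions: regions are given by pos and size in 1-based coordinates, the central base of an even-sized region is the lower of the two middle bases, genes are pre-parsed (chr, start, end, gene_id) tuples rather than a GFF file, and the distance to a non-overlapping gene is measured to its closest boundary. The function names are hypothetical.

```python
def central_base(pos, size):
    """Central base of a region starting at pos with the given size."""
    return pos + (size - 1) // 2


def annotate(chrom, pos, size, genes):
    """Return (gene_id, distance) of the gene overlapping or nearest to the center.

    genes: (chr, start, end, gene_id) tuples, 1-based inclusive.
    """
    center = central_base(pos, size)
    best = None
    for g_chr, g_start, g_end, gene_id in genes:
        if g_chr != chrom:
            continue
        if g_start <= center <= g_end:
            distance = 0  # the center lies inside the gene
        else:
            distance = min(abs(center - g_start), abs(center - g_end))
        if best is None or distance < best[1]:
            best = (gene_id, distance)
    return best
```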

Usage: shore annotate_region [OPTIONS]

Mandatory
-a, --annotation-file=<arg>		Annotation file
-f, --feature-file=<arg>		File with the features to be annotated. This file must contain a header specifying the columns 'chr', 'pos' and optionally 'size' or 'end'
-o, --outfile=<arg>	(Default: stdout)	Output file
Optional
--header=<arg[,...]>		Header for the feature file
--range		Use the real regions and not just the central base
--gff		Write output in GFF format
--so-filter=<arg[,...]>	(Default: gene,transposable_element_gene)	Only parse toplevel features of the given Sequence Ontology (SO) types
--print		Just print the annotation tree
--query-pos=<arg>		Query annotation for the given position

shore convert

Convert SHORE files into common file formats, and vice versa.

Available converters:

Alignment2ALN
Alignment2BED
Alignment2GFF
Alignment2Maplist
Alignment2SAM
ColorFlat2Fastq
Contig2AFG
Eland2Maplist
ExpandTabs
FlatPair2Fastq
Maplist2Eland
Reads2Fasta
Reads2Fastq
Reads2Flat
Reads2Qual
Solid2Fastq
Solid2Flat
Variant2GFF
Variant2VCF

Alignment2... converters can convert:

SHORE map.list files (default)
SAM files (*.sam)
BAM files (*.bam)

Reads2... converters can convert:

SHORE reads_0.fl files (default)
FastQ files (*.fq, *.fastq)
454 Standard Flowgram Format SFF (*.sff)
Illumina QSEQ files (*.qseq, *_qseq.txt)
SHORE map.list files (*.list) (discards alignment information and only keeps the read information; input files must be sorted by read ID)

By default, the SHORE file formats (map.list and reads_0.fl, respectively) are expected as input.
All other file types must have the correct file extensions to be recognized (an additional .gz is allowed for compressed files).

Additionally, the special file names stdin and stdout may be used for reading from standard input and for writing to standard output, respectively.

For stdin, map.list format is expected for Alignment2... conversions and reads_0.fl format for Reads2... conversions. To convert different formats from standard input, use e.g. stdin.sam, stdin.fastq.gz, etc.
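The file name rules above can be sketched as a small format detector. It mirrors the listed extensions and defaults (including the optional .gz suffix and the stdin.sam style names), but the function name detect_input_format and the returned labels are hypothetical and not part of SHORE.

```python
def detect_input_format(filename, converter_family):
    """Guess the input format from a file name.

    converter_family: 'alignment' for Alignment2..., 'reads' for Reads2...
    """
    # An additional .gz suffix is allowed for compressed files.
    name = filename[:-3] if filename.endswith(".gz") else filename
    if converter_family == "alignment":
        if name.endswith(".sam"):
            return "sam"
        if name.endswith(".bam"):
            return "bam"
        return "map.list"  # default, also for a plain "stdin"
    if converter_family == "reads":
        if name.endswith((".fq", ".fastq")):
            return "fastq"
        if name.endswith(".sff"):
            return "sff"
        if name.endswith((".qseq", "_qseq.txt")):
            return "qseq"
        if name.endswith(".list"):
            return "map.list"
        return "reads_0.fl"  # default, also for a plain "stdin"
    raise ValueError(f"unknown converter family: {converter_family}")
```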

shore sort

Sort / merge tab-delimited text files

Usage: shore sort [OPTIONS] [TEXT_FILES]

Allowed options
-i, --infiles=<arg[,...]>		A comma-separated list of plain-text input files
-o, --outfile=<arg>	(Default: stdout)	Output file
-p, --preset=<arg>		Automatically select sort keys for the given file type. Supported values: maplist (map.list format sorted by genomic coordinate), maplist_id (map.list format sorted by read ID), reads0 (reads flat file format sorted by read ID), gff (GFF format sorted by position)
-k, --keystring=<arg>		Concatenation of column ids (counted from 1) and key types. Valid key types: t (text), i (integer) and f (float); capital letters reverse the sort order - e.g. '-k 1i5t3i7I'.
-I, --inplace		Output file is the same as the input file
-t, --tmpdir=<arg>		Temporary file directory (defaults to $TMPDIR or /tmp)
-B, --blocksize=<arg>	(Default: 2048)	Block size in megabytes
-m, --nur-merge		Merge already sorted files
-u, --unique		Output only the first of an equal run
-c, --check		Only test if the files are sorted
-b, --upper-bound=<arg[,...]>		Returns byte offset (counted from 0) and text of the first line in a sorted file that compares greater than the keys given in <arg> (provide comma-separated values in order of key priority)
-T, --tail=<arg[,...]>		Print all lines in a sorted file that compare greater than the keys given in <arg> (provide comma-separated values in order of key priority)
-C, --no-comments		Do not treat line comments and empty lines specially
-v, --verbose		Be more verbose
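The -k key string format (column numbers counted from 1, followed by t, i or f, with capital letters reversing the order) can be illustrated with an in-memory sort. This sketch relies on Python's stable sort and applies the keys from least to most significant; the block-based external sorting controlled by -B and -t, and special handling of comment lines, are not modeled. The function names are hypothetical.

```python
import re

CASTS = {"t": str, "i": int, "f": float}


def parse_keystring(keystring):
    """Split e.g. '1i5t3i7I' into (0-based column, type, reverse) tuples."""
    if not re.fullmatch(r"(?:\d+[tifTIF])+", keystring):
        raise ValueError(f"invalid key string: {keystring!r}")
    return [
        (int(col) - 1, typ.lower(), typ.isupper())
        for col, typ in re.findall(r"(\d+)([tifTIF])", keystring)
    ]


def sort_lines(lines, keystring):
    """Sort tab-delimited lines in memory according to the key string."""
    rows = [line.rstrip("\n").split("\t") for line in lines]
    # Stable sorts from the least to the most significant key give a
    # multi-key ordering; reverse=True keeps equal rows in their prior order.
    for col, typ, reverse in reversed(parse_keystring(keystring)):
        rows.sort(key=lambda row: CASTS[typ](row[col]), reverse=reverse)
    return ["\t".join(row) for row in rows]
```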

shore compress

Compress files to indexed gzip format

Usage: shore compress [OPTIONS] FILES

Allowed options
--outfile=<arg>		Write to the file <arg> instead of <infile>.gz
--replace		Remove original files after compression. If the input file is already compressed it will be recompressed and replaced
--tail=<arg>		Instead of compressing files, dump the last <arg> bytes of a seekable file
--dumpgzx		Print out the index for each file

shore 2dex

Range indexing and querying of tab-delimited text files

Usage: shore 2dex [OPTIONS] [TEXT_FILES]

Mandatory
-i, --infiles=<arg[,...]>		A comma-separated list of tab-delimited plain-text input files (can also be any SHORE directory when -f MAPLIST is set)
Format Options
-f, --format=<arg>		Provide file type for automatic settings, valid file types: MAPLIST, GFF, SAM
-c, --chr-column=<arg>		Column with the chromosome or sequence name; provide the column name or @<column_number>
-p, --pos-column=<arg>		Column with the start position; provide the column name or @<column_number>
-s, --size-column=<arg>		Column with the feature size; provide the column name or @<column_number>
-e, --end-column=<arg>		Column with the end position (inclusive); provide the column name or @<column_number>
-x, --xend-column=<arg>		Column with the end position (exclusive); provide the column name or @<column_number>
-C, --commentchar=<arg>		Comment line symbol
Index Options
-B, --blocksize=<arg>	(Default: 131072)	Block size determining the index resolution in bytes
-G, --maxgap=<arg>	(Default: 131072)	Maximum sequence gap in a block
Query Options
-q, --query=<arg>		A range to query; prints all overlapping records. Valid ranges: 'SEQ:POS~SIZE', 'SEQ:POS..END', 'SEQ1:POS..SEQ2:END', 'SEQ:POS...XEND', 'SEQ1:POS...SEQ2:XEND' (END: inclusive, XEND: exclusive)
Other
-v, --verbose		Be more verbose
-Q, --quiet		Be less verbose
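The range syntax accepted by -q can be normalized to a start and an inclusive end, as sketched below. The sketch assumes integer positions, sequence names without '~' or '..', and that an exclusive end XEND corresponds to the inclusive end XEND - 1; the function name parse_range is hypothetical.

```python
def parse_range(query):
    """Return (seq1, start, seq2, end) with an inclusive end."""
    if "~" in query:  # SEQ:POS~SIZE
        left, size = query.split("~", 1)
        seq, pos = left.rsplit(":", 1)
        return seq, int(pos), seq, int(pos) + int(size) - 1
    # Check '...' before '..', since '...' also contains '..'.
    if "..." in query:
        left, right = query.split("...", 1)
        exclusive = True
    elif ".." in query:
        left, right = query.split("..", 1)
        exclusive = False
    else:
        raise ValueError(f"unrecognized range: {query!r}")
    seq1, pos = left.rsplit(":", 1)
    if ":" in right:  # SEQ1:POS..SEQ2:END or SEQ1:POS...SEQ2:XEND
        seq2, end = right.rsplit(":", 1)
    else:
        seq2, end = seq1, right
    end = int(end) - 1 if exclusive else int(end)
    return seq1, int(pos), seq2, end
```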

shore idtrans

SHORE uses numerical identifiers for all sequences of the reference. shore idtrans simplifies translating these numbers in some of the result files back into chromosome names as specified in the reference fasta file (and vice versa).

It requires either the *.trans file that shore preprocess stores in the IndexFolder, or the ref.txt file generated by shore mapflowcell.
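Column translation of this kind can be sketched as a dictionary lookup on selected columns of a tab-delimited file with a header. Because the layout of *.trans and ref.txt files is not described here, the ID-to-name mapping is passed in as a plain dict; @<column_number> is assumed to count from 1, as in shore sort, and unknown values are left unchanged. The function name translate_columns is hypothetical.

```python
import csv
import io


def translate_columns(text, id_to_name, columns=("chr",), name2id=False):
    """Translate IDs to names (or names to IDs) in selected columns."""
    mapping = {name: i for i, name in id_to_name.items()} if name2id else id_to_name
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    # Columns are given by name or as @<column_number> (assumed to count from 1).
    indices = [int(c[1:]) - 1 if c.startswith("@") else header.index(c) for c in columns]
    out = ["\t".join(header)]
    for row in reader:
        for i in indices:
            row[i] = mapping.get(row[i], row[i])  # leave unknown values unchanged
        out.append("\t".join(row))
    return "\n".join(out)
```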

Usage: shore idtrans [OPTIONS] FILES

Allowed options
-t, --transfile=<arg>		.trans file from IndexFolder*
-r, --reffile=<arg>		ref.txt file generated by mapflowcell
-o, --outfile=<arg>		Output file (default: <infile>.idtrans)
-c, --columns=<arg[,...]>	(Default: chr)	Columns to be translated (column names or @<column_number>)
--name2id		Translate names to IDs (default: translate IDs to names)
--nocompress		Do not compress output files

SHORE Subprograms

