Difference between revisions of "SHORE Subprograms"

From SHORE wiki
=shore convert=
 
 
Convert SHORE files into common file formats, and vice versa.
 
 
Available converters:

* Alignment2ALN
* Alignment2BED
* Alignment2GFF
* Alignment2Maplist
* Alignment2SAM
* ColorFlat2Fastq
* Contig2AFG
* Eland2Maplist
* ExpandTabs
* FlatPair2Fastq
* Maplist2Eland
* Reads2Fasta
* Reads2Fastq
* Reads2Flat
* Reads2Qual
* Solid2Fastq
* Solid2Flat
* Variant2GFF
* Variant2VCF
 
''Alignment2...'' converters can convert:

* SHORE ''[[map.list]]'' files (''default'')
* ''SAM'' files (''*.sam'')
* ''BAM'' files (''*.bam'')
 
 
''Reads2...'' converters can convert:

* SHORE ''[[reads_0.fl]]'' files (''default'')
* ''FastQ'' files (''*.fq'', ''*.fastq'')
* 454 Standard Flowgram Format ''SFF'' (''*.sff'')
* Illumina ''QSEQ'' files (''*.qseq'', ''*_qseq.txt'')
* SHORE ''[[map.list]]'' files (''*.list'') (discards alignment information and only keeps the read information; input files must be sorted by read ID)
 
 
By default, the SHORE file formats (''map.list'' and ''reads_0.fl'', respectively) are expected as input.
 
All other file types must have the correct '''file extensions''' to be recognized (an additional ''.gz'' is allowed for compressed files).
 
 
Additionally, the special file names ''stdin'' and ''stdout'' may be used for reading from standard input and for writing to standard output, respectively.
 
 
For ''stdin'', ''map.list'' format is expected for ''Alignment2...'' conversions and ''reads_0.fl'' format for ''Reads2...'' conversions.
 
To convert different formats from standard input, use e.g. ''stdin.sam'', ''stdin.fastq.gz'', etc.
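The extension rules described above can be illustrated with a small Python sketch. It is not SHORE's own code: the function name <code>detect_format</code>, the lookup tables, and the case-sensitive matching are assumptions made for illustration, built only from the extensions listed in this section.

```python
import os

# Hypothetical lookup tables, transcribed from the extension lists above.
ALIGNMENT_EXTENSIONS = {".sam": "SAM", ".bam": "BAM"}
READS_EXTENSIONS = {
    ".fq": "FastQ",
    ".fastq": "FastQ",
    ".sff": "SFF",
    ".qseq": "QSEQ",
    ".list": "map.list",
}


def detect_format(filename, converter):
    """Guess the input format for an 'Alignment2...' or 'Reads2...' converter."""
    if converter.startswith("Alignment2"):
        table, default = ALIGNMENT_EXTENSIONS, "map.list"
    elif converter.startswith("Reads2"):
        table, default = READS_EXTENSIONS, "reads_0.fl"
    else:
        raise ValueError("sketch only covers Alignment2... and Reads2... converters")

    name = os.path.basename(filename)
    # An additional .gz is allowed for compressed files, so strip it first.
    if name.endswith(".gz"):
        name = name[:-3]
    # QSEQ files may also be named *_qseq.txt.
    if table is READS_EXTENSIONS and name.endswith("_qseq.txt"):
        return "QSEQ"
    extension = os.path.splitext(name)[1]
    # Unrecognized or missing extensions fall back to the SHORE default format.
    return table.get(extension, default)
```

Under these assumptions, <code>detect_format("stdin", "Alignment2SAM")</code> falls back to ''map.list'', while <code>detect_format("stdin.fastq.gz", "Reads2Fasta")</code> is treated as ''FastQ''.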
 
 
 
=shore sort=
 
  

Revision as of 14:40, 23 September 2011

shore sort

Sort / merge tab-delimited text files


Usage: shore sort [OPTIONS] [TEXT_FILES]

Allowed options
-i, --infiles=<arg[,...]> A comma-separated list of plain-text input files
-o, --outfile=<arg> (Default: stdout) Output file
-p, --preset=<arg> Automatically select sort keys for the file type specified. Supported values:
* maplist: map.list format sorted by genomic coordinate
* maplist_id: map.list format sorted by read ID
* reads0: reads flat file format sorted by read ID
* gff: GFF format sorted by position
-k, --keystring=<arg> Concatenation of column ids (counted from 1) and key types. Valid key types: t (text), i (integer) and f (float); capital letters reverse the sort order - e.g. '-k 1i5t3i7I'.
-I, --inplace Output file is the same as the input file
-t, --tmpdir=<arg> Temporary file directory (defaults to $TMPDIR or /tmp)
-B, --blocksize=<arg> (Default: 2048) Block size in megabytes
-m, --nur-merge Merge already sorted files
-u, --unique Output only the first of an equal run
-c, --check Only test if the files are sorted
-b, --upper-bound=<arg[,...]> Returns byte offset (counted from 0) and text of the first line in a sorted file that compares greater than the keys given in <arg> (provide comma-separated values in order of key priority)
-T, --tail=<arg[,...]> Print all lines in a sorted file that compare greater than the keys given in <arg> (provide comma-separated values in order of key priority)
-C, --no-comments Do not treat line comments and empty lines specially
-v, --verbose Be more verbose
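
The key string syntax of the -k option can be illustrated with a short Python sketch. It only mimics the documented semantics (1-based column ids, key types t/i/f, capital letters reversing the order) and is not SHORE's implementation; the names <code>parse_keystring</code> and <code>sort_rows</code> are hypothetical, and the handling of comment lines, blocks and temporary files is left out.

```python
import re

# Key type letter -> conversion applied to the column text before comparing.
_CASTS = {"t": str, "i": int, "f": float}


def parse_keystring(keystring):
    """Split e.g. '1i5t3i7I' into (zero_based_column, cast, reverse) triples."""
    keys = []
    position = 0
    for match in re.finditer(r"(\d+)([tifTIF])", keystring):
        if match.start() != position:
            raise ValueError("unexpected characters in key string: %r" % keystring)
        position = match.end()
        column = int(match.group(1))
        key_type = match.group(2)
        # Columns are counted from 1; a capital letter reverses the sort order.
        keys.append((column - 1, _CASTS[key_type.lower()], key_type.isupper()))
    if not keys or position != len(keystring):
        raise ValueError("malformed key string: %r" % keystring)
    return keys


def sort_rows(lines, keystring):
    """Sort tab-delimited lines by the keys in priority order."""
    rows = [line.rstrip("\n").split("\t") for line in lines]
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last yields the combined ordering.
    for column, cast, reverse in reversed(parse_keystring(keystring)):
        rows.sort(key=lambda row: cast(row[column]), reverse=reverse)
    return ["\t".join(row) for row in rows]
```

For example, <code>sort_rows(lines, "1i2T")</code> orders rows by the first column as an ascending integer and breaks ties by the second column as descending text.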

shore compress

Compress files to indexed gzip format


Usage: shore compress [OPTIONS] FILES

Allowed options
--outfile=<arg> Write to the file <arg> instead of <infile>.gz
--replace Remove original files after compression. If the input file is already compressed it will be recompressed and replaced
--tail=<arg> Instead of compressing files, dump the last <arg> bytes of a seekable file
--dumpgzx Print out the index for each file

shore 2dex

Range-indexing and query for tab-delimited text files


Usage: shore 2dex [OPTIONS] [TEXT_FILES]

Mandatory
-i, --infiles=<arg[,...]> A comma-separated list of tab-delimited plain-text input files (can also be any SHORE directory when -f MAPLIST is set)
Format Options
-f, --format=<arg> Provide file type for automatic settings, valid file types: MAPLIST, GFF, SAM
-c, --chr-column=<arg> Column w. chromosome or sequence name, provide the column name or @<column_number>
-p, --pos-column=<arg> Column w. start position, provide the column name or @<column_number>
-s, --size-column=<arg> Column w. feature size, provide the column name or @<column_number>
-e, --end-column=<arg> Column w. end position (inclusive), provide the column name or @<column_number>
-x, --xend-column=<arg> Column w. end position (exclusive), provide the column name or @<column_number>
-C, --commentchar=<arg> Comment line symbol
Index Options
-B, --blocksize=<arg> (Default: 131072) Block size determining the index resolution in bytes
-G, --maxgap=<arg> (Default: 131072) Maximum sequence gap in a block
Query Options
-q, --query=<arg> A range to query; prints all overlapping records. Valid ranges: 'SEQ:POS~SIZE', 'SEQ:POS..END', 'SEQ1:POS..SEQ2:END', 'SEQ:POS...XEND', 'SEQ1:POS...SEQ2:XEND' (END: inclusive, XEND: exclusive)
Other
-v, --verbose Be more verbose
-Q, --quiet Be less verbose
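
The range syntax accepted by -q can be made concrete with a Python sketch that normalizes each documented form to a start position and an exclusive end. This is an illustration, not the parser used by shore 2dex: the function name <code>parse_range</code> is hypothetical, and reading 'POS~SIZE' as the half-open interval from POS to POS+SIZE is an assumption.

```python
import re

# SEQ:POS followed by '~', '..' or '...', an optional second sequence name,
# and a number. The three-dot form is tried before the two-dot form.
_RANGE = re.compile(
    r"^(?P<seq1>[^:]+):(?P<pos>\d+)"
    r"(?P<op>~|\.\.\.|\.\.)"
    r"(?:(?P<seq2>[^:]+):)?(?P<value>\d+)$"
)


def parse_range(query):
    """Return (seq1, start, seq2, exclusive_end) for a query range string."""
    match = _RANGE.match(query)
    if match is None:
        raise ValueError("unrecognized range: %r" % query)
    seq1 = match.group("seq1")
    seq2 = match.group("seq2") or seq1
    start = int(match.group("pos"))
    value = int(match.group("value"))
    op = match.group("op")
    if op == "~":
        if match.group("seq2") is not None:
            raise ValueError("'~' ranges take a size, not a second sequence")
        end = start + value      # assumption: SIZE counts positions from POS
    elif op == "..":
        end = value + 1          # END is inclusive
    else:
        end = value              # XEND is already exclusive
    return seq1, start, seq2, end
```

With this reading, 'chr1:100~50', 'chr1:100..149' and 'chr1:100...150' all describe the same interval.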

shore idtrans

SHORE uses numerical identifiers for all sequences of the reference. shore idtrans makes it easier to translate these numbers in some of the result files back into the chromosome names specified in the reference FASTA file (and vice versa).

It requires either a *.trans file, which shore preprocess stores in the IndexFolder, or a ref.txt file generated by shore mapflowcell.


Usage: shore idtrans [OPTIONS] FILES

Allowed options
-t, --transfile=<arg> *.trans file from IndexFolder
-r, --reffile=<arg> ref.txt file generated by mapflowcell
-o, --outfile=<arg> Output file (default: <infile>.idtrans)
-c, --columns=<arg[,...]> (Default: chr) Columns to be translated (column names or @<column_number>)
--name2id Translate names to IDs (default: translate IDs to names)
--nocompress Do not compress output files
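
The translation that shore idtrans performs can be pictured with the following Python sketch. It is an illustration under stated assumptions, not the tool's code: the ID-to-name mapping is passed in as a plain dictionary because the layout of the *.trans and ref.txt files is not described here, the function names are hypothetical, <code>@<column_number></code> is assumed to count from 1, and values missing from the mapping are left unchanged.

```python
def parse_column_spec(spec):
    """Turn '@3' into the zero-based index 2 (assumes @N counts from 1)."""
    if not spec.startswith("@"):
        raise ValueError("this sketch only handles @<column_number> specs")
    return int(spec[1:]) - 1


def translate_lines(lines, id_to_name, column_specs, name2id=False):
    """Translate the given columns of tab-delimited lines between IDs and names."""
    if name2id:
        # Invert the mapping to translate names back into IDs.
        mapping = {name: seq_id for seq_id, name in id_to_name.items()}
    else:
        mapping = id_to_name
    indices = [parse_column_spec(spec) for spec in column_specs]
    translated = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        for index in indices:
            # Assumption: unknown values pass through untouched.
            fields[index] = mapping.get(fields[index], fields[index])
        translated.append("\t".join(fields))
    return translated
```

For instance, with the mapping <code>{"1": "Chr1", "2": "Chr2"}</code> and the column spec <code>"@1"</code>, a line starting with <code>1</code> is rewritten to start with <code>Chr1</code>; passing <code>name2id=True</code> reverses the direction.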