Difference between revisions of "SHORE Subprograms"

Revision as of 14:42, 23 September 2011

shore sort

Sort / merge tab-delimited text files


Usage: shore sort [OPTIONS] [TEXT_FILES]

Allowed options
-i, --infiles=<arg[,...]> A comma-separated list of plain-text input files
-o, --outfile=<arg> (Default: stdout) Output file
-p, --preset=<arg> Automatically select sort keys for the file type specified. Supported values:
* maplist: map.list format sorted by genomic coordinate
* maplist_id: map.list format sorted by read ID
* reads0: reads flat file format sorted by read ID
* gff: GFF format sorted by position
-k, --keystring=<arg> Concatenation of column ids (counted from 1) and key types. Valid key types: t (text), i (integer) and f (float); capital letters reverse the sort order - e.g. '-k 1i5t3i7I'.
-I, --inplace Output file is the same as the input file
-t, --tmpdir=<arg> Temporary file directory (defaults to $TMPDIR or /tmp)
-B, --blocksize=<arg> (Default: 2048) Block size in megabytes
-m, --nur-merge Merge already sorted files
-u, --unique Output only the first of an equal run
-c, --check Only test if the files are sorted
-b, --upper-bound=<arg[,...]> Returns byte offset (counted from 0) and text of the first line in a sorted file that compares greater than the keys given in <arg> (provide comma-separated values in order of key priority)
-T, --tail=<arg[,...]> Print all lines in a sorted file that compare greater than the keys given in <arg> (provide comma-separated values in order of key priority)
-C, --no-comments Do not treat line comments and empty lines specially
-v, --verbose Be more verbose
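
The -k / --keystring syntax of shore sort (column ids counted from 1, each followed by a key type t, i or f, with capital letters reversing the order) can be illustrated with a small Python sketch. This is not SHORE's implementation; the function names parse_keystring and sort_rows are hypothetical, and the sketch only mirrors the behaviour described in the option text.

```python
import re

# Key types named in the -k description; the converters are illustrative choices.
_CONVERTERS = {"t": str, "i": int, "f": float}


def parse_keystring(keystring):
    """Split e.g. '1i5t3i7I' into (column, key_type, reverse) triples."""
    tokens = re.findall(r"(\d+)([tifTIF])", keystring)
    # Reject anything that is not a pure concatenation of <column><type> pairs.
    if "".join(col + typ for col, typ in tokens) != keystring:
        raise ValueError(f"malformed keystring: {keystring!r}")
    return [(int(col), typ.lower(), typ.isupper()) for col, typ in tokens]


def sort_rows(lines, keystring):
    """Sort tab-delimited lines by the keys in keystring (hypothetical helper)."""
    rows = [line.rstrip("\n").split("\t") for line in lines]
    # Apply keys from lowest to highest priority. Python's sort is stable,
    # so the first key in the keystring ends up dominating.
    for col, typ, reverse in reversed(parse_keystring(keystring)):
        convert = _CONVERTERS[typ]
        rows.sort(key=lambda r: convert(r[col - 1]), reverse=reverse)
    return ["\t".join(r) for r in rows]
```

For example, sort_rows(lines, "1i2T") orders lines by column 1 as an integer and breaks ties by column 2 as text in reverse order.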

shore compress

Compress files to indexed gzip format


Usage: shore compress [OPTIONS] FILES

Allowed options
--outfile=<arg> Write to the file <arg> instead of <infile>.gz
--replace Remove original files after compression. If the input file is already compressed, it is recompressed and replaced
--tail=<arg> Instead of compressing files, dump the last <arg> bytes of a seekable file
--dumpgzx Print out the index for each file
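
The idea behind an indexed, seekable gzip file can be sketched in Python. The index format that shore compress actually writes is not described here, so everything below is an assumption made for illustration: the data is split into fixed-size blocks, each block is stored as an independent gzip member, and a list of (uncompressed offset, compressed offset) pairs serves as the index. The names compress_indexed and tail are hypothetical.

```python
import gzip


def compress_indexed(data, block_size=1 << 16):
    """Compress bytes as concatenated gzip members plus a block index.

    Assumed layout for illustration only, not SHORE's file format.
    """
    index = []  # one (uncompressed_offset, compressed_offset) pair per block
    out = bytearray()
    for start in range(0, len(data), block_size):
        index.append((start, len(out)))
        out += gzip.compress(data[start:start + block_size])
    return bytes(out), index


def tail(compressed, index, total_size, nbytes):
    """Return the last nbytes of the original data without reading from the start."""
    want = max(total_size - nbytes, 0)
    # Last block that starts at or before the first wanted byte.
    u_off, c_off = max((e for e in index if e[0] <= want), default=(0, 0))
    # gzip.decompress reads all concatenated members from c_off onwards.
    chunk = gzip.decompress(compressed[c_off:])
    return chunk[want - u_off:]
```

Because every block is a complete gzip member, the concatenated output is still readable by ordinary gzip tools, while the index allows decompression to start close to the requested position.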

shore 2dex

Range-indexing and query for tab-delimited text files


Usage: shore 2dex [OPTIONS] [TEXT_FILES]

Mandatory
-i, --infiles=<arg[,...]> A comma-separated list of tab-delimited plain-text input files (can also be any SHORE directory when -f MAPLIST is set)
Format Options
-f, --format=<arg> Provide file type for automatic settings, valid file types: MAPLIST, GFF, SAM
-c, --chr-column=<arg> Column with chromosome or sequence name, provide the column name or @<column_number>
-p, --pos-column=<arg> Column with start position, provide the column name or @<column_number>
-s, --size-column=<arg> Column with feature size, provide the column name or @<column_number>
-e, --end-column=<arg> Column with end position (inclusive), provide the column name or @<column_number>
-x, --xend-column=<arg> Column with end position (exclusive), provide the column name or @<column_number>
-C, --commentchar=<arg> Comment line symbol
Index Options
-B, --blocksize=<arg> (Default: 131072) Block size determining the index resolution in bytes
-G, --maxgap=<arg> (Default: 131072) Maximum sequence gap in a block
Query Options
-q, --query=<arg> A range to query; prints all overlapping records. Valid ranges: 'SEQ:POS~SIZE', 'SEQ:POS..END', 'SEQ1:POS..SEQ2:END', 'SEQ:POS...XEND', 'SEQ1:POS...SEQ2:XEND' (END: inclusive, XEND: exclusive)
Other
-v, --verbose Be more verbose
-Q, --quiet Be less verbose
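
The range notations accepted by -q / --query can be made concrete with a small Python parser. This is a sketch under stated assumptions, not code from SHORE: the function name parse_query is hypothetical, sequence names are assumed to contain no ':', and 'SEQ:POS~SIZE' is assumed to cover SIZE positions starting at POS. Each query is normalized to a tuple (seq1, start, seq2, exclusive_end).

```python
import re

_SEQ = r"[^:]+"  # assumption: sequence names contain no ':'

# Patterns are tried in order; '...' (exclusive end) must come before '..'.
_PATTERNS = [
    (re.compile(rf"^(?P<s1>{_SEQ}):(?P<pos>\d+)~(?P<size>\d+)$"), "size"),
    (re.compile(rf"^(?P<s1>{_SEQ}):(?P<pos>\d+)\.\.\.(?:(?P<s2>{_SEQ}):)?(?P<end>\d+)$"), "xend"),
    (re.compile(rf"^(?P<s1>{_SEQ}):(?P<pos>\d+)\.\.(?:(?P<s2>{_SEQ}):)?(?P<end>\d+)$"), "end"),
]


def parse_query(query):
    """Normalize a range string to (seq1, start, seq2, exclusive_end)."""
    for pattern, kind in _PATTERNS:
        m = pattern.match(query)
        if m is None:
            continue
        start = int(m["pos"])
        if kind == "size":
            # Assumed meaning: SIZE positions beginning at POS.
            return m["s1"], start, m["s1"], start + int(m["size"])
        end = int(m["end"])
        exclusive_end = end + 1 if kind == "end" else end
        seq2 = m["s2"] or m["s1"]  # single-sequence forms omit SEQ2
        return m["s1"], start, seq2, exclusive_end
    raise ValueError(f"unrecognized range: {query!r}")
```

Under these assumptions, 'chr1:100~50', 'chr1:100..149' and 'chr1:100...150' all describe the same interval.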

shore idtrans

SHORE uses numerical identifiers for all sequences of the reference. shore idtrans translates these numbers in result files back into the chromosome names given in the reference fasta file, and vice versa.

It requires either a *.trans file, which shore preprocess stores in the IndexFolder, or a ref.txt file generated by shore mapflowcell.


Usage: shore idtrans [OPTIONS] FILES

Allowed options
-t, --transfile=<arg> *.trans file from IndexFolder
-r, --reffile=<arg> ref.txt file generated by mapflowcell
-o, --outfile=<arg> Output file (default: <infile>.idtrans)
-c, --columns=<arg[,...]> (Default: chr) Columns to be translated (column names or @<column_number>)
--name2id Translate names to IDs (default: translate IDs to names)
--nocompress Do not compress output files
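
The translation that shore idtrans performs on selected columns can be sketched in Python. The layout of the *.trans and ref.txt files is not described here, so the sketch takes the ID-to-name mapping as a plain dictionary; the function names resolve_column and translate are hypothetical, and column numbers after '@' are assumed to count from 1.

```python
def resolve_column(spec, header):
    """Resolve a column name or '@<column_number>' to a 0-based index."""
    if spec.startswith("@"):
        return int(spec[1:]) - 1  # assumption: column numbers count from 1
    return header.index(spec)


def translate(rows, header, columns, id_to_name, name2id=False):
    """Translate the given columns of tab-split rows (illustrative only)."""
    # With name2id the mapping is inverted, mirroring the --name2id option.
    mapping = {v: k for k, v in id_to_name.items()} if name2id else id_to_name
    indices = [resolve_column(c, header) for c in columns]
    out = []
    for row in rows:
        row = list(row)
        for i in indices:
            # Values missing from the mapping are passed through unchanged.
            row[i] = mapping.get(row[i], row[i])
        out.append(row)
    return out
```

For example, translate(rows, header, ["chr"], {"1": "Chr1"}) replaces the ID "1" in the chr column with "Chr1", and passing name2id=True reverses the direction.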