Shore convert

From SHORE wiki
Revision as of 13:48, 21 March 2012 by Felo80 (Talk | contribs)

shore convert converts SHORE files into common file formats, and vice versa.

Available converters

  • Alignment2ALN: Convert maplist, SAM or BAM files into CisGenome ALN format.
  • Alignment2BED: Convert maplist, SAM or BAM files into BED format.
  • Alignment2GFF: Convert maplist, SAM or BAM files into GFF3 format.
  • Alignment2Maplist: Convert SAM or BAM files into maplist format.
  • Alignment2SAM: Convert maplist or BAM files into SAM format.
  • ExpandTabs
  • Reads2Fasta
  • Reads2Fastq
  • Reads2Flat
  • Reads2Qual
  • Variant2VCF


Alignment2... converters accept the following input formats:

  • SHORE map.list files (default)
  • SAM files (*.sam)
  • BAM files (*.bam)


Reads2... converters accept the following input formats:

  • SHORE reads_0.fl files (default)
  • FastQ files (*.fq, *.fastq)
  • 454 Standard Flowgram Format SFF (*.sff)
  • Illumina QSEQ files (*.qseq, *_qseq.txt)
  • SHORE map.list files (*.list); alignment information is discarded and only the read information is kept. Input files must be sorted by read ID.


By default, the SHORE file formats are expected as input: map.list for Alignment2... converters and reads_0.fl for Reads2... converters.
All other file types are recognized only if they have the correct file extension (an additional .gz is allowed for compressed files).

Additionally, the special file names stdin and stdout may be used for reading from standard input and for writing to standard output, respectively.

For stdin, map.list format is expected for Alignment2... conversions and reads_0.fl format for Reads2... conversions. To convert different formats from standard input, use e.g. stdin.sam, stdin.fastq.gz, etc.
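
The file-name rules above can be summarized as a small shell function. This is only an illustrative sketch of the documented behavior, not SHORE's actual implementation: guess_input_format is a hypothetical helper, the family names "alignment" and "reads" are made up for this example, and matching is assumed to be case-sensitive.

```shell
#!/bin/sh
# Hypothetical helper (not part of SHORE): reports which input format the
# documented file-name rules would select.
# Usage: guess_input_format alignment|reads FILE_NAME
guess_input_format() {
    family=$1
    name=${2%.gz}    # an additional .gz suffix is allowed
    case "$family:$name" in
        alignment:*.sam)                echo sam ;;
        alignment:*.bam)                echo bam ;;
        alignment:*)                    echo map.list ;;    # default
        reads:*.fq|reads:*.fastq)       echo fastq ;;
        reads:*.sff)                    echo sff ;;
        reads:*.qseq|reads:*_qseq.txt)  echo qseq ;;
        reads:*.list)                   echo map.list ;;
        reads:*)                        echo reads_0.fl ;;  # default
        *)                              echo unknown; return 1 ;;
    esac
}

guess_input_format alignment stdin.sam     # prints "sam"
guess_input_format reads stdin.fastq.gz    # prints "fastq"
guess_input_format reads stdin             # prints "reads_0.fl"
```

Because a plain stdin carries no extension, it falls through to the default SHORE format of the converter family, which is why names such as stdin.sam are needed to select another format.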

Command line options

Usage: shore convert [OPTIONS] CONVERTER CONVERTER_ARGS

Note that all options must be specified before the converter name, e.g.:

$ shore convert -r hg19.fa Alignment2Maplist sample1.bam
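
When the command is assembled in a script, a small wrapper can keep the ordering rule from being violated by accident. The following is a minimal sketch: build_shore_convert is a hypothetical helper, not part of SHORE, and it only prints the command line instead of running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of SHORE): prints a shore convert command
# line with every option placed before the converter name.
# Usage: build_shore_convert CONVERTER INPUT [OPTIONS...]
build_shore_convert() {
    converter=$1
    input=$2
    shift 2
    if [ $# -gt 0 ]; then
        # The remaining arguments are the options; they go first.
        echo "shore convert $* $converter $input"
    else
        echo "shore convert $converter $input"
    fi
}

build_shore_convert Alignment2Maplist sample1.bam -r hg19.fa
# prints: shore convert -r hg19.fa Alignment2Maplist sample1.bam
```

The printed string is meant for inspection or logging; file names containing spaces would need proper quoting before the command is actually executed.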

Alignment conversion

  -r, --refseq=STRING            Reference sequence
  -n, --max-edit-distance=INT    Maximum allowed edit distance
  -g, --max-gaps=INT             Maximum number of gap openings
  -e, --max-gap-extension=INT    Maximum gap extension
  -s, --discard-softclipped      Discard soft-clipped alignments
  -l, --leftover-file=STRING     Leftover reads file (input or output)
  -S, --sort                     Sort map.list output by alignment coordinate

Fastq output

  --illumina                     Report quality strings with Illumina offset instead of Sanger
  --flowcell-name=STRING         If this is set, conversion of SHORE read IDs back into Illumina FastQ read IDs will be attempted
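
To illustrate what a different quality offset means, the sketch below shifts a quality string from the Sanger encoding (Phred+33) to a Phred+64 encoding. It assumes that "Illumina offset" refers to Phred+64, as used by Illumina pipeline versions 1.3 to 1.7; this page does not state the numeric offset, and sanger_to_illumina is a hypothetical helper, not a SHORE command.

```shell
#!/bin/sh
# Hypothetical helper (not part of SHORE): re-encodes quality characters
# from Phred+33 to Phred+64 by shifting each character 31 code points up.
sanger_to_illumina() {
    # '!' (33) .. 'J' (74) covers Phred scores 0..41 in Sanger encoding;
    # '@' (64) .. 'i' (105) covers the same scores with offset 64.
    LC_ALL=C tr '!-J' '@-i'
}

printf '%s\n' '!5I' | sanger_to_illumina    # prints "@Th"
```

Only the ASCII offset changes here; the underlying Phred scores (0, 20 and 40 in the example) stay the same.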