Difference between revisions of "SHORE File Formats"
m (moved SHORE file formats to SHORE File Formats) |
|||
(3 intermediate revisions by the same user not shown) | |||
Line 7: | Line 7: | ||
[[shore convert]] may be used to convert SHORE files into various third-party file formats, or vice versa. | [[shore convert]] may be used to convert SHORE files into various third-party file formats, or vice versa. | ||
− | = | + | = SHORE ''FlatRead'' file format = |
Read files can be found in the [[LengthFolder|''LengthFolders'']] or [[ReadFolder|''ReadFolders'']]. These files are created by ''[[shore import]]'' and are named ''reads_0.fl''. | Read files can be found in the [[LengthFolder|''LengthFolders'']] or [[ReadFolder|''ReadFolders'']]. These files are created by ''[[shore import]]'' and are named ''reads_0.fl''. | ||
Line 36: | Line 36: | ||
| String of sanger calibrated quality values | | String of sanger calibrated quality values | ||
− | Encoding: ASCII 33 (''''!'''', quality 0) to ASCII 73 (''''I'''', quality 40); extended range ASCII 93 ('''']'''', quality 60) | + | Encoding: ASCII 33 (' '''!''' ', quality 0) to ASCII 73 (' '''I''' ', quality 40); extended range ASCII 93 (' ''']''' ', quality 60) |
|----valign="top" | |----valign="top" | ||
|5 | |5 | ||
Line 43: | Line 43: | ||
| String of illumina chastity values (defined as <math>Intensity(max)/ (Intensity(max) + Intensity(second))</math>). | | String of illumina chastity values (defined as <math>Intensity(max)/ (Intensity(max) + Intensity(second))</math>). | ||
− | Encoding: ASCII 40 (''''('''', chastity of 0.5) to ASCII 90 (''''Z'''', chastity of 1.0) | + | Encoding: ASCII 40 (' '''(''' ', chastity of 0.5) to ASCII 90 (' '''Z''' ', chastity of 1.0) |
|----valign="top" | |----valign="top" | ||
|5/6 | |5/6 | ||
| ['''Tags'''] | | ['''Tags'''] | ||
(Optional column) | (Optional column) | ||
− | |3-character tags in the format '' | + | |3-character tags in the format '''<tt>~<SPACE>TAG:value;TAG:value;...;</tt>'''. |
Valid tags are | Valid tags are | ||
* BAD: bad quality read (integer) | * BAD: bad quality read (integer) | ||
Line 56: | Line 56: | ||
* DST: read has an alignment with the given edit distance (integer) | * DST: read has an alignment with the given edit distance (integer) | ||
* RGR: read group (string) | * RGR: read group (string) | ||
− | * NUM number of (non-technical) reads for the template (default 2 for non-single reads, mandatory for >2 non-technical reads) (integer) | + | * NUM: number of (non-technical) reads for the template (default 2 for non-single reads, mandatory for >2 non-technical reads) (integer) |
|} | |} | ||
− | = | + | = SHORE ''MapList'' file format = |
SHORE alignment files are typically named ''map.list'', ''map.list.1'' or ''map.list.2''. They are stored in the [[SHORE Overview#The RunFolder and the read data|''LengthFolders'' or ''ReadFolders'']]. | SHORE alignment files are typically named ''map.list'', ''map.list.1'' or ''map.list.2''. They are stored in the [[SHORE Overview#The RunFolder and the read data|''LengthFolders'' or ''ReadFolders'']]. | ||
Line 85: | Line 85: | ||
** Long deletions with respect to the reference may be reported as the character ''L'' followed by the size of the deletion, e.g. '''[L100]''' | ** Long deletions with respect to the reference may be reported as the character ''L'' followed by the size of the deletion, e.g. '''[L100]''' | ||
* Unaligned sequence ('soft clip') may be reported in angle brackets, e.g. '''<TTTTTT>''' | * Unaligned sequence ('soft clip') may be reported in angle brackets, e.g. '''<TTTTTT>''' | ||
− | |||
* Consecutive stretches of the same operation (mismatch, insertion, deletion) may be abbreviated, e.g. '''[CTT,---]''' instead of '''[C-][T-][T-]''' | * Consecutive stretches of the same operation (mismatch, insertion, deletion) may be abbreviated, e.g. '''[CTT,---]''' instead of '''[C-][T-][T-]''' | ||
* ''F'' can be used to indicate a mapped part of a fragment with known size, but unknown sequence, e.g. '''[F100]''' | * ''F'' can be used to indicate a mapped part of a fragment with known size, but unknown sequence, e.g. '''[F100]''' | ||
Line 131: | Line 130: | ||
|11 | |11 | ||
| '''Sanger quality values''' | | '''Sanger quality values''' | ||
− | | String of sanger calibrated quality values. | + | | String of sanger calibrated quality values, whose start corresponds to the 5' end of the read. |
− | Encoding: ASCII 33 (''''!'''', quality 0) to ASCII 73 (''''I'''', quality 40); extended range ASCII 93 ('''']'''', quality 60) | + | Encoding: ASCII 33 (' '''!''' ', quality 0) to ASCII 73 (' '''I''' ', quality 40); extended range ASCII 93 (' ''']''' ', quality 60) |
|----valign="top" | |----valign="top" | ||
|12 | |12 | ||
| ['''Chastity values'''] | | ['''Chastity values'''] | ||
(Optional column) | (Optional column) | ||
− | | String of illumina chastity values defined as the highest intensity divided by the sum of the highest and the second highest intensity of a single base. | + | | String of illumina chastity values defined as the highest intensity divided by the sum of the highest and the second highest intensity of a single base. The start of the string corresponds to the 5' end of the read. |
− | Encoding: ASCII 40 (''''('''', chastity of 0.5) to ASCII 90 (''''Z'''', chastity of 1.0) | + | Encoding: ASCII 40 (' '''(''' ', chastity of 0.5) to ASCII 90 (' '''Z''' ', chastity of 1.0) |
|----valign="top" | |----valign="top" | ||
|12/13 | |12/13 | ||
| ['''Tags'''] | | ['''Tags'''] | ||
(Optional column) | (Optional column) | ||
− | |3-character tags in the format '' | + | |3-character tags in the format '''<tt>~<SPACE>TAG:value;TAG:value;...;</tt>'''. |
Valid tags are | Valid tags are | ||
* MPQ: mapping quality (integer) | * MPQ: mapping quality (integer) | ||
* NXP: mapping position of the next read of the template ([0-9]+:[0-9]+[DP]), valid if read is concordant or partner has a single mapping pos | * NXP: mapping position of the next read of the template ([0-9]+:[0-9]+[DP]), valid if read is concordant or partner has a single mapping pos | ||
* RGR: read group (string) | * RGR: read group (string) | ||
− | * NUM number of (non-technical) reads for the template (default 2 for non-single reads, mandatory for >2 non-technical reads) (integer) | + | * NUM: number of (non-technical) reads for the template (default 2 for non-single reads, mandatory for >2 non-technical reads) (integer) |
|} | |} |
Latest revision as of 13:41, 20 June 2013
SHORE usually writes its output to text files made up of tab-delimited columns.
Typing shore fmt will display a quick reference for many of SHORE’s file formats.
This page only describes SHORE's read and alignment file formats; other file formats are described on the page of the respective subprogram that generates them.
shore convert may be used to convert SHORE files into various third-party file formats, or vice versa.
SHORE FlatRead file format
Read files can be found in the LengthFolders or ReadFolders. These files are created by shore import and are named reads_0.fl.
reads_0.fl files are usually sorted on the id field in numerical order.
The tab-delimited entries are:
1 | id | A unique identifier for the read or read pair |
2 | sequence | DNA sequence |
3 | index | Paired-end sequencing information - read index optionally prefixed by a single character flag:
|
4 | Sanger quality values | String of Sanger-calibrated quality values
Encoding: ASCII 33 (' ! ', quality 0) to ASCII 73 (' I ', quality 40); extended range ASCII 93 (' ] ', quality 60) |
5 | [Chastity values]
(Optional column) |
String of Illumina chastity values (defined as <math>Intensity(max) / (Intensity(max) + Intensity(second))</math>).
Encoding: ASCII 40 (' ( ', chastity of 0.5) to ASCII 90 (' Z ', chastity of 1.0) |
5/6 | [Tags]
(Optional column) |
3-character tags in the format ~<SPACE>TAG:value;TAG:value;...;.
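Valid tags include:
* BAD: bad quality read (integer)
* DST: read has an alignment with the given edit distance (integer)
* RGR: read group (string)
* NUM: number of (non-technical) reads for the template (default 2 for non-single reads, mandatory for >2 non-technical reads) (integer)
|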
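The column layout and encodings above can be sketched in Python as follows. This is only an illustration, not code taken from SHORE: the names `decode_sanger`, `decode_chastity`, `parse_tags`, `FlatRead` and `parse_flatread_line` are hypothetical, and the chastity decoder assumes a linear scale between the two documented end points (ASCII 40 for 0.5, ASCII 90 for 1.0), which the table does not state explicitly.

```python
from typing import Dict, List, NamedTuple, Optional

SANGER_OFFSET = 33    # '!' (ASCII 33) encodes quality 0
CHASTITY_OFFSET = 40  # '(' (ASCII 40) encodes chastity 0.5


def decode_sanger(qualities: str) -> List[int]:
    """Convert a Sanger quality string into integer quality values."""
    return [ord(ch) - SANGER_OFFSET for ch in qualities]


def decode_chastity(chastity: str) -> List[float]:
    """Convert a chastity string into values between 0.5 and 1.0.

    Assumption: the scale is linear, 0.01 per ASCII code point.
    """
    return [0.5 + (ord(ch) - CHASTITY_OFFSET) / 100 for ch in chastity]


def parse_tags(field: str) -> Dict[str, str]:
    """Parse a '~ TAG:value;TAG:value;...;' column into a dictionary."""
    if not field.startswith("~ "):
        raise ValueError("tag column must start with '~ '")
    tags: Dict[str, str] = {}
    for item in field[2:].split(";"):
        if item:  # skip the empty piece after the trailing ';'
            name, _, value = item.partition(":")
            tags[name] = value
    return tags


class FlatRead(NamedTuple):
    read_id: str
    sequence: str
    index: str
    qualities: List[int]
    chastity: Optional[List[float]]
    tags: Dict[str, str]


def parse_flatread_line(line: str) -> FlatRead:
    """Split one FlatRead line into its columns.

    The optional columns are told apart by the '~ ' prefix of the tag
    column; '~' and the space character lie outside the chastity range.
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 4:
        raise ValueError("expected at least 4 tab-delimited columns")
    chastity: Optional[List[float]] = None
    tags: Dict[str, str] = {}
    for extra in cols[4:]:
        if extra.startswith("~ "):
            tags = parse_tags(extra)
        else:
            chastity = decode_chastity(extra)
    return FlatRead(cols[0], cols[1], cols[2],
                    decode_sanger(cols[3]), chastity, tags)
```

Tag values are kept as strings here; a caller that needs the integer tags would convert them itself.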
SHORE MapList file format
SHORE alignment files are typically named map.list, map.list.1 or map.list.2. They are stored in the LengthFolders or ReadFolders.
MapList files are sorted in numerical order, either on the fields chr id and pos, or on the field read id.
The tab-delimited entries are:
1 | chr id | Each chromosome has an internal id. Ids are assigned by enumerating the chromosomes in the order in which they occur in the reference sequence file, starting from 1. Translation to the native chromosome name can be found in the *.shore.ref and *.shore.trans files in the IndexFolder, or in the ref.txt files created by shore mapflowcell. |
2 | pos | Left-most position of the alignment relative to the forward strand of the reference sequence. The first position of a chromosome is 1. |
3 | alignment | String representation of the read alignment. The sequence is always reported with respect to the forward strand of the reference, i.e. the sequence of reads matching to the reverse strand is reverse complemented.
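Notations used in the alignment string include:
* Long deletions with respect to the reference may be reported as the character L followed by the size of the deletion, e.g. [L100]
* Unaligned sequence ('soft clip') may be reported in angle brackets, e.g. <TTTTTT>
* Consecutive stretches of the same operation (mismatch, insertion, deletion) may be abbreviated, e.g. [CTT,---] instead of [C-][T-][T-]
* F can be used to indicate a mapped part of a fragment with known size, but unknown sequence, e.g. [F100]
|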
4 | read id | A unique identifier for the read or read pair |
5 | strand | D for forward and P for reverse hits (direct and palindromic, respectively). |
6 | mismatches | The number of mismatches+gaps in the alignment. |
7 | hits | The total number of genomic positions the read is aligned to. |
8 | read length | Length of the read ('soft clipped' nucleotides excluded). |
9 | offset | Alignment start offset into the read (local alignments, first base of the read is 1)
|
10 | pe flag | Paired-end information - read index optionally prefixed by a single character flag:
Paired-end information was encoded differently prior to SHORE v0.8, see SHORE_v0.7_file_formats. |
11 | Sanger quality values | String of Sanger-calibrated quality values, whose start corresponds to the 5' end of the read.
Encoding: ASCII 33 (' ! ', quality 0) to ASCII 73 (' I ', quality 40); extended range ASCII 93 (' ] ', quality 60) |
12 | [Chastity values]
(Optional column) |
String of Illumina chastity values, defined as the highest intensity divided by the sum of the highest and the second-highest intensity of a single base. The start of the string corresponds to the 5' end of the read.
Encoding: ASCII 40 (' ( ', chastity of 0.5) to ASCII 90 (' Z ', chastity of 1.0) |
12/13 | [Tags]
(Optional column) |
3-character tags in the format ~<SPACE>TAG:value;TAG:value;...;.
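Valid tags are:
* MPQ: mapping quality (integer)
* NXP: mapping position of the next read of the template ([0-9]+:[0-9]+[DP]), valid if the read is concordant or its partner has a single mapping position
* RGR: read group (string)
* NUM: number of (non-technical) reads for the template (default 2 for non-single reads, mandatory for >2 non-technical reads) (integer)
|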
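As a rough sketch of how the MapList columns might be read, the following Python splits one line into the fields listed above. It is not part of SHORE; `MapListEntry`, `parse_maplist_line`, `next_read_position` and `position_key` are hypothetical names. The NXP pattern is the one given in the tag description, but reading its three parts as chromosome id, position and strand is an assumption.

```python
import re
from typing import Dict, NamedTuple, Optional, Tuple

# Pattern from the NXP tag description: [0-9]+:[0-9]+[DP]
NXP_PATTERN = re.compile(r"([0-9]+):([0-9]+)([DP])")


class MapListEntry(NamedTuple):
    chr_id: int
    pos: int
    alignment: str
    read_id: str
    strand: str        # 'D' (forward) or 'P' (reverse)
    mismatches: int
    hits: int
    read_length: int
    offset: int
    pe_flag: str
    qualities: str
    chastity: Optional[str]
    tags: Dict[str, str]


def parse_tags(field: str) -> Dict[str, str]:
    """Parse a '~ TAG:value;TAG:value;...;' column into a dictionary."""
    tags: Dict[str, str] = {}
    for item in field[2:].split(";"):
        if item:  # skip the empty piece after the trailing ';'
            name, _, value = item.partition(":")
            tags[name] = value
    return tags


def parse_maplist_line(line: str) -> MapListEntry:
    """Split one MapList line into its 11 fixed and up to 2 optional columns."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError("expected at least 11 tab-delimited columns")
    chastity: Optional[str] = None
    tags: Dict[str, str] = {}
    for extra in cols[11:]:
        if extra.startswith("~ "):  # the tag column is marked by '~ '
            tags = parse_tags(extra)
        else:
            chastity = extra
    return MapListEntry(int(cols[0]), int(cols[1]), cols[2], cols[3], cols[4],
                        int(cols[5]), int(cols[6]), int(cols[7]), int(cols[8]),
                        cols[9], cols[10], chastity, tags)


def next_read_position(entry: MapListEntry) -> Optional[Tuple[int, int, str]]:
    """Return the NXP tag as (chr id, pos, strand), or None if absent.

    Assumption: the three parts of the pattern carry these meanings.
    """
    value = entry.tags.get("NXP")
    if value is None:
        return None
    match = NXP_PATTERN.fullmatch(value)
    if match is None:
        raise ValueError("malformed NXP tag: " + value)
    return int(match.group(1)), int(match.group(2)), match.group(3)


def position_key(entry: MapListEntry) -> Tuple[int, int]:
    """Sort key for the numerical ordering on chr id and pos."""
    return (entry.chr_id, entry.pos)
```

Passing `position_key` as the `key` argument of `sorted` orders parsed entries numerically by chr id and pos, one of the two sort orders described above.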