Difference between revisions of "Shore correct4pe"
Revision as of 10:04, 28 September 2011
shore correct4pe finds the most likely mapping of repetitive reads by using paired-end information. In paired read mapping each read is aligned separately; the read pair information can then be used to increase confidence in an alignment by selecting the combination of alignments whose distance is most likely for the two reads of a pair.
shore correct4pe starts by estimating the insert size distribution. The upper bound of this distribution is usually very sharp (clones longer than expected seem to be very rare), whereas the lower bound is more blurred, and very small clones can be observed as well. The insert size distribution is then translated into a probability distribution for observing a given distance in a pairing, where a pairing is defined as the combination of one of the mappings of read 1 with one of the mappings of read 2.
All possible combinations of the mappings of both reads of a pair are compared, and all pairings with a probability equal to zero are dismissed. Mappings that are not part of any pairing with a probability above zero are deleted. This removes the spurious mappings that result from repeats. If a mapping of one read forms a pairing with two different mappings of the other read, the more likely pairing is kept. If all pairings have zero probability, all mappings of both reads are kept. These are the discordant (unhappy) read pairs, which are typically used to predict structural variants.
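The filtering described above can be illustrated with a minimal Python sketch. It is not the shore implementation: the function names, the representation of a mapping as a `(chromosome, position)` tuple, the use of a plain empirical histogram as the probability distribution, and the way ties between competing pairings are resolved are all assumptions made for illustration. Strand and orientation are ignored.

```python
from collections import Counter
from itertools import product


def insert_size_probabilities(observed_distances):
    """Turn observed pair distances into an empirical probability table.

    Assumption: the distances come from pairs whose placement is unambiguous.
    """
    counts = Counter(observed_distances)
    total = sum(counts.values())
    return {distance: n / total for distance, n in counts.items()}


def pairing_probability(mapping1, mapping2, probs):
    """Probability of one pairing; zero across chromosomes or for unseen distances."""
    (chrom1, pos1), (chrom2, pos2) = mapping1, mapping2
    if chrom1 != chrom2:
        return 0.0
    return probs.get(abs(pos2 - pos1), 0.0)


def correct_pair(mappings1, mappings2, probs):
    """Return (kept_mappings1, kept_mappings2, concordant) for one read pair."""
    # Score every combination of a read 1 mapping with a read 2 mapping
    # and dismiss the pairings with zero probability.
    scored = [
        (pairing_probability(a, b, probs), a, b)
        for a, b in product(mappings1, mappings2)
    ]
    scored = [entry for entry in scored if entry[0] > 0]

    # No pairing survives: keep everything, the pair is discordant.
    if not scored:
        return list(mappings1), list(mappings2), False

    # A read 1 mapping that pairs with several read 2 mappings keeps
    # only its most likely partner.
    best_for_a = {}
    for p, a, b in scored:
        if a not in best_for_a or p > best_for_a[a][0]:
            best_for_a[a] = (p, a, b)

    # The same rule applied from the side of read 2.
    best_for_b = {}
    for p, a, b in best_for_a.values():
        if b not in best_for_b or p > best_for_b[b][0]:
            best_for_b[b] = (p, a, b)

    kept = list(best_for_b.values())
    kept1 = sorted({a for _, a, _ in kept})
    kept2 = sorted({b for _, _, b in kept})
    return kept1, kept2, True


# Hypothetical example data.
probs = insert_size_probabilities([300, 300, 310, 290])
read1 = [("chr1", 1000), ("chr2", 5000)]
read2 = [("chr1", 1300), ("chr1", 1310), ("chr3", 42)]
kept1, kept2, concordant = correct_pair(read1, read2, probs)
```

In this example the mappings on chr2 and chr3 are not part of any pairing with non-zero probability and are dropped, and of the two remaining candidates for read 2 the one at distance 300 wins because that distance was observed most often.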
shore correct4pe plots the insert size distribution using R if -p is specified. In this case R has to be installed and included in the PATH environment variable.
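Whether R can be found through PATH can be checked before running with -p. The following is a small sketch using Python's standard library; the helper name `r_on_path` is hypothetical, and the executable name `R` is an assumption, since the page does not state which binary shore invokes.

```python
import shutil


def r_on_path(executable="R"):
    """Return True if the given executable can be found via PATH."""
    return shutil.which(executable) is not None


if not r_on_path():
    print("R was not found in PATH; plotting with -p will likely fail.")
```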
Command line options
Usage: shore correct4pe [OPTIONS]
Mandatory | |
-l STRING[,...] | | Lane or sample directories (comma separated)
-x INT | | Expected insert size, has to be larger than 0
-e INT | | Library identifier, defines name space of the read identifiers (>=1)
Optional | |
-r INT | (Default: 10000) | Maximum number of hits per read-pair
-s | | SOLiD reads
-m | | Mate pair library instead of paired-end library
-i STRING | | Insert distribution file (e.g. when re-running correct4pe)
-d | | Delete uncorrected map.list files
-p | | Plot insert size distribution
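As an illustration of how the options combine, here is a sketch that assembles a command line from the options documented above. The helper name `build_correct4pe_command` and the lane directory names are hypothetical, and the sketch only builds the argument list; it assumes `shore` would be available in PATH if the command were run.

```python
def build_correct4pe_command(lanes, insert_size, library_id,
                             max_hits=None, insert_dist_file=None, plot=False):
    """Build an argument list for shore correct4pe from the documented options."""
    if insert_size <= 0:
        raise ValueError("expected insert size (-x) has to be larger than 0")
    if library_id < 1:
        raise ValueError("library identifier (-e) has to be >= 1")

    # Mandatory options: lanes, expected insert size, library identifier.
    cmd = [
        "shore", "correct4pe",
        "-l", ",".join(lanes),
        "-x", str(insert_size),
        "-e", str(library_id),
    ]
    # Optional options are added only when requested.
    if max_hits is not None:
        cmd += ["-r", str(max_hits)]
    if insert_dist_file is not None:
        cmd += ["-i", insert_dist_file]
    if plot:
        cmd.append("-p")
    return cmd


# Hypothetical lane directories.
cmd = build_correct4pe_command(["run1/lane1", "run1/lane2"], 300, 1, plot=True)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to execute the command.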