Shore correct4pe

From SHORE wiki
Revision as of 10:00, 28 September 2011 by Felo80 (Talk | contribs)

shore correct4pe finds the most likely mapping of repetitive reads by using paired-end information. In paired-read mapping each read is aligned separately; the read pair information can then be used to resolve ambiguous alignments by selecting, among the candidate paired alignments, the one whose distance between the two reads of the pair is most likely.

shore correct4pe starts by estimating the insert size distribution. The upper bound of this distribution is usually very sharp (clones longer than expected seem to be very rare), whereas the lower bound is more blurred, and very small clones can be observed as well. The insert size distribution is then translated into a probability distribution for observing a given distance within a pairing, where a pairing is defined as the combination of one of the mappings of read 1 with one of the mappings of read 2.

All possible combinations of the mappings of both reads of a pair are compared, and all pairings with a probability of zero are dismissed. Mappings that are not part of any pairing with a probability above zero are deleted, which removes the additional mappings caused by repeats. If a mapping of one read can be paired with two different mappings of the other read, the more likely pairing is kept. If all pairings have zero probability, all mappings of both reads are kept. These are the discordant (unhappy) read pairs, which are typically used to predict structural variants.
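The selection logic described above can be illustrated with a minimal Python sketch. This is not SHORE's implementation. The function names, the (chromosome, position) representation of a mapping, the use of a raw histogram as the probability distribution, the rule that pairings across chromosomes have zero probability, and the greedy resolution of competing pairings are all assumptions made for illustration. Read orientation and read length are ignored.

```python
from collections import Counter
from itertools import product


def distance_probabilities(observed_distances):
    """Normalize observed pair distances into an empirical probability table."""
    counts = Counter(observed_distances)
    total = sum(counts.values())
    return {distance: n / total for distance, n in counts.items()}


def select_pairings(maps1, maps2, prob):
    """Pick the mappings to keep for one read pair.

    maps1, maps2: lists of (chromosome, position) tuples for read 1 and read 2.
    prob: dict mapping a distance to its probability (missing means zero).
    Returns (kept1, kept2, concordant).
    """
    scored = []
    for m1, m2 in product(maps1, maps2):
        if m1[0] != m2[0]:
            continue  # assumption: pairings across chromosomes have probability zero
        p = prob.get(abs(m2[1] - m1[1]), 0.0)
        if p > 0:
            scored.append((p, m1, m2))

    if not scored:
        # No pairing has a probability above zero: keep everything (discordant pair).
        return list(maps1), list(maps2), False

    # Assumption: resolve competing pairings greedily, most likely first,
    # so each mapping ends up in at most one kept pairing.
    scored.sort(key=lambda item: item[0], reverse=True)
    kept1, kept2 = [], []
    for p, m1, m2 in scored:
        if m1 in kept1 or m2 in kept2:
            continue
        kept1.append(m1)
        kept2.append(m2)
    return kept1, kept2, True


# Hypothetical data: read 1 and read 2 each map to two places.
prob = distance_probabilities([190, 200, 200, 210])
read1 = [("chr1", 1000), ("chr2", 5000)]
read2 = [("chr1", 1200), ("chr2", 9000)]
print(select_pairings(read1, read2, prob))
```

In this toy example only the chr1 pairing has a distance that occurs in the distribution, so the chr2 mappings of both reads are dropped. If no pairing had a probability above zero, both lists would be returned unchanged and the pair flagged as discordant.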

shore correct4pe will plot the insert size distribution using R if -p is specified. In this case R has to be installed and included in the PATH environment variable.

Command line options

Usage: shore correct4pe [OPTIONS]

Mandatory
-l STRING[,...] Lane or sample directories (comma separated)
-x INT Expected insert size, has to be larger than 0
-e INT Library identifier, defines name space of the read identifiers (>=1)
Optional
-r INT (Default: 10000) Maximum number of hits per read-pair
-s INT SOLiD reads
-m INT Mate pair library instead of Paired-end library
-i STRING Insert distribution file (e.g. when re-running correct4pe)
-d INT Delete uncorrected map.list files
-p INT Plot insert size distribution
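As an illustration of how the mandatory options fit together, the following Python sketch assembles a correct4pe command line from the flags listed above. The helper name, the lane directory paths, and the example values are hypothetical. The command is only printed, not executed.

```python
import shlex


def build_correct4pe_command(lane_dirs, insert_size, library_id, max_hits=None):
    """Assemble an argument list for shore correct4pe (hypothetical helper)."""
    if insert_size <= 0:
        raise ValueError("expected insert size (-x) has to be larger than 0")
    if library_id < 1:
        raise ValueError("library identifier (-e) has to be >= 1")
    argv = [
        "shore", "correct4pe",
        "-l", ",".join(lane_dirs),   # comma separated lane or sample directories
        "-x", str(insert_size),      # expected insert size
        "-e", str(library_id),       # library identifier
    ]
    if max_hits is not None:
        argv += ["-r", str(max_hits)]  # maximum number of hits per read pair
    return argv


# Hypothetical lane directories and values.
argv = build_correct4pe_command(["run1/lane1", "run1/lane2"], 300, 1)
print(shlex.join(argv))
```

The returned list could be passed to subprocess.run on a machine where shore is installed. Building the list first keeps the validation of -x and -e separate from the actual call.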