Difference between revisions of "Shore mapflowcell"

From SHORE wiki
 
Revision as of 14:17, 23 September 2011

shore mapflowcell performs the actual read alignments to a reference genome.

SHORE supports several alignment tools so that a suitable one is available for each application. The default tool, GenomeMapper, is extensively tested. The other values accepted by --mapper are novo (Novoalign, from Novocraft), bowtie, eland, bwa, gsnap and blat.

shore mapflowcell creates one alignment file named map.list for each reads_0.fl file in the input directories.


Usage: shore mapflowcell [OPTIONS] [READ_PATHS]

Mandatory
-f, --files=STRING[,...] Shore directories (run, lane, pe or sample) or read files
-i, --index-file=STRING Fasta file in IndexFolder, *.shore file
Mapping tools
-v, --mapper=STRING (Default: genomemapper) <genomemapper> <novo> <bowtie> <eland> <bwa> <gsnap> <blat>
-C, --color BWA & Bowtie: Reads and index are in colorspace
-B, --HSO GenomeMapper: Turn on alignment of BS-seq reads. (EXPERIMENTAL)
Alignment parameters
-n, --edit-distance=INT[%] (Default: 0) Maximum edit distance (read length percentage or absolute value)
-g, --maxgaps=INT[%] (Default: 0) GenomeMapper, BWA, GSnap, Novoalign: Maximum number of gaps (0-n) (read length percentage or absolute value)
-e, --gapextension=INT[%] GSnap & BWA: Maximum gap extension (0-n). -g defines max gap openings and -e max extensions per gap opening (read length percentage or absolute value)
-q, --hamming=INT[%] Bowtie & Novoalign: Quality-weighted Hamming distance as defined by MAQ. Overrides -n and -g. Permitted values: 0 to infinity.
-l, --seed=INT[%] Bowtie & BWA: the seed (default: 28), i.e. the number of bases at the beginning of the read that are required to match. GenomeMapper: Discard hits smaller than this seed length (read length percentage or absolute value)
-s, --seed-threshold=INT GenomeMapper: Discard seeds with the number of hits above this threshold
--restrict-ED=STRING (Default: off) Automatically limit edit distance according to the seed lemma (off or on or strict)
Parallelization
-c, --cores=INT (Default: 1) Number of processors/cores
-b, --batch-size=INT (Default: 50000) Number of reads per thread
-M, --native-cores=INT (Default: 1) Use the alignment tool's internal parallelization
Mapping strategy
-R, --report=INT Maximum number of reported alignments. Recommended for single-end only!
-r, --suboptimal=INT BWA: Stop searching for suboptimal alignments when there are >INT equally best hits. GSnap: All hits with best score plus suboptimal-score are reported. Default: no suboptimal alignments.
-a, --all-hit-strategy GenomeMapper & Bowtie: Map against all locations within the specified alignment parameters
-2, --best2-strategy GenomeMapper: Report the best and the second best hit
--select-seeds=INT[,...] Select from multiple seed lengths if available in the index directory
-P, --upgrade=STRING (Default: off) Upgrade a previous mapflowcell run (off or replace or leftovers or full)

The leftovers and full modes can be used both to re-align against the same reference sequence as in a previous pass, using a different alignment tool or different parameters, and to additionally map the reads to a different reference sequence.

  • off: normal operation; skips directories that already contain map.list, left_over.fl or ref.txt files
  • replace: replace all previous alignment files
  • leftovers: only align the leftover reads from a previous pass of mapflowcell; the results are merged into the existing map.list files
  • full: try to find more alignment locations or alignments with fewer mismatches for all previously aligned (and unaligned) reads
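
The behaviour of --upgrade=off described above can be illustrated with a small Python sketch. This is not SHORE's implementation: the helper names are hypothetical, matching the marker files by name prefix (to also catch compressed variants) is an assumption, and all modes other than off are simply treated as processing every directory.

```python
from pathlib import Path

# Marker file names taken from the description of --upgrade=off.
MARKER_NAMES = ("map.list", "left_over.fl", "ref.txt")
UPGRADE_MODES = ("off", "replace", "leftovers", "full")


def has_previous_alignment(directory):
    """Return True if `directory` holds a marker file from an earlier pass.

    Assumption: compressed variants keep the marker name as a prefix,
    so file names are matched by prefix rather than exactly.
    """
    return any(
        entry.is_file() and entry.name.startswith(MARKER_NAMES)
        for entry in Path(directory).iterdir()
    )


def directories_to_align(directories, upgrade="off"):
    """Pick the directories a pass would process (hypothetical helper)."""
    if upgrade not in UPGRADE_MODES:
        raise ValueError(f"unknown upgrade mode: {upgrade}")
    if upgrade == "off":
        # Normal operation: leave already aligned directories untouched.
        return [d for d in directories if not has_previous_alignment(d)]
    # Simplification: replace, leftovers and full revisit every directory.
    return list(directories)
```

Running such a check before a real pass shows which lane directories would be skipped with the default mode and therefore need --upgrade=replace, leftovers or full to be processed again.
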
Spliced alignment for mRNA-seq
-S, --spliced BLAT & GSnap: Perform spliced alignments
-D, --maxintron=INT (Default: 1000) Max. intron length considered for spliced alignment (equals --localsplicedist in GSnap).
-L, --minhit=INT (Default: 17) Minimum length of hit on either side of spliced read
Paired end sequencing
-p, --PE Paired-end mode; generates output suitable for correct4pe or for realignment using --upgrade=full
Output
-Z, --nocompress-maplist Do not compress mapping files
-Y, --nocompress-leftover Do not compress leftover files
--rplot Graphical output of statistics using R
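
As an illustration of how the options above combine, the following Python sketch assembles a command line without running it. The helper name, the directory names and the index file name are hypothetical; only option names and mapper values listed on this page are used, and the INT[%] check merely mirrors the documented syntax (an integer with an optional percent sign).

```python
import re
import shlex

# Mapper values as listed for --mapper on this page.
MAPPERS = ("genomemapper", "novo", "bowtie", "eland", "bwa", "gsnap", "blat")
INT_OR_PERCENT = re.compile(r"\d+%?")  # INT[%]: absolute value or percentage


def build_mapflowcell_command(read_paths, index_file, mapper="genomemapper",
                              edit_distance="0", max_gaps="0", cores=1,
                              paired_end=False):
    """Assemble an argument list for shore mapflowcell (hypothetical helper)."""
    if not read_paths:
        raise ValueError("--files needs at least one directory or read file")
    if mapper not in MAPPERS:
        raise ValueError(f"unknown mapper: {mapper}")
    for name, value in (("edit_distance", edit_distance),
                        ("max_gaps", max_gaps)):
        if not INT_OR_PERCENT.fullmatch(str(value)):
            raise ValueError(f"{name} must be INT or INT%: {value!r}")
    argv = [
        "shore", "mapflowcell",
        "--files=" + ",".join(read_paths),
        "--index-file=" + index_file,
        "--mapper=" + mapper,
        "--edit-distance=" + str(edit_distance),
        "--maxgaps=" + str(max_gaps),
        "--cores=" + str(cores),
    ]
    if paired_end:
        argv.append("--PE")
    return argv


command = build_mapflowcell_command(
    ["run_1/lane_1", "run_1/lane_2"],   # hypothetical SHORE directories
    "index/reference.fa.shore",         # hypothetical index file
    edit_distance="10%", max_gaps="2", cores=4, paired_end=True,
)
print(shlex.join(command))
```

Here the edit distance is given as a percentage of the read length and the gap limit as an absolute value, which are the two forms the INT[%] options accept.
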