Difference between revisions of "Shore import"

 

Revision as of 14:15, 23 September 2011

shore import converts Illumina GAPipeline BUSTARD directories, FASTQ files or SOLiD csfasta files into SHORE format. shore import will create the necessary files and the RunFolder directory structure.

The input format is selected with option -v. Available importers are:

  • Bustard: Input generated by the GAPipeline (bustard/goat) or SCS programs.
  • Fastq: FASTQ files. Some users prefer to have the GAPipeline write Illumina FASTQ files as its standard output.
  • Solid: SOLiD F3 and R3 csfasta and (optionally) QV files.
  • Shore: SHORE reads_0.fl files. This importer can be used to re-filter or trim reads that are already in SHORE format. It also accepts 454 SFF files.


Usage: shore import [OPTIONS]
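
As an illustration of the usage line, the following minimal Python sketch assembles a paired-end call for the Fastq importer from options listed in the table below. The helper name, the FASTQ file names and the run folder name are hypothetical; the `--name=value` spelling is taken from the option table, and the command is only printed, not executed.

```python
import shlex


def build_fastq_import(read1, read2, run_folder,
                       application="genomic", quality_type="sanger"):
    """Return an argv list for `shore import` with the Fastq importer.

    Hypothetical helper for illustration; it does not run SHORE.
    """
    # The -y list must name its files in the same order as the -x list,
    # so at the very least both lists need the same length.
    if read2 and len(read1) != len(read2):
        raise ValueError("read1 and read2 lists must have the same length")

    argv = [
        "shore", "import",
        "--importer=Fastq",
        "--exporter=Shore",
        "--application=" + application,
        "--quality-type=" + quality_type,
        "--read1-fastq=" + ",".join(read1),
    ]
    if read2:  # only needed for paired-end runs
        argv.append("--read2-fastq=" + ",".join(read2))
    argv.append("--flowcell-folder=" + run_folder)
    return argv


# Hypothetical input files and output folder.
argv = build_fastq_import(
    ["lane1_1.fastq", "lane2_1.fastq"],
    ["lane1_2.fastq", "lane2_2.fastq"],
    "run_0001",
)
print(shlex.join(argv))
```

For a single-end run, pass an empty list as the second argument; the sketch then omits --read2-fastq.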

Mandatory
  -v, --importer=<arg>              (Default: Bustard)          Importers: Bustard, Fastq, Shore, Solid
  -e, --exporter=<arg>              (Default: Shore)            Exporters: Shore, Console
  -a, --application=<arg>                                       Applications: genomic, mRNA, ChIPseq, sRNA

Bustard importer
  -b, --bustard-folder=<arg>                                    Bustard directory, *_qseq.txt files
  -l, --lanes=<arg[,...]>           (Default: 1,2,3,4,5,6,7,8)  Lanes

Fastq importer
  -Q, --quality-type=<arg>          (Default: sanger)           Quality type provided in fastq files (either sanger (ASCII offset 33) or illumina (ASCII offset 64, illumina prior to CASAVA 1.8))
  -x, --read1-fastq=<arg[,...]>                                 List of fastq files for the first run
  -y, --read2-fastq=<arg[,...]>                                 List of fastq files for the second run, required for paired-end runs [NOTE: Same file order as in the -x option required]

Shore importer
  --input=<arg[,...]>                                           Input reads_0.fl files or RunFolders.
                                                                Note: If a complete RunFolder is specified, the raw data will be recovered, and previous filtering will be undone.

Solid importer
  -F, --F3prefix=<arg>                                          Prefix of F3 csfasta and _QV.qual file
  -R, --R3prefix=<arg>                                          Prefix of R3 csfasta and _QV.qual file

Shore exporter
  -o, --flowcell-folder=<arg>                                   RunFolder, will be created
  -B, --batch-size=<arg>                                        Divides the LengthFolders into batches that contain <batch-size> reads
  --no-read-compression                                         Don't compress read files
  --no-filtered-compression                                     Do not compress trash files
  --rplot                                                       Graphical output of statistics using R
  --nondestructive-trim                                         Do not truncate the ends of trimmed or clipped reads
  -L, --lengthdirs                                              Always create length_ directories (created by default for sRNA only)

Read filtering
  -D, --disable-illumina-filter                                 Start with unfiltered reads, override the GAPipeline filter and other external filters
  -n, --max-Ns=<arg>                (Default: 100%)             Maximum number of ambiguous base calls per read (percentage of trimmed read length or absolute)
  -g, --lowcomplexity                                           Turn on low complexity filter
  -c, --shore-filter                                            Use custom shore filter (implies '-D' if sig2 files are provided)
  -C, --chastity-violation=<arg>    (Default: 57)               Threshold for chastity violations (in percent)
  -V, --quality-violation=<arg>     (Default: 3)                Threshold for quality violations (0 to 40)
  --filter-ranges=<arg[,...]>       (Default: 12:2,25:5)        Filter setup for custom shore filter

Read trimming
  -m, --max-length=<arg[,...]>                                  Maximum read length(s) (read length including barcode)
  -k, --minimal-length=<arg[,...]>                              Minimal read length (switches on read trimming; read length without barcode)
  -q, --quality-cutoff=<arg>        (Default: 5)                Quality cutoff for read trimming
  --discard-trim-failures                                       Filter reads trimmed beyond minimal length.

Read barcoding
  -r, --barcodes=<arg>                                          File with barcodes (line separated, optional second column is sample name)
  -h, --barcode-mismatches=<arg>    (Default: 0)                Allowed number of mismatches in the barcodes
  -w, --two-sided-barcodes                                      Barcode is at both sides of the clone

Adapter clipping (454 or application = sRNA)
  -d, --adapter-sequence=<arg>                                  Adapter sequence (please specify first 12 bp)
  -s, --smallest-sRNA=<arg>                                     Minimum length of sRNA to report
  -t, --largest-sRNA=<arg>                                      Maximum length of sRNA to report
  -p, --permit-missing-adapter                                  Permit reads where the adapter cannot be found
  --linker=<arg>                                                Specify linker sequence for separation of 454 PE reads
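
The two values of -Q, --quality-type differ only in the ASCII offset that is subtracted from each quality character. The following minimal Python sketch shows that arithmetic with the offsets given in the table (33 for sanger, 64 for illumina). The function name and the example quality strings are made up for illustration, and the code is not taken from SHORE.

```python
# ASCII offsets as stated for the --quality-type option.
QUALITY_OFFSETS = {"sanger": 33, "illumina": 64}


def decode_qualities(quality_string, quality_type="sanger"):
    """Convert a FASTQ quality string into a list of integer quality values."""
    offset = QUALITY_OFFSETS[quality_type]
    return [ord(character) - offset for character in quality_string]


# The same four quality values in both encodings.
print(decode_qualities("II5#", "sanger"))    # [40, 40, 20, 2]
print(decode_qualities("hhTB", "illumina"))  # [40, 40, 20, 2]
```

Reading an offset-64 file with the default sanger setting would shift every value up by 31, so it is worth checking the encoding of the input files before import.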