Difference between revisions of "Shore import"

From SHORE wiki
 
Latest revision as of 14:53, 27 May 2013

shore import processes and filters reads from various sources and converts them into SHORE format. shore import will create the necessary files and the RunFolder directory structure.

Reads may be imported from Illumina GAPipeline BUSTARD directories, FASTQ files, SOLiD csfasta files, or existing SHORE-format read files.

Input formats

The input format is selected with option -v, which names the importer to use.

Available importers are:

  • -v Bustard: Input generated by the GAPipeline (bustard/goat) or SCS programs.
  • -v Fastq: FastQ files. Some users prefer Illumina fastq files as standard output from the GAPipeline.
  • -v Solid: SOLiD F3 and R3 csfasta and (optionally) QV files.
  • -v Shore: SHORE FlatRead files or entire RunFolders or LaneFolders. This importer can be used to re-filter or trim reads which are already in SHORE format. 454 SFF files are also accepted by this importer.
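
As a minimal sketch, the importer selection can be combined with the flags documented under Command line options to assemble a paired-end FASTQ import. The helper name build_fastq_import and all file and folder names are hypothetical; only the flags (-v, -e, -a, -Q, -x, -y, -o) are taken from the usage text on this page. The command is only printed, not executed.

```python
import shlex


def build_fastq_import(read1, read2, run_folder,
                       application="genomic", quality_type="sanger"):
    """Assemble an argument list for `shore import -v Fastq`.

    Hypothetical helper: it only builds the command line and does not
    run SHORE. `read1` and `read2` are lists of FASTQ paths; `read2`
    may be empty for single-end runs.
    """
    # The usage text requires the -y files in the same order as the -x files,
    # so at minimum the two lists must have the same length.
    if read2 and len(read1) != len(read2):
        raise ValueError("read1 and read2 must list the same number of files")

    argv = [
        "shore", "import",
        "-v", "Fastq",          # importer
        "-e", "Shore",          # exporter (the documented default)
        "-a", application,      # genomic, mRNA, ChIPseq or sRNA
        "-Q", quality_type,     # sanger or illumina
        "-x", ",".join(read1),  # first-read FASTQ files
    ]
    if read2:
        argv += ["-y", ",".join(read2)]  # second-read FASTQ files
    argv += ["-o", run_folder]           # RunFolder, created by shore import
    return argv


# Example with made-up file names.
argv = build_fastq_import(["lane1_1.fastq"], ["lane1_2.fastq"], "run_0001")
print(" ".join(shlex.quote(a) for a in argv))
```

Building the command as a list rather than a single string keeps file names with spaces intact if the list is later passed to a process launcher.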

Command line options

Usage: shore import [OPTIONS]

Mandatory
-v, --importer=<arg> (Default: Bustard) Importers: Bustard, Fastq, Shore, Solid
-e, --exporter=<arg> (Default: Shore) Exporters: Shore, Console
-a, --application=<arg> Applications: genomic, mRNA, ChIPseq, sRNA
Bustard importer
-b, --bustard-folder=<arg> Bustard directory, *_qseq.txt files
-l, --lanes=<arg[,...]> (Default: 1,2,3,4,5,6,7,8) Lanes
Fastq importer
-Q, --quality-type=<arg> (Default: sanger) Quality type provided in fastq files (either sanger (ASCII offset 33) or illumina (ASCII offset 64, illumina prior to CASAVA 1.8))
-x, --read1-fastq=<arg[,...]> List of fastq files for the first run
-y, --read2-fastq=<arg[,...]> List of fastq files for the second run, required for paired-end runs [NOTE: Same file order as in the -x option required]
Shore importer
--input=<arg[,...]> Input reads_0.fl files or RunFolders.
Note: If a complete RunFolder is specified, the raw data will be recovered, and previous filtering will be undone.
Solid importer
-F, --F3prefix=<arg> Prefix of F3 csfasta and _QV.qual file
-R, --R3prefix=<arg> Prefix of R3 csfasta and _QV.qual file
Shore exporter
-o, --flowcell-folder=<arg> RunFolder, will be created
-B, --batch-size=<arg> Divides the LengthFolders into batches that contain <batch-size> reads
--no-read-compression Don't compress read files
--no-filtered-compression Do not compress trash files
--rplot Graphical output of statistics using R
--nondestructive-trim Do not truncate the ends of trimmed or clipped reads
-L, --lengthdirs Always create length_ directories (created by default for sRNA only)
Read filtering
-D, --disable-illumina-filter Start with unfiltered reads, override the GAPipeline filter and other external filters
-n, --max-Ns=<arg> (Default: 100%) Maximum number of ambiguous base calls per read (percentage of trimmed read length or absolute)
-g, --lowcomplexity Turn on low complexity filter
-c, --shore-filter Use custom shore filter (implies '-D' if sig2 files are provided)
-C, --chastity-violation=<arg> (Default: 57) Threshold for chastity violations (in percent)
-V, --quality-violation=<arg> (Default: 3) Threshold for quality violations (0 to 40)
--filter-ranges=<arg[,...]> (Default: 12:2,25:5) Filter setup for custom shore filter
Read trimming
-m, --max-length=<arg[,...]> Maximum read length(s) (read length including barcode)
-k, --minimal-length=<arg[,...]> Minimal read length (switches on read trimming; read length without barcode)
-q, --quality-cutoff=<arg> (Default: 5) Quality cutoff for read trimming
--discard-trim-failures Filter reads trimmed beyond minimal length.
Read barcoding
-r, --barcodes=<arg> File with barcodes (line separated, optional second column is sample name)
-h, --barcode-mismatches=<arg> (Default: 0) Allowed number of mismatches in the barcodes
-w, --two-sided-barcodes Barcode is at both sides of the clone
Adapter clipping (454 or application = sRNA)
-d, --adapter-sequence=<arg> Adapter sequence (please specify first 12 bp)
-s, --smallest-sRNA=<arg> Minimum length of sRNA to report
-t, --largest-sRNA=<arg> Maximum length of sRNA to report
-p, --permit-missing-adapter Permit reads where the adapter cannot be found
--linker=<arg> Specify linker sequence for separation of 454 PE reads
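
The -Q option above distinguishes two FASTQ quality encodings by their ASCII offset (33 for sanger, 64 for illumina prior to CASAVA 1.8). The following sketch shows what that offset means when quality characters are turned into numeric scores. It is an illustration of the encoding only, not SHORE's own code; the names QUALITY_OFFSETS and decode_qualities are hypothetical.

```python
# ASCII offsets as stated for the -Q option of the Fastq importer.
QUALITY_OFFSETS = {"sanger": 33, "illumina": 64}


def decode_qualities(quality_string, quality_type="sanger"):
    """Convert a FASTQ quality string into a list of integer scores.

    Hypothetical helper illustrating the offset arithmetic: each
    character's ASCII code minus the offset gives the quality value.
    """
    offset = QUALITY_OFFSETS[quality_type]
    return [ord(ch) - offset for ch in quality_string]


# The same scores written in both encodings.
print(decode_qualities("II5!", "sanger"))    # [40, 40, 20, 0]
print(decode_qualities("hhT@", "illumina"))  # [40, 40, 20, 0]
```

Decoding a file with the wrong offset shifts every score by 31, which is why the quality type has to match the files passed with -x and -y.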