Shore import
From SHORE wiki
shore import filters reads from various sources and converts them into SHORE format. It also creates the necessary files and the RunFolder directory structure.
Reads may be imported from Illumina GAPipeline Bustard directories, FASTQ files, SOLiD csfasta files, or existing SHORE read files.
Input formats
The input format is selected with option -v (--importer).
Available importers are:
- -v Bustard: Input generated by the GAPipeline (bustard/goat) or SCS programs.
- -v Fastq: FASTQ files. Some users prefer Illumina FASTQ files as the standard output of the GAPipeline.
- -v Solid: SOLiD F3 and R3 csfasta and (optionally) QV files.
- -v Shore: SHORE reads_0.fl files. This importer can be used to re-filter or trim reads that are already in SHORE format. It also accepts 454 SFF files.
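Each importer needs its own input options (-b and -l for Bustard, -x and -y for Fastq, -F and -R for Solid, --input for Shore). The Python sketch below assembles a `shore import` argument list per importer, using only option letters from the command line option table. The helper name `build_import_command` and its keyword arguments are hypothetical, and the grouping of required inputs per importer is an assumption read off that table; the sketch only builds the list and does not run shore.

```python
import shlex


def build_import_command(importer, application, flowcell_folder, **inputs):
    """Assemble a hypothetical `shore import` argument list.

    Only options from the SHORE option table are used; which keyword
    arguments each importer needs is an assumption of this sketch.
    """
    argv = ["shore", "import", "-v", importer, "-a", application,
            "-o", flowcell_folder]
    if importer == "Bustard":
        argv += ["-b", inputs["bustard_folder"]]
        if "lanes" in inputs:
            # -l takes a comma-separated list, e.g. "1,2".
            argv += ["-l", ",".join(str(lane) for lane in inputs["lanes"])]
    elif importer == "Fastq":
        read1 = inputs["read1"]
        read2 = inputs.get("read2", [])
        # Paired-end: -y must list the mates in the same order as -x.
        if read2 and len(read2) != len(read1):
            raise ValueError("-x and -y need the same number of files")
        argv += ["-Q", inputs.get("quality_type", "sanger"),
                 "-x", ",".join(read1)]
        if read2:
            argv += ["-y", ",".join(read2)]
    elif importer == "Solid":
        argv += ["-F", inputs["f3_prefix"]]
        if "r3_prefix" in inputs:
            argv += ["-R", inputs["r3_prefix"]]
    elif importer == "Shore":
        argv.append("--input=" + ",".join(inputs["input"]))
    else:
        raise ValueError(f"unknown importer: {importer}")
    return argv


if __name__ == "__main__":
    cmd = build_import_command(
        "Fastq", "genomic", "run1",
        read1=["lane1_1.fq", "lane2_1.fq"],
        read2=["lane1_2.fq", "lane2_2.fq"],
    )
    print(shlex.join(cmd))
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues with file names.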
Command line options
Usage: shore import [OPTIONS]
Mandatory
-v, --importer=<arg> | (Default: Bustard) | Importers: Bustard, Fastq, Shore, Solid |
-e, --exporter=<arg> | (Default: Shore) | Exporters: Shore, Console |
-a, --application=<arg> | | Applications: genomic, mRNA, ChIPseq, sRNA |
Bustard importer
-b, --bustard-folder=<arg> | | Bustard directory (*_qseq.txt files) |
-l, --lanes=<arg[,...]> | (Default: 1,2,3,4,5,6,7,8) | Lanes to import |
Fastq importer
-Q, --quality-type=<arg> | (Default: sanger) | Quality encoding of the FASTQ files: sanger (ASCII offset 33) or illumina (ASCII offset 64, used by Illumina pipelines prior to CASAVA 1.8) |
-x, --read1-fastq=<arg[,...]> | | List of FASTQ files for the first run |
-y, --read2-fastq=<arg[,...]> | | List of FASTQ files for the second run; required for paired-end runs. Files must be listed in the same order as for -x |
Shore importer
--input=<arg[,...]> | | Input reads_0.fl files or RunFolders. Note: if a complete RunFolder is specified, the raw data will be recovered and previous filtering will be undone. |
Solid importer
-F, --F3prefix=<arg> | | Prefix of the F3 csfasta and _QV.qual files |
-R, --R3prefix=<arg> | | Prefix of the R3 csfasta and _QV.qual files |
Shore exporter
-o, --flowcell-folder=<arg> | | RunFolder to be created |
-B, --batch-size=<arg> | | Divide the LengthFolders into batches of <batch-size> reads each |
--no-read-compression | | Do not compress read files |
--no-filtered-compression | | Do not compress trash files (filtered reads) |
--rplot | | Plot statistics graphically using R |
--nondestructive-trim | | Do not truncate the ends of trimmed or clipped reads |
-L, --lengthdirs | | Always create length_ directories (by default they are created for sRNA only) |
Read filtering
-D, --disable-illumina-filter | | Start with unfiltered reads, overriding the GAPipeline filter and other external filters |
-n, --max-Ns=<arg> | (Default: 100%) | Maximum number of ambiguous base calls per read (as a percentage of the trimmed read length or as an absolute number) |
-g, --lowcomplexity | | Turn on the low-complexity filter |
-c, --shore-filter | | Use the custom SHORE filter (implies -D if sig2 files are provided) |
-C, --chastity-violation=<arg> | (Default: 57) | Threshold for chastity violations (in percent) |
-V, --quality-violation=<arg> | (Default: 3) | Threshold for quality violations (0 to 40) |
--filter-ranges=<arg[,...]> | (Default: 12:2,25:5) | Filter setup for the custom SHORE filter |
Read trimming
-m, --max-length=<arg[,...]> | | Maximum read length(s), including the barcode |
-k, --minimal-length=<arg[,...]> | | Minimal read length, excluding the barcode; setting this switches on read trimming |
-q, --quality-cutoff=<arg> | (Default: 5) | Quality cutoff for read trimming |
--discard-trim-failures | | Filter out reads that would be trimmed below the minimal length |
Read barcoding
-r, --barcodes=<arg> | | File with barcodes, one per line; an optional second column gives the sample name |
-h, --barcode-mismatches=<arg> | (Default: 0) | Allowed number of mismatches in the barcodes |
-w, --two-sided-barcodes | | Barcode is present at both ends of the clone |
Adapter clipping (454, or application = sRNA)
-d, --adapter-sequence=<arg> | | Adapter sequence (specify the first 12 bp) |
-s, --smallest-sRNA=<arg> | | Minimum length of sRNA to report |
-t, --largest-sRNA=<arg> | | Maximum length of sRNA to report |
-p, --permit-missing-adapter | | Permit reads in which the adapter cannot be found |
--linker=<arg> | | Linker sequence for separating 454 paired-end reads |
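The -Q/--quality-type option only changes the ASCII offset subtracted from each quality character: 33 for sanger and 64 for illumina. A minimal Python sketch of that decoding (the function name `phred_scores` is illustrative, not part of SHORE) shows why choosing the wrong type shifts every score by 31:

```python
# ASCII offsets selected by -Q/--quality-type, as given in the option table.
QUALITY_OFFSETS = {"sanger": 33, "illumina": 64}


def phred_scores(quality_line, quality_type="sanger"):
    """Decode a FASTQ quality line into per-base Phred scores."""
    offset = QUALITY_OFFSETS[quality_type]
    return [ord(ch) - offset for ch in quality_line]


if __name__ == "__main__":
    # The same scores written in the two encodings.
    print(phred_scores("II5#", "sanger"))
    print(phred_scores("hhTB", "illumina"))
```

Reading an illumina-encoded file as sanger inflates every score by 64 - 33 = 31, so quality-based trimming and filtering would barely remove anything.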
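The -n/--max-Ns limit is applied to the trimmed read and may be given either as a percentage or as an absolute count. The sketch below assumes that a value ending in `%` (as in the default `100%`) is a percentage and any other value is an absolute count; the exact argument syntax SHORE accepts, and the hypothetical function name `passes_max_ns`, are assumptions.

```python
def passes_max_ns(trimmed_read, max_ns="100%"):
    """Hypothetical check mirroring -n/--max-Ns on an already trimmed read.

    Assumption: a value ending in '%' is a percentage of the read length,
    any other value is an absolute number of N calls.
    """
    n_calls = trimmed_read.upper().count("N")
    if max_ns.endswith("%"):
        percent = float(max_ns[:-1])
        # Equivalent to n_calls / length <= percent / 100, without dividing.
        return n_calls * 100 <= percent * len(trimmed_read)
    return n_calls <= int(max_ns)
```

With the default of 100%, no read is ever rejected for ambiguous calls, so the filter only takes effect when a lower value is given.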
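The read trimming options interact: -k switches trimming on and sets a floor, -q sets the quality cutoff, and --discard-trim-failures decides what happens to reads that would fall below that floor. The exact SHORE trimming rule is not described on this page, so the sketch below swaps in a simple, common technique: strip trailing bases whose Phred score is below the cutoff. It also assumes that, without --discard-trim-failures, trimming stops at the minimal length. The function `trim_read` is hypothetical.

```python
def trim_read(seq, quals, min_length, cutoff=5, discard_trim_failures=False):
    """Generic 3' quality trimming (a sketch, not SHORE's actual code).

    Returns the trimmed (sequence, qualities), or None if the read is
    discarded because trimming would leave fewer than min_length bases.
    """
    end = len(seq)
    # Drop trailing bases whose Phred score is below the cutoff.
    while end > 0 and quals[end - 1] < cutoff:
        end -= 1
    if end < min_length:
        if discard_trim_failures:
            return None
        # Assumption: without the flag, trimming stops at the minimal length.
        end = min(min_length, len(seq))
    return seq[:end], quals[:end]
```

Whether SHORE compares with `<` or `<=` against the cutoff is not stated; this sketch keeps bases whose score equals the cutoff.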
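The barcode file passed with -r holds one barcode per line with an optional sample name, and -h sets how many mismatches are tolerated. A minimal Python sketch of parsing such a file and assigning reads by Hamming distance follows. It rests on several assumptions that this page does not state: the barcode sits at the 5' end of the read, whitespace separates the columns, a missing sample name defaults to the barcode itself, and a read matching two barcodes equally well is left unassigned. Two-sided barcodes (-w) are not covered. All function names are hypothetical.

```python
def load_barcodes(lines):
    """Parse barcode lines: barcode, optionally followed by a sample name."""
    barcodes = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        barcode = fields[0].upper()
        # Assumption: use the barcode itself when no sample name is given.
        barcodes[barcode] = fields[1] if len(fields) > 1 else barcode
    return barcodes


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read, barcodes, max_mismatches=0):
    """Return (sample, read without barcode), or (None, read) if unassigned."""
    hits = []
    for barcode, sample in barcodes.items():
        if len(read) < len(barcode):
            continue
        distance = hamming(read[:len(barcode)].upper(), barcode)
        if distance <= max_mismatches:
            hits.append((distance, barcode, sample))
    if not hits:
        return None, read
    hits.sort()
    if len(hits) > 1 and hits[0][0] == hits[1][0]:
        return None, read  # ambiguous: two barcodes match equally well
    _, barcode, sample = hits[0]
    return sample, read[len(barcode):]
```

Raising the mismatch allowance recovers reads with sequencing errors in the barcode, but it also increases the chance that closely related barcodes become ambiguous.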
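For sRNA data, the adapter options combine into one decision per read: locate the adapter (-d), keep the insert only if its length lies between -s and -t, and optionally keep reads without a detectable adapter (-p). The sketch below is a simplified illustration under stated assumptions, not SHORE's algorithm. It searches only for an exact match of the first 12 bp of the adapter, uses the first occurrence, applies the length limits to reads kept under -p as well, and ignores adapters truncated at the read end. The function `clip_adapter` is hypothetical.

```python
def clip_adapter(read, adapter, smallest, largest, permit_missing=False):
    """Clip a 3' adapter from an sRNA read (exact-match sketch).

    Returns the insert sequence, or None if the read is not reported.
    """
    # Assumption: only the first 12 bp of the adapter are searched for.
    probe = adapter[:12].upper()
    pos = read.upper().find(probe)
    if pos == -1:
        if not permit_missing:
            return None
        insert = read  # -p: keep reads without a detectable adapter
    else:
        insert = read[:pos]
    # Report only inserts within the -s/-t length range.
    if smallest <= len(insert) <= largest:
        return insert
    return None
```

Real adapter clippers usually also tolerate mismatches and partial adapters at the 3' end of the read; this sketch deliberately leaves that out to keep the role of each option visible.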