An example

Preprocessing the reference

SHORE version 0.8

> shore preprocess \
   -f ~/downloads/phiX.reference_sequence.fa \
   -i ~/shore_index/phiX \
   -W -b -C

`-f`	reference sequence file
`-i`	folder in which the SHORE IndexFolder is created
`-W`	additionally create a BWA index
`-b`	additionally create a Bowtie index
`-C`	additionally create a colorspace index (available for BWA and Bowtie)

SHORE version 0.9

> shore preprocess \
   -f ~/downloads/phiX.reference_sequence.fa \
   -i ~/shore_index/phiX \
   -x BWA,Bowtie2 -C

`-x BWA,Bowtie2`	additionally create BWA and Bowtie2 indexes
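
The two variants above differ only in the index-related flags. A minimal sketch of a helper that picks them by version is shown here; the function name and the `SHORE_VERSION` variable are hypothetical, and only the flags that appear in this guide are used. The command is printed rather than executed (dry run).

```shell
#!/bin/sh
# Print the index-related flags for `shore preprocess` as used in this guide.
# Hypothetical helper, not part of SHORE.
preprocess_index_flags() {
    case "$1" in
        0.8*) echo "-W -b -C" ;;            # BWA, Bowtie and colorspace indexes
        0.9*) echo "-x BWA,Bowtie2 -C" ;;   # mapper list plus colorspace index
        *)    echo "unsupported SHORE version: $1" >&2; return 1 ;;
    esac
}

# Dry run: print the command instead of running it.
# SHORE_VERSION is an assumed, user-set variable; it defaults to 0.9 here.
SHORE_VERSION=${SHORE_VERSION:-0.9}
# The substitution is left unquoted on purpose so the flags split into words.
echo shore preprocess \
    -f "$HOME/downloads/phiX.reference_sequence.fa" \
    -i "$HOME/shore_index/phiX" \
    $(preprocess_index_flags "$SHORE_VERSION")
```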

Importing read data into SHORE

Importing reads in Fastq format

> shore import -v Fastq -e Shore \
   -a genomic -x read1.fastq -x read2.fastq \
   -i PhiX-1 -o ~/phiX_resequencing_project/run_01

`-v`	defines the input format
`-e`	the output format
`-a`	starts a genomic re-sequencing project
`-i`	an arbitrary, unique identifier for the sample
`-x`	files for the first and the second sequencing read; the reads must appear in the same order in both files
`-o`	output directory
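
Because the two `-x` files are expected to list their reads in the same order, a cheap sanity check before importing is to compare record counts. The following is a sketch only: it assumes uncompressed FASTQ files with exactly four lines per record, and the function names are hypothetical. Equal counts do not prove matching order, but unequal counts show that something is wrong.

```shell
#!/bin/sh
# Count FASTQ records, assuming the four-lines-per-record layout.
fastq_records() {
    lines=$(wc -l < "$1") || return 1
    echo $(( lines / 4 ))
}

# Succeed only if both files of a read pair hold the same number of records.
check_pair() {
    n1=$(fastq_records "$1") || return 1
    n2=$(fastq_records "$2") || return 1
    if [ "$n1" -ne "$n2" ]; then
        echo "record count mismatch: $1 has $n1, $2 has $n2" >&2
        return 1
    fi
    echo "$n1 read pairs"
}

# Usage (file names as in the import example above):
#   check_pair read1.fastq read2.fastq && shore import -v Fastq -e Shore ...
```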

Importing Illumina GA-Pipeline Bustard directories

> shore import -v Bustard -e Shore \
   -a genomic -b my/bustard/folder \
   -i PhiX-2 -o ~/phiX_resequencing_project/run_02

`-v`	defines the input format
`-e`	the output format
`-a`	starts a genomic re-sequencing project
`-i`	an arbitrary, unique identifier for the sample
`-b`	bustard folder produced by GA-Pipeline
`-o`	output directory

Importing SOLiD csfasta and QV files

> shore import -v Solid -e Shore \
   -a genomic -F reads_F3 -R reads_R3 \
   -i PhiX-3 -o ~/phiX_resequencing_project/run_03

`-v`	defines the input format
`-e`	the output format
`-a`	starts a genomic re-sequencing project
`-F`	prefix of the F3 csfasta file
`-R`	prefix of the R3 csfasta file
`-i`	an arbitrary, unique identifier for the sample
`-o`	output directory

Recovering raw read data in FASTQ format from a SHORE directory

> shore import -i "" -P ~/phiX_resequencing_project/run_01 | shore convert Reads2Fastq - PhiX-1.fq.gz

`-i ""`	unset sample identifier
`-P`	SHORE RunFolder or LaneFolder from which data should be recovered
`Reads2Fastq`	convert output to FASTQ format
`-`	read from standard input
`PhiX-1.fq.gz`	write GZIP compressed output file

Mapping reads of a SHORE run directory

Illumina

> shore mapflowcell -f ~/phiX_resequencing_project/run_01 \
   -i ~/shore_index/phiX/phiX.reference_sequence.fa.shore \
   -n 10% -g 6% -c 8 -p

`-f`	run directory / files
`-n`	max edit distance, optionally as percentage of read length
`-g`	max gaps, optionally as percentage of read length
`-c`	number of CPUs
`-p`	indicate paired end data
`-i`	*.shore file in the IndexFolder.
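
To get a feeling for the percentage form of `-n` and `-g`, the following sketch computes the corresponding share of a read length. The read length of 36 bases is an assumed example value, and the helper name is hypothetical. How SHORE rounds a fractional result is not stated in this guide, so the sketch prints the unrounded value.

```shell
#!/bin/sh
# Print PCT percent of read length LEN with two decimals, integer math only.
# Hypothetical helper for illustration.
percent_of_length() {
    len=$1
    pct=$2
    scaled=$(( len * pct ))    # result in hundredths
    printf '%d.%02d\n' $(( scaled / 100 )) $(( scaled % 100 ))
}

percent_of_length 36 10    # -n 10% on an assumed 36-base read: prints 3.60
percent_of_length 36 6     # -g 6% on an assumed 36-base read: prints 2.16
```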

SOLiD

> shore mapflowcell -f ~/phiX_resequencing_project/run_03 \
   -i ~/shore_index/phiX/phiX.reference_sequence.fa.shore \
   -v bwa -C \
   -n 4 -g 3 -c 8 -p

`-f`	run directory / files
`-v`	name of the mapper
`-C`	indicate that reads are in colorspace
`-n`	max edit distance
`-g`	max gaps
`-c`	number of CPUs
`-p`	indicate paired end data
`-i`	*.shore file in the IndexFolder.

Correcting for paired-end information

> shore correct4pe -l ~/phiX_resequencing_project/run_01/7 -x 250 -D PE

`-l`	lane
`-x`	expected insert size
`-D PE`	indicate Illumina paired-end (as opposed to mate pair) libraries.
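
The command above handles a single lane. A run with several lanes could be processed with a loop such as the following sketch. It assumes that lane folders are the numerically named subdirectories of the run folder (as in `run_01/7` above); the function name is hypothetical, and the commands are only printed, not executed.

```shell
#!/bin/sh
# Print one `shore correct4pe` command per lane folder of a run directory.
# Assumption: lane folders are purely numeric subdirectory names.
correct4pe_commands() {
    run_dir=$1
    insert_size=$2
    for lane in "$run_dir"/*/; do
        [ -d "$lane" ] || continue
        name=$(basename "$lane")
        case "$name" in
            ''|*[!0-9]*) continue ;;    # skip names that are not purely numeric
        esac
        echo "shore correct4pe -l $run_dir/$name -x $insert_size -D PE"
    done
}

# Usage: review the printed commands, then pipe them to sh to run them.
#   correct4pe_commands ~/phiX_resequencing_project/run_01 250
```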

Consensus analysis

> shore consensus -n phiX \
   -f ~/shore_index/phiX/phiX.reference_sequence.fa.shore \
   -o ~/phiX_resequencing_project/Analysis_01 \
   -i ~/phiX_resequencing_project/run_01 -i ~/phiX_resequencing_project/run_02 \
   -a /usr/local/share/shore/scoring_matrix_hom.txt \
   -b 0.51 -r

`-n`	sample name
`-f`	*.shore file in the index folder
`-o`	output folder
`-i`	RunFolders, LaneFolders or merged MapList files
`-a`	scoring matrix
`-b`	minimum agreement of base calls (here for homozygous samples).
`-r`	plot statistics using R
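
As the example shows, `-i` is repeated once per input. With many run folders it can be convenient to generate these arguments, as in the sketch below. The helper name is hypothetical, and paths containing spaces are not handled.

```shell
#!/bin/sh
# Turn a list of run folders into repeated "-i <folder>" arguments.
# Hypothetical helper; does not support paths containing spaces.
consensus_inputs() {
    args=""
    for run in "$@"; do
        args="$args -i $run"
    done
    echo "${args# }"    # drop the leading space
}

# Dry run: print the input part of the command.
# The remaining options are the same as in the command above.
project="$HOME/phiX_resequencing_project"
echo shore consensus -n phiX $(consensus_inputs "$project/run_01" "$project/run_02")
```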

Checking the result files

Run statistics plots can be found in

~/phiX_resequencing_project/Analysis_01/ConsensusStatistics

Analysis result files can be found in

~/phiX_resequencing_project/Analysis_01/ConsensusAnalysis

Have a look at 'quality_variants.txt' and 'quality_reference.txt'. The result files and formats are explained in more detail in a later chapter. If R is installed and on your $PATH, you will also find two PDFs in

~/phiX_resequencing_project/Analysis_01/ConsensusStatistics

One shows coverage as a function of GC content; the other shows read errors by base quality in several plots. More details on the run statistics plots follow in a later section.
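
A quick way to confirm that the analysis produced the folders described above is sketched here. The function name is hypothetical; it checks only for the two folder names mentioned in this guide and counts PDF files, without inspecting their contents.

```shell
#!/bin/sh
# Check that an analysis folder contains the result folders named in this guide.
# Hypothetical helper for illustration.
check_analysis() {
    analysis_dir=$1
    status=0
    for sub in ConsensusStatistics ConsensusAnalysis; do
        if [ -d "$analysis_dir/$sub" ]; then
            echo "found $sub"
        else
            echo "missing $sub" >&2
            status=1
        fi
    done
    # Count PDF plots; these are only expected when R was available.
    n=0
    for f in "$analysis_dir"/ConsensusStatistics/*.pdf; do
        if [ -f "$f" ]; then n=$((n + 1)); fi
    done
    echo "$n PDF plot(s) in ConsensusStatistics"
    return "$status"
}

# Usage:
#   check_analysis ~/phiX_resequencing_project/Analysis_01
```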

Running SHORE for the First Time - A Quick Guide
