Running SHORE for the First Time - A Quick Guide

From SHORE wiki

An example

Preprocessing the reference

SHORE version 0.8

> shore preprocess \
   -f ~/downloads/phiX.reference_sequence.fa \
   -i ~/shore_index/phiX \
   -W -b -C
-f defines the reference sequence (FASTA file)
-i the IndexFolder in which SHORE creates its index
-W additionally create a BWA index
-b additionally create a Bowtie index
-C additionally create colorspace indexes (available for BWA and Bowtie)

SHORE version 0.9

> shore preprocess \
   -f ~/downloads/phiX.reference_sequence.fa \
   -i ~/shore_index/phiX \
   -x BWA,Bowtie2 -C
-x BWA,Bowtie2 additionally create BWA and Bowtie2 indexes
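The 0.8 and 0.9 syntaxes differ, so it can help to build the command line in one place. The following is a minimal Python sketch under that idea. The function name preprocess_argv is hypothetical, and it only assembles the flags documented above without checking what your SHORE build supports.

```python
# Hypothetical helper: assembles the `shore preprocess` flags shown above
# for SHORE 0.8 or 0.9; it does not check what your SHORE build supports.
import shlex
from typing import List


def preprocess_argv(reference: str, index_folder: str, version: str,
                    colorspace: bool = False) -> List[str]:
    argv = ["shore", "preprocess", "-f", reference, "-i", index_folder]
    if version == "0.8":
        argv += ["-W", "-b"]              # one switch per extra index: BWA, Bowtie
    elif version == "0.9":
        argv += ["-x", "BWA,Bowtie2"]     # extra indexes listed after -x
    else:
        raise ValueError(f"unsupported SHORE version: {version}")
    if colorspace:
        argv.append("-C")                 # additionally build colorspace indexes
    return argv


# Print the 0.9 command from the example; pass the list to
# subprocess.run(argv, check=True) to actually execute it.
print(shlex.join(preprocess_argv("phiX.reference_sequence.fa", "shore_index/phiX",
                                 "0.9", colorspace=True)))
```

Passing an argument list rather than one string avoids shell quoting problems. Note that "~" is not expanded in such a list; use os.path.expanduser on the paths first.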

Importing read data into SHORE

Importing reads in Fastq format

> shore import -v Fastq -e Shore \
   -a genomic -x read1.fastq -x read2.fastq \
   -i PhiX-1 -o ~/phiX_resequencing_project/run_01
-v defines the input format
-e defines the output format
-a starts a genomic re-sequencing project
-i an arbitrary, unique identifier for the sample
-x FASTQ files for the first and the second sequencing read, given in that order; both files must list the read pairs in the same order
-o output directory
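Before importing, you may want to confirm that the two files passed with -x really list the read pairs in the same order. The following Python sketch does that under stated assumptions: 4-line FASTQ records, and mate names that differ only by a trailing "/1" or "/2" or by text after the first space. The function names are hypothetical.

```python
# Sketch: verify that two FASTQ files list read pairs in the same order.
# Assumes 4-line records; mate suffixes "/1" and "/2" and any text after
# the first space in the header are ignored. *.gz files are read transparently.
import gzip
from itertools import zip_longest
from typing import Iterator, TextIO


def open_fastq(path: str) -> TextIO:
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def read_names(handle: TextIO) -> Iterator[str]:
    for i, line in enumerate(handle):
        if i % 4 == 0:                    # header line of each record
            fields = line[1:].split()     # drop the leading "@"
            name = fields[0] if fields else ""
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def pairs_in_order(path1: str, path2: str) -> bool:
    with open_fastq(path1) as f1, open_fastq(path2) as f2:
        for name1, name2 in zip_longest(read_names(f1), read_names(f2)):
            if name1 != name2:            # mismatch, or one file is longer
                return False
    return True
```

For example, pairs_in_order("read1.fastq", "read2.fastq") returning True suggests the files can be passed to -x in that order.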

Importing Illumina GA-Pipeline Bustard directories

> shore import -v Bustard -e Shore \
   -a genomic -b my/bustard/folder \
   -i PhiX-2 -o ~/phiX_resequencing_project/run_02
-v defines the input format
-e defines the output format
-a starts a genomic re-sequencing project
-i an arbitrary, unique identifier for the sample
-b Bustard folder produced by the GA-Pipeline
-o output directory
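When several Bustard folders are imported, each needs its own -i identifier. The following Python sketch generates one import command per folder using only the flags shown above and rejects duplicate identifiers. The helper name and the run_NN naming of output folders are assumptions for illustration.

```python
# Hypothetical helper: one `shore import` argv per Bustard folder, using
# only the flags shown above. Output folders are named run_NN (an assumption).
from typing import List, Tuple


def bustard_import_commands(samples: List[Tuple[str, str]], project_dir: str,
                            first_run: int = 1) -> List[List[str]]:
    """samples is a list of (sample identifier, Bustard folder) pairs."""
    ids = [sample_id for sample_id, _ in samples]
    duplicates = sorted({i for i in ids if ids.count(i) > 1})
    if duplicates:
        raise ValueError(f"sample identifiers must be unique: {duplicates}")
    commands = []
    for run, (sample_id, bustard_dir) in enumerate(samples, start=first_run):
        commands.append([
            "shore", "import", "-v", "Bustard", "-e", "Shore",
            "-a", "genomic", "-b", bustard_dir,
            "-i", sample_id, "-o", f"{project_dir}/run_{run:02d}",
        ])
    return commands
```

Each returned list can be handed to subprocess.run(argv, check=True).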

Importing SOLiD csfasta and QV files

> shore import -v Solid -e Shore \
   -a genomic -F reads_F3 -R reads_R3 \
   -i PhiX-3 -o ~/phiX_resequencing_project/run_03
-v defines the input format
-e defines the output format
-a starts a genomic re-sequencing project
-F prefix of the F3 csfasta file
-R prefix of the R3 csfasta file
-i an arbitrary, unique identifier for the sample
-o output directory
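The csfasta files hold reads in SOLiD colorspace rather than as bases. This is why colorspace indexes (-C) are needed. In the standard SOLiD 2-base encoding with A=0, C=1, G=2 and T=3, each color equals the XOR of two adjacent bases. A read can therefore be decoded from its leading primer base. The Python sketch below illustrates that encoding only and is not SHORE code. It also shows why naive decoding is fragile: one wrong color changes every base after it.

```python
# Illustration of SOLiD 2-base encoding (not SHORE code): with A=0, C=1,
# G=2, T=3, color = base_i XOR base_(i+1), so next base = previous XOR color.
BASES = "ACGT"


def decode_colorspace(read: str) -> str:
    """Decode a csfasta read such as "T0123" (primer base + colors)."""
    base = BASES.index(read[0])   # primer base, not part of the read itself
    decoded = []
    for color in read[1:]:
        if color == ".":          # missing color call: stop decoding here
            break
        base ^= int(color)
        decoded.append(BASES[base])
    return "".join(decoded)
```

For instance, decode_colorspace("T0123") yields "TGAT".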

Recovering raw read data in FASTQ format from a SHORE directory

> shore import -i "" -P ~/phiX_resequencing_project/run_01 | shore convert Reads2Fastq - PhiX-1.fq.gz
-i "" unset sample identifier
-P SHORE RunFolder or LaneFolder from which data should be recovered
Reads2Fastq convert the imported reads to FASTQ format
- read the input from standard input
PhiX-1.fq.gz output file, written GZIP-compressed
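To check the recovered file, you can count its records and verify their basic shape. The following is a minimal Python sketch using only the standard library. It assumes plain 4-line FASTQ records, and the function name is hypothetical.

```python
# Sketch: count records in a GZIP-compressed FASTQ file such as PhiX-1.fq.gz
# and raise on records that do not have the expected 4-line shape.
import gzip


def count_fastq_records(path: str) -> int:
    records = 0
    with gzip.open(path, "rt") as handle:
        while True:
            header = handle.readline()
            if not header:
                break                     # end of file
            seq = handle.readline().rstrip("\n")
            plus = handle.readline()
            qual = handle.readline().rstrip("\n")
            if (not header.startswith("@") or not plus.startswith("+")
                    or len(seq) != len(qual)):
                raise ValueError(f"malformed record #{records + 1} in {path}")
            records += 1
    return records
```

The count can then be compared with the number of reads imported into the RunFolder.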

Mapping reads of a SHORE run directory

Illumina

> shore mapflowcell -f ~/phiX_resequencing_project/run_01 \
   -i ~/shore_index/phiX/phiX.reference_sequence.fa.shore \
   -n 10% -g 6% -c 8 -p
-f run directory / files
-i *.shore file in the IndexFolder
-n maximum edit distance, either absolute or as a percentage of read length
-g maximum number of gaps, either absolute or as a percentage of read length
-c number of CPUs
-p indicate paired-end data
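To get a feel for what -n 10% and -g 6% allow, the percentages can be converted into absolute limits for typical read lengths. This Python sketch assumes that the percentage applies to the read length and is rounded down. The exact rounding SHORE uses is not stated in this guide.

```python
# Back-of-the-envelope sketch: what -n 10% and -g 6% allow for a few read
# lengths. Assumes the percentage applies to the read length and is rounded
# down; the rounding SHORE actually uses is not stated in this guide.


def absolute_limit(percent: int, read_length: int) -> int:
    return read_length * percent // 100


for length in (36, 76, 100, 150):
    print(f"{length:>3} bp reads: -n 10% -> {absolute_limit(10, length)} edits, "
          f"-g 6% -> {absolute_limit(6, length)} gaps")
```

Percentages keep the thresholds proportional when runs with different read lengths are mapped with the same command.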

SOLiD

> shore mapflowcell -f ~/phiX_resequencing_project/run_03 \
   -i ~/shore_index/phiX/phiX.reference_sequence.fa.shore \
   -v bwa -C \
   -n 4 -g 3 -c 8 -p
-f run directory / files
-i *.shore file in the IndexFolder
-v name of the mapper (here BWA)
-C indicate that reads are in colorspace
-n maximum edit distance
-g maximum number of gaps
-c number of CPUs
-p indicate paired-end data
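The Illumina and SOLiD mapping calls differ only in the mapper and colorspace switches and in the thresholds. The following Python sketch builds either variant from the flags shown above. The function name and its defaults are hypothetical, and the CPU count falls back to os.cpu_count() when not given.

```python
# Hypothetical helper: builds `shore mapflowcell` argv lists for the
# Illumina and SOLiD examples above, using only the flags shown there.
import os
from typing import List, Optional, Union


def mapflowcell_argv(run_dir: str, shore_index: str, colorspace: bool = False,
                     edits: Union[int, str] = "10%", gaps: Union[int, str] = "6%",
                     cpus: Optional[int] = None, paired: bool = True) -> List[str]:
    argv = ["shore", "mapflowcell", "-f", run_dir, "-i", shore_index]
    if colorspace:
        argv += ["-v", "bwa", "-C"]       # SOLiD: map colorspace reads with BWA
    argv += ["-n", str(edits), "-g", str(gaps),
             "-c", str(cpus or os.cpu_count() or 1)]
    if paired:
        argv.append("-p")                 # paired-end data
    return argv
```

For example, mapflowcell_argv("run_03", "phiX.reference_sequence.fa.shore", colorspace=True, edits=4, gaps=3, cpus=8) mirrors the SOLiD command.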

Correcting for paired-end information

> shore correct4pe -l ~/phiX_resequencing_project/run_01/7 -x 250 -D PE
-l LaneFolder to process (here lane 7 of run_01)
-x expected insert size in bp
-D PE indicate Illumina paired-end libraries (as opposed to mate-pair libraries)
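The expected insert size given with -x is the length of the fragment spanned by both mates. The following Python sketch shows how that length follows from the mapping coordinates of a pair and how a simple tolerance test might look. It is a conceptual illustration, not the correct4pe algorithm. The PairAlignment type and the tolerance of 100 bp are hypothetical choices.

```python
# Conceptual sketch (not SHORE's correct4pe): compute the insert size of a
# mapped pair and compare it with an expected value. The tolerance is a
# hypothetical parameter; orientation and chromosome checks are omitted.
from typing import NamedTuple


class PairAlignment(NamedTuple):
    start1: int  # 1-based leftmost mapped position of read 1
    end1: int    # 1-based rightmost mapped position of read 1
    start2: int  # same for read 2
    end2: int


def insert_size(pair: PairAlignment) -> int:
    """Length of the fragment spanned by both mates, ends included."""
    return max(pair.end1, pair.end2) - min(pair.start1, pair.start2) + 1


def is_concordant(pair: PairAlignment, expected: int = 250,
                  tolerance: int = 100) -> bool:
    return abs(insert_size(pair) - expected) <= tolerance
```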

Consensus analysis

> shore consensus -n phiX \
   -f ~/shore_index/phiX/phiX.reference_sequence.fa.shore \
   -o ~/phiX_resequencing_project/Analysis_01 \
   -i ~/phiX_resequencing_project/run_01 -i ~/phiX_resequencing_project/run_02 \
   -a /usr/local/share/shore/scoring_matrix_hom.txt \
   -b 0.51 -r
-n sample name
-f *.shore file in the IndexFolder
-o output folder
-i RunFolders, LaneFolders or merged MapList files (can be given several times)
-a scoring matrix
-b minimum agreement of base calls (0.51 is the value used here for homozygous samples)
-r plot statistics using R
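The idea behind -b 0.51 can be shown with a simplified calculation: the fraction of reads at one position that support the most frequent base. With 0.51, a strict majority is required, so an evenly split position gets no call. This presumably explains why the value suits homozygous samples. The Python sketch below ignores the scoring matrix given with -a, which SHORE also uses, and the function name is hypothetical.

```python
# Simplified sketch of a minimum-agreement test at one reference position.
# SHORE's real consensus call also uses the scoring matrix (-a); ignored here.
from collections import Counter
from typing import Optional


def majority_call(bases: str, min_agreement: float = 0.51) -> Optional[str]:
    """Return the most frequent base if enough reads agree, else None."""
    if not bases:
        return None
    base, count = Counter(bases.upper()).most_common(1)[0]
    return base if count / len(bases) >= min_agreement else None
```

For example, four A calls and one C call give an agreement of 0.8 and a call of A, while two A and two C calls (0.5) give no call.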

Checking the result files

Run statistics plots can be found in

~/phiX_resequencing_project/Analysis_01/ConsensusStatistics

Analysis result files can be found in

~/phiX_resequencing_project/Analysis_01/ConsensusAnalysis

Have a look at 'quality_variants.txt' and 'quality_reference.txt'. The result files and their formats are explained in more detail in a later chapter. If R is installed and on your $PATH, you will also find two PDFs in

~/phiX_resequencing_project/Analysis_01/ConsensusStatistics

One shows coverage as a function of GC content; the other contains several plots of read errors by base quality. Run statistics plots are described in more detail in a later section.
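To see at a glance what the consensus step produced, you can list both output folders. The following Python sketch does this and makes no assumption about file formats. The folder names come from this guide, and the function names are hypothetical. An empty PDF list may indicate that R was not found on your $PATH.

```python
# Sketch: list the files in the ConsensusAnalysis and ConsensusStatistics
# folders of an analysis directory, and pick out the statistics PDFs.
from pathlib import Path
from typing import Dict, List


def consensus_outputs(analysis_dir: str) -> Dict[str, List[str]]:
    root = Path(analysis_dir).expanduser()
    result = {}
    for sub in ("ConsensusAnalysis", "ConsensusStatistics"):
        folder = root / sub
        # missing folders are reported as empty lists
        result[sub] = sorted(p.name for p in folder.iterdir()) if folder.is_dir() else []
    return result


def statistics_pdfs(analysis_dir: str) -> List[str]:
    return [name for name in consensus_outputs(analysis_dir)["ConsensusStatistics"]
            if name.lower().endswith(".pdf")]
```

For example, call statistics_pdfs("~/phiX_resequencing_project/Analysis_01") after the consensus run.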