An example
Preprocessing the reference
SHORE version 0.8
> shore preprocess \
-f ~/downloads/phiX.reference_sequence.fa \
-i ~/shore_index/phiX \
-W -b -C
-f | defines the reference sequence
-i | specifies the folder that hosts the SHORE IndexFolder
-W | additionally create a BWA index
-b | additionally create a Bowtie index
-C | additionally create a colorspace index (available for BWA and Bowtie)
SHORE version 0.9
> shore preprocess \
-f ~/downloads/phiX.reference_sequence.fa \
-i ~/shore_index/phiX \
-x BWA,Bowtie2 -C
-x BWA,Bowtie2 | additionally create BWA and Bowtie2 indexes
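The mapping and consensus commands later in this example refer to the preprocessed reference as ~/shore_index/phiX/phiX.reference_sequence.fa.shore, that is, the index folder followed by the name of the reference file plus ".shore". The following is a minimal sketch of a helper that derives this path; shore_index_file is a hypothetical name, not part of SHORE, and the naming pattern is assumed from the paths used in this example only.

```shell
# Hypothetical helper (not part of SHORE): derive the path of the *.shore
# index file from the index folder and the reference FASTA file.
# Assumption: the index is named <index folder>/<FASTA file name>.shore,
# as in the paths used in this example.
shore_index_file() {
    local index_dir="${1%/}"   # drop one trailing slash, if present
    local reference="$2"
    printf '%s/%s.shore\n' "$index_dir" "$(basename "$reference")"
}

# Example call with the folders used in this chapter:
shore_index_file ~/shore_index/phiX ~/downloads/phiX.reference_sequence.fa
```

The printed path can then be passed to the -i option of shore mapflowcell or the -f option of shore consensus.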
Importing read data into SHORE
Importing reads in Fastq format
> shore import -v Fastq -e Shore \
-a genomic -x read1.fastq -x read2.fastq \
-i PhiX-1 -o ~/phiX_resequencing_project/run_01
-v | defines the input format
-e | the output format
-a | starts a genomic re-sequencing project
-i | an arbitrary, unique identifier for the sample
-x | files for the first and the second sequencing read, in the same order
-o | output directory
Importing Illumina GA-Pipeline Bustard directories
> shore import -v Bustard -e Shore \
-a genomic -b my/bustard/folder \
-i PhiX-2 -o ~/phiX_resequencing_project/run_02
-v | defines the input format
-e | the output format
-a | starts a genomic re-sequencing project
-i | an arbitrary, unique identifier for the sample
-b | Bustard folder produced by GA-Pipeline
-o | output directory
Importing SOLiD csfasta and QV files
> shore import -v Solid -e Shore \
-a genomic -F reads_F3 -R reads_R3 \
-i PhiX-3 -o ~/phiX_resequencing_project/run_03
-v | defines the input format
-e | the output format
-a | starts a genomic re-sequencing project
-F | prefix of the F3 csfasta file
-R | prefix of the R3 csfasta file
-i | an arbitrary, unique identifier for the sample
-o | output directory
Recovering raw read data in FASTQ format from a SHORE directory
> shore import -i "" -P ~/phiX_resequencing_project/run_01 | shore convert Reads2Fastq - PhiX-1.fq.gz
-i "" | unset sample identifier
-P | SHORE RunFolder or LaneFolder from which data should be recovered
Reads2Fastq | convert output to FASTQ format
- | read from standard input
PhiX-1.fq.gz | write GZIP compressed output file
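As a quick sanity check on the recovered file, the number of FASTQ records can be counted. The sketch below assumes the common layout of exactly four lines per record and does not handle wrapped sequence or quality lines; count_fastq_reads is a hypothetical helper name, not a SHORE command.

```shell
# Count the records in a gzip-compressed FASTQ file.
# Assumption: every record occupies exactly four lines (header, sequence,
# "+" separator, qualities). The function fails with a non-zero status if
# the line count is not a multiple of four.
count_fastq_reads() {
    gzip -dc "$1" | awk 'END { if (NR % 4 != 0) exit 1; print NR / 4 }'
}

# Example call (file name taken from the command above):
# count_fastq_reads PhiX-1.fq.gz
```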
Mapping reads of a SHORE run directory
Illumina
> shore mapflowcell -f ~/phiX_resequencing_project/run_01 \
-i ~/shore_index/phiX/phiX.reference_sequence.fa.shore \
-n 10% -g 6% -c 8 -p
-f | run directory / files
-n | max edit distance, optionally as percentage of read length
-g | max gaps, optionally as percentage of read length
-c | number of CPUs
-p | indicate paired end data
-i | *.shore file in the IndexFolder
SOLiD
> shore mapflowcell -f ~/phiX_resequencing_project/run_03 \
-i ~/shore_index/phiX/phiX.reference_sequence.fa.shore \
-v bwa -C \
-n 4 -g 3 -c 8 -p
-f | run directory / files
-v | name of the mapper
-C | indicate that reads are in colorspace
-n | max edit distance
-g | max gaps
-c | number of CPUs
-p | indicate paired end data
-i | *.shore file in the IndexFolder
Correcting for paired-end information
> shore correct4pe -l ~/phiX_resequencing_project/run_01/7 -x 250 -D PE
-l | lane
-x | expected insert size
-D PE | indicate Illumina paired-end (as opposed to mate pair) libraries
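The command above corrects a single lane. When a run folder contains several lanes, the same call can be generated once per lane. The following is a dry-run sketch that only prints the commands. It assumes that lane folders are the sub-directories of the run folder whose names consist of digits only (such as run_01/7 above); this assumption and the helper name print_correct4pe_commands are illustrative and not taken from SHORE itself.

```shell
# Hypothetical helper: print one correct4pe command per lane folder.
# Assumption: lane folders are the sub-directories of the run folder whose
# names consist of digits only (such as run_01/7 in the example above).
print_correct4pe_commands() {
    local run_dir="${1%/}" insert_size="$2" lane
    for lane in "$run_dir"/*/; do
        lane="${lane%/}"
        case "${lane##*/}" in
            ''|*[!0-9]*) continue ;;   # skip folders that are not lane numbers
        esac
        printf 'shore correct4pe -l %s -x %s -D PE\n' "$lane" "$insert_size"
    done
}

# Example call; review the printed commands before running them:
# print_correct4pe_commands ~/phiX_resequencing_project/run_01 250
```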
Consensus analysis
> shore consensus -n phiX \
-f ~/shore_index/phiX/phiX.reference_sequence.fa.shore \
-o ~/phiX_resequencing_project/Analysis_01 \
-i ~/phiX_resequencing_project/run_01 -i ~/phiX_resequencing_project/run_02 \
-a /usr/local/share/shore/scoring_matrix_hom.txt \
-b 0.51 -r
-n | sample name
-f | *.shore file in the index folder
-o | output folder
-i | RunFolders, LaneFolders or merged MapList files
-a | scoring matrix
-b | minimum agreement of base calls (here for homozygous samples)
-r | plot statistics using R
Checking the result files
Run statistics plots can be found in
~/phiX_resequencing_project/Analysis_01/ConsensusStatistics
Analysis result files can be found in
~/phiX_resequencing_project/Analysis_01/ConsensusAnalysis
Have a look at 'quality_variants.txt' and 'quality_reference.txt'. The result files and their formats are explained in more detail in a later chapter. If R is installed and on your $PATH, you will find two PDFs in
~/phiX_resequencing_project/Analysis_01/ConsensusStatistics
One shows coverage as a function of GC content; the other plots read errors by base quality. More details on the run statistics plots follow in a later section.
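To confirm that the consensus step produced both output folders named above, a small check can be scripted. This is a minimal sketch; check_consensus_output is a hypothetical helper name, and it tests only for the existence of the two folders, not for their contents.

```shell
# Hypothetical helper: report whether the two result folders named in this
# section exist below the analysis folder. Returns a non-zero status if at
# least one of them is missing.
check_consensus_output() {
    local analysis_dir="${1%/}" sub status=0
    for sub in ConsensusStatistics ConsensusAnalysis; do
        if [ -d "$analysis_dir/$sub" ]; then
            echo "found:   $analysis_dir/$sub"
        else
            echo "missing: $analysis_dir/$sub"
            status=1
        fi
    done
    return "$status"
}

# Example call with the output folder used in this chapter:
# check_consensus_output ~/phiX_resequencing_project/Analysis_01
```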