Difference between revisions of "SHORE Documentation"

From SHORE wiki
 
(6 intermediate revisions by the same user not shown)
Line 2: Line 2:
 
SHORE is a mapping and analysis pipeline for short DNA sequences produced on Illumina Genome Analyzer and Hiseq 2000, Life Technology SOLiD, 454 Genome Sequencer FLX  and PacBio RS platforms. It is designed for projects whose analysis strategy involves mapping of reads to a reference sequence. This reference sequence does not necessarily have to be from the same species, since weighted and gapped alignments allow for accuracy even in diverged regions.
 
  
<!-- SHORE’s mapping strategy is best-hit-only, a conservative approach, though there is no limit in the number of best hits. Paired end and mate pair information is used to increase mapping quality. Additionally SHORE provides error models for Illumina GA characteristics, e.g. GC coverage bias and error models for alignment issues in e.g. repeats or low complexity sequence.
 
-->
 
 
SHORE provides various prediction algorithms for genomic polymorphisms, i.e. SNPs, structural variants (indels, CNVs, unsequenced regions), SNPs and SV prediction in heterozygous or pooled samples, as well as peak detection for ChIP-Seq analysis and quantitative analysis of mRNA-Seq and sRNA-Seq.
 
<!-- Future updates will incorporate QPALMA for spliced alignments and additional models for quantitative Analysis of RNA-Seq (e.g. detection of differentially expressed sRNA and mRNAloci) and ChIP-Seq and mapping and analysis of BS-seq reads. For a detailed list of future developments please check the [[roadmap|Roadmap]].
 
-->
 
  
 
SHORE stores read data, alignments and result files in a predefined directory structure, which makes it possible to keep track of all intermediate steps and, if desired, repeat parts of the analysis. This directory hierarchy has advantages and disadvantages. While there is no freedom in data structuring when applying SHORE, it makes handling of large projects comprising multiple flowcells more convenient. It is in the nature of such projects that it can take weeks to gather all information, while the initial alignments have to be performed as soon as the first flowcell is finished. This requires an extendable mapping and analysis approach that structures all information in a transparent way. Of course, also smaller projects can benefit from such data partitioning.
 
Line 12: Line 8:
 
SHORE was designed to run on a multi-core server (32 or 64 bit) with Linux, MacOS (10.5+) or other POSIX compliant operating systems. Required memory depends both on the application and the reference sequence. Medium sized genomes (e.g. D. melanogaster, A. thaliana) can be analyzed with 2-8GB RAM, large genomes (e.g. H. sapiens) with 8-24GB RAM. SHORE is designed to take advantage of multi-core architectures. SHORE incorporates several alignment tools (e.g. [http://www.1001genomes.org/downloads/genomemapper.html GenomeMapper], [http://bio-bwa.sourceforge.net/ bwa], [http://bowtie-bio.sourceforge.net/ bowtie], ELAND) each coming with their own hardware requirements.
 
  
== Getting help ==
+
== Getting Help ==
  
* [[FAQ]]
+
Please see the [[Frequently Asked Questions]] page for solutions to some common issues.
  
== Getting started ==
+
Contact information can be found on [http://1001genomes.org/software/shore.html 1001 genomes] or by running the command ''shore help''.
=== Before using SHORE ===
+
 
 +
== Getting Started ==
 +
=== Before Using SHORE ===
 
* [[System Requirements]]
 
 
* [[Downloading and Installing SHORE]]
 
Line 24: Line 22:
 
=== Using SHORE ===
 
 
* [[SHORE Overview]]
 
* [[Running SHORE for the first time - A Quick Guide]]
+
* [[Running SHORE for the First Time - A Quick Guide]]
 
* [[SHORE Subprograms]]
 
* [[SHORE file formats]]
+
* [[SHORE File Formats]]
  
 
== SHORE Development ==
 
 +
* [[The libshore C++ Library|Using the '' '''libshore''' '' C++ library]]
 
* [[How to Contribute]]
 
 
* [[Coding Style]]
 
  
 
== How to Cite SHORE ==
 
 +
* Ossowski S, Schneeberger K, Clark RM et al. ''Sequencing of natural strains of Arabidopsis thaliana with short reads.'' Genome Res. 2008.

Latest revision as of 17:05, 20 June 2013

Introduction

SHORE is a mapping and analysis pipeline for short DNA sequences produced on Illumina Genome Analyzer and HiSeq 2000, Life Technologies SOLiD, 454 Genome Sequencer FLX and PacBio RS platforms. It is designed for projects whose analysis strategy involves mapping reads to a reference sequence. The reference sequence does not have to come from the same species, because weighted and gapped alignments remain accurate even in diverged regions.

SHORE provides prediction algorithms for genomic polymorphisms: SNPs, structural variants (indels, CNVs, unsequenced regions), and SNP and SV prediction in heterozygous or pooled samples. It also provides peak detection for ChIP-Seq analysis and quantitative analysis of mRNA-Seq and sRNA-Seq.

SHORE stores read data, alignments and result files in a predefined directory structure, which makes it possible to keep track of all intermediate steps and, if desired, repeat parts of the analysis. This fixed hierarchy is a trade-off: it leaves no freedom in how data are organized, but it makes large projects comprising multiple flowcells easier to handle. In such projects it can take weeks to gather all the data, while the initial alignments have to be performed as soon as the first flowcell is finished. This calls for an extensible mapping and analysis approach that organizes all information in a transparent way. Smaller projects can benefit from the same data partitioning.

SHORE was designed to run on, and take advantage of, a multi-core server (32 or 64 bit) with Linux, Mac OS X (10.5+) or another POSIX-compliant operating system. Required memory depends on both the application and the reference sequence. Medium-sized genomes (e.g. D. melanogaster, A. thaliana) can be analyzed with 2-8 GB RAM, large genomes (e.g. H. sapiens) with 8-24 GB RAM. SHORE incorporates several alignment tools (e.g. GenomeMapper, bwa, bowtie, ELAND), each with its own hardware requirements.
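
The memory guidance above can be written down as a small lookup. The following is only an illustrative sketch and is not part of SHORE: the class labels "medium" and "large", the table name and the function name are assumptions made for this example, and only the GB ranges come from the paragraph above. Actual requirements also depend on the application and on the alignment tool used.

```python
# RAM ranges in GB, copied from the guidance above.
# The keys "medium" and "large" are labels chosen for this sketch.
RAM_GUIDANCE_GB = {
    "medium": (2, 8),   # e.g. D. melanogaster, A. thaliana
    "large": (8, 24),   # e.g. H. sapiens
}


def ram_status(available_gb, genome_class):
    """Compare a machine's RAM with the quoted range for a genome class."""
    low, high = RAM_GUIDANCE_GB[genome_class]
    if available_gb < low:
        return "below range"
    if available_gb < high:
        return "within range"
    return "covers range"


if __name__ == "__main__":
    for gb in (4, 16, 32):
        print(gb, "GB for a large genome:", ram_status(gb, "large"))
```

A result of "within range" means that some analyses may fit and others may not, since the quoted ranges depend on the application.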

Getting Help

Please see the Frequently Asked Questions page for solutions to some common issues.

Contact information can be found on 1001 genomes or by running the command shore help.
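
To look up the contact information from a script, the shore help command mentioned above can be wrapped as follows. This is a minimal sketch under stated assumptions: it assumes the executable is installed under the name shore and is on the PATH, and the function name shore_help is made up for this example. It makes no assumption about what the help text contains.

```python
import shutil
import subprocess


def shore_help(executable="shore"):
    """Return the text printed by `shore help`, or None if the binary is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run([path, "help"], capture_output=True, text=True)
    # Help text may be written to either stream, so both are returned.
    return result.stdout + result.stderr


if __name__ == "__main__":
    text = shore_help()
    print(text if text is not None else "shore was not found on PATH")
```

Returning None instead of raising keeps the check usable on machines where SHORE is not installed yet.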

Getting Started

Before Using SHORE

Using SHORE

SHORE Development

How to Cite SHORE

  • Ossowski S, Schneeberger K, Clark RM et al. Sequencing of natural strains of Arabidopsis thaliana with short reads. Genome Res. 2008.