Difference between revisions of "SHORE Documentation"

From SHORE wiki
(move intro here)
Line 1: Line 1:
 
== Introduction ==
 
== Introduction ==
=== What is SHORE ===
+
SHORE is a mapping and analysis pipeline for short DNA sequences produced on Illumina Genome Analyzer and HiSeq 2000, Life Technologies SOLiD, 454 Genome Sequencer FLX and PacBio RS platforms. It is designed for projects whose analysis strategy involves mapping reads to a reference sequence. The reference sequence does not have to come from the same species, since weighted and gapped alignments remain accurate even in diverged regions.
 +
 
 +
<!-- SHORE’s mapping strategy is best-hit-only, a conservative approach, although the number of best hits is not limited. Paired-end and mate-pair information is used to increase mapping quality. Additionally, SHORE provides error models for Illumina GA characteristics (e.g. GC coverage bias) and for alignment issues (e.g. in repeats or low-complexity sequence).
 +
-->
 +
SHORE provides prediction algorithms for genomic polymorphisms: SNPs and structural variants (indels, CNVs, unsequenced regions), including SNP and SV prediction in heterozygous or pooled samples. It also provides peak detection for ChIP-Seq analysis and quantitative analysis of mRNA-Seq and sRNA-Seq data.
 +
<!-- Future updates will incorporate QPALMA for spliced alignments, additional models for quantitative analysis of RNA-Seq (e.g. detection of differentially expressed sRNA and mRNA loci) and ChIP-Seq, and mapping and analysis of BS-seq reads. For a detailed list of future developments, please check the [[roadmap|Roadmap]].
 +
-->
 +
 
 +
SHORE stores read data, alignments and result files in a predefined folder structure, which makes it possible to keep track of all intermediate steps and, if desired, to repeat parts of the analysis. This fixed structure has advantages and disadvantages: it leaves no freedom in how data are organized, but it makes large projects comprising multiple flowcells more convenient to handle. In such projects it can take weeks to gather all the data, while the initial alignments have to be performed as soon as the first flowcell is finished. This requires an extendable mapping and analysis approach that keeps all information in a transparent structure. Smaller projects can benefit from the same data partitioning.
 +
 
 +
SHORE was designed to run on multi-core servers (32 or 64 bit) with Linux, Mac OS X (10.5+) or other POSIX-compliant operating systems, and it takes advantage of multi-core architectures. Required memory depends on both the application and the reference sequence. Medium-sized genomes (e.g. D. melanogaster, A. thaliana) can be analyzed with 2-8 GB RAM, large genomes (e.g. H. sapiens) with 8-24 GB RAM. SHORE incorporates several alignment tools (e.g. [http://www.1001genomes.org/downloads/genomemapper.html GenomeMapper], [http://bio-bwa.sourceforge.net/ bwa], [http://bowtie-bio.sourceforge.net/ bowtie], ELAND), each with its own hardware requirements.
  
 
== Getting help ==
 
== Getting help ==
Line 10: Line 20:
 
== Getting started ==
 
== Getting started ==
 
=== Before Using SHORE ===
 
=== Before Using SHORE ===
* [[Introduction]]
 
 
* [[System Requirements]]
 
* [[System Requirements]]
 
* [[Downloading and Installing SHORE]]
 
* [[Downloading and Installing SHORE]]

Revision as of 14:08, 20 April 2011

Introduction

SHORE is a mapping and analysis pipeline for short DNA sequences produced on Illumina Genome Analyzer and HiSeq 2000, Life Technologies SOLiD, 454 Genome Sequencer FLX and PacBio RS platforms. It is designed for projects whose analysis strategy involves mapping reads to a reference sequence. The reference sequence does not have to come from the same species, since weighted and gapped alignments remain accurate even in diverged regions.

SHORE provides prediction algorithms for genomic polymorphisms: SNPs and structural variants (indels, CNVs, unsequenced regions), including SNP and SV prediction in heterozygous or pooled samples. It also provides peak detection for ChIP-Seq analysis and quantitative analysis of mRNA-Seq and sRNA-Seq data.

SHORE stores read data, alignments and result files in a predefined folder structure, which makes it possible to keep track of all intermediate steps and, if desired, to repeat parts of the analysis. This fixed structure has advantages and disadvantages: it leaves no freedom in how data are organized, but it makes large projects comprising multiple flowcells more convenient to handle. In such projects it can take weeks to gather all the data, while the initial alignments have to be performed as soon as the first flowcell is finished. This requires an extendable mapping and analysis approach that keeps all information in a transparent structure. Smaller projects can benefit from the same data partitioning.

SHORE was designed to run on multi-core servers (32 or 64 bit) with Linux, Mac OS X (10.5+) or other POSIX-compliant operating systems, and it takes advantage of multi-core architectures. Required memory depends on both the application and the reference sequence. Medium-sized genomes (e.g. D. melanogaster, A. thaliana) can be analyzed with 2-8 GB RAM, large genomes (e.g. H. sapiens) with 8-24 GB RAM. SHORE incorporates several alignment tools (e.g. GenomeMapper, bwa, bowtie, ELAND), each with its own hardware requirements.
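
As a rough planning aid, the memory guidance above can be expressed as a small lookup. This is only a sketch and not part of SHORE: the function name `suggested_ram_gb` and the 500 Mbp cut-off between "medium" and "large" genomes are assumptions made for illustration, and the genome sizes in the usage lines are approximate. Actual requirements also depend on the application and on the alignment tool used.

```python
from typing import Tuple

# Assumed cut-off between "medium" and "large" reference genomes.
# The text above only gives example species, not a size threshold.
MEDIUM_MAX_BP = 500_000_000


def suggested_ram_gb(genome_size_bp: int) -> Tuple[int, int]:
    """Return the (min, max) RAM range in GB quoted above for a genome size.

    Hypothetical helper for illustration; it only encodes the two
    ranges stated in the text (2-8 GB and 8-24 GB).
    """
    if genome_size_bp <= 0:
        raise ValueError("genome size must be a positive number of base pairs")
    if genome_size_bp <= MEDIUM_MAX_BP:
        return (2, 8)   # medium-sized genomes, e.g. A. thaliana
    return (8, 24)      # large genomes, e.g. H. sapiens


if __name__ == "__main__":
    # Approximate genome sizes, used only as example inputs.
    print(suggested_ram_gb(135_000_000))    # A. thaliana, roughly 135 Mbp
    print(suggested_ram_gb(3_100_000_000))  # H. sapiens, roughly 3.1 Gbp
```

The ranges are starting points for sizing a server; the incorporated alignment tools may need more or less memory than the ranges suggest.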

Getting help

Mailing lists, papers, etc.

FAQ

TBD

Getting started

Before Using SHORE

Using SHORE


SHORE Development

How to Cite SHORE
