Next Generation Sequencing

Main Idea
Ideas for a BioRuby Plugin to handle Next Generation Sequencing data, in particular RNA-seq data.

Repository
Git Repository is: https://github.com/helios/bioruby-ngs

Tools supported
The bio-ngs plugin will serve as a container for other NGS plugins that provide specific wrappers or bindings to existing tools. Here is a first list:


 * bio-bwa: Burrows-Wheeler Aligner
 * bio-picard: Picard, which comprises Java-based command-line utilities that manipulate SAM files, and a Java API (SAM-JDK) for creating new programs that read and write SAM files.
 * bio-samtools: SAM (Sequence Alignment/Map), a generic format for storing large nucleotide sequence alignments.
 * bio-qseq: TODO, convert qseq files to FASTQ format.

The plugin will also include graphics libraries such as Rubyvis (http://rubyvis.rubyforge.org/) to generate reports on data quality, mapping results and other related statistics.
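
As an illustration of the qseq-to-FASTQ conversion planned for bio-qseq, here is a minimal sketch. It is not the plugin's code: the method name `qseq_to_fastq` and the header layout are hypothetical. It assumes the usual 11-column, tab-separated qseq layout, that unknown bases are written as ".", and that qualities are encoded as Phred+64 (as in older Illumina pipelines), so they are shifted by 31 to reach Phred+33. Check these assumptions against your own data before relying on it.

```ruby
# Hypothetical helper for illustration; not part of any released bio-qseq API.
# Assumed qseq columns (tab-separated):
# machine, run, lane, tile, x, y, index, read number, sequence, quality, filter
QSEQ_FIELD_COUNT = 11

def qseq_to_fastq(line)
  fields = line.chomp.split("\t", -1)
  unless fields.size == QSEQ_FIELD_COUNT
    raise ArgumentError, "expected #{QSEQ_FIELD_COUNT} fields, got #{fields.size}"
  end

  machine, run, lane, tile, x, y, index, read, seq, qual, _filter = fields

  # Header layout is an assumption; adapt it to whatever downstream tools expect.
  header = "@#{machine}_#{run}:#{lane}:#{tile}:#{x}:#{y}##{index}/#{read}"

  # qseq marks unknown bases with "."; FASTQ uses "N".
  sequence = seq.tr(".", "N")

  # Assumed Phred+64 input; subtract 31 to obtain Phred+33 (Sanger) qualities.
  quality = qual.bytes.map { |b| (b - 31).chr }.join

  "#{header}\n#{sequence}\n+\n#{quality}\n"
end
```

A whole file could then be converted line by line, for example with `File.foreach(path) { |l| out << qseq_to_fastq(l) }`, which avoids loading large datasets into memory.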

The main idea is to wrap standard NGS tools in Ruby and, where possible, to include direct bindings to these tools.

This could be done, for example, for Picard via JRuby and for SAMtools using samtools-ruby (https://github.com/homonecloco/samtools-ruby). Every option needs to be tested to ensure good performance when handling large datasets.
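
Where a direct binding is not available, the wrapping approach can be as simple as building a command line and running the external program. The sketch below shows one way to do that with Ruby's standard `Open3` module. The class name `CommandWrapper` and its methods are hypothetical and are not the bio-ngs API.

```ruby
require "open3"

# Hypothetical wrapper for illustration; not the bio-ngs API.
class CommandWrapper
  def initialize(program, subcommand = nil)
    @program = program
    @subcommand = subcommand
  end

  # Build an argv array from an options hash and positional arguments.
  # A value of true emits a bare flag; nil or false omits the flag.
  def build(options = {}, *arguments)
    argv = [@program]
    argv << @subcommand if @subcommand
    options.each do |flag, value|
      next if value.nil? || value == false
      argv << flag.to_s
      argv << value.to_s unless value == true
    end
    argv + arguments.map(&:to_s)
  end

  # Run the command without a shell and return its standard output.
  def run(options = {}, *arguments)
    stdout, stderr, status = Open3.capture3(*build(options, *arguments))
    raise "#{@program} failed: #{stderr}" unless status.success?
    stdout
  end
end

# Example: the argv for a multi-threaded `bwa aln` call (file names are placeholders).
aln = CommandWrapper.new("bwa", "aln")
aln_argv = aln.build({ "-t" => 4 }, "ref.fa", "reads.fastq")
```

Passing the arguments as an array rather than a single string avoids shell quoting problems with file names. A wrapper like this is easy to write but pays process start-up and I/O costs, which is one reason to compare it against direct bindings on large datasets.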

bio-samtools
bio-samtools is a Ruby binding to the popular SAMtools library, and provides access to individual read alignments as well as BAM files, reference sequence and pileup information.

Source code is available on GitHub at https://github.com/helios/bioruby-samtools.

A tutorial is available here: Bio-samtools

bio-bwa

 * create a BWA shared library for Linux and Mac OS X: DONE
 * create a BioRuby plugin with binding to BWA: DONE. The code is available at https://github.com/fstrozzi/bioruby-bwa
 * perform a real test to check the Ruby binding: DONE. Details available at https://github.com/fstrozzi/bioruby-bwa/wiki
 * run a test phase to check if pre-compiled shared libraries work fine everywhere: TODO

NGS Workflows
See also Workflows

Using Rake or Thor to run NGS analyses

The bio-ngs plugin will implement a flexible Rake task system similar to Rails, where custom tasks can be defined according to specific needs. As an alternative, Thor could be used instead of Rake (https://github.com/wycats/thor).

This will allow bio-ngs users to run NGS analyses and pipelines directly with Rake, using the functionality provided by BioRuby and the other Bio* plugins.
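
To give an idea of what such a task system might look like, here is a minimal sketch of a small pipeline expressed as Rake tasks with dependencies. The `ngs` namespace, the task names and the `NgsTasks` module are hypothetical, and each task only records its name instead of calling a real tool. The tasks are wrapped in a module that extends `Rake::DSL` so the example also runs as a plain Ruby script; in a Rakefile the `namespace` block could be written at the top level.

```ruby
require "rake"

# Hypothetical task layout for illustration; not the bio-ngs task set.
module NgsTasks
  extend Rake::DSL

  # Records the order in which tasks actually ran.
  STEPS = []

  namespace :ngs do
    desc "Index the reference (placeholder)"
    task :index do
      STEPS << :index
    end

    desc "Align reads; depends on the index"
    task align: :index do
      STEPS << :align
    end

    desc "Generate a quality report; depends on the alignment"
    task report: :align do
      STEPS << :report
    end
  end
end

# From a script: Rake::Task["ngs:report"].invoke
# From a shell, with the tasks in a Rakefile: rake ngs:report
```

Invoking the last task runs its prerequisites first, so a user only needs to name the final result they want. Custom tasks can be added in the same namespace in the same way.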

Please add the people involved in this topic.

Active developers
so far:


 * bio-samtools: Raoul Bonnal
 * bio-bwa: Francesco Strozzi