Bio-samtools

Introduction
bio-samtools is a Ruby binding to the popular SAMtools library, and provides access to individual read alignments as well as BAM files, reference sequence and pileup information.

Installation
Installation of bio-samtools is very straightforward, and is accomplished with the Ruby gems command. All you need is an internet connection.

Prerequisites
bio-samtools relies on the following other rubygems:

FFI bio >= 1.4.1

Once these are installed, bio-samtools can be installed with

It should then be easy to test whether installation went well. Start interactive Ruby (IRB) in the terminal, and type require 'bio-samtools' if the terminal returns true then all is well.

Creating a new SAM object
A SAM object represents the alignments in the BAM file, and is very straightforward to create, you will need a sorted BAM file, to access the alignments and a reference sequence in FASTA format to use the reference sequence. The object can be created and opened as follows:

Opening the file needs only to be done once for multiple operations on it, access to the alignments is random so you don't need to loop over the entries in the file.

Getting Reference Sequence
Retrieving the reference can only be done if the reference has been loaded, which isn't done automatically in order to save memory. Reference need only be loaded once, and is accessed using reference name, start, end in 1-based co-ordinates. A standard Ruby String object is returned.

Getting Alignments
Alignments can be obtained one at a time by looping over a specified region using the fetch function.

A separate method fetch_with_function allows you to pass a block (or a Proc object) to the function for efficient calculation. This example shows a naive conversion of the alignment object to a GFF3 object, which is stored in an array gff_list

Alignment Objects
The individual alignments represent a single read and are returned as Bio::DB::Alignment objects. These have numerous methods of their own, using require 'pp' will allow you to check the attributes contained in each object. Here is an example alignment object. Remember @ represents a Ruby instance variable and can be accessed as any other method. Thus the @is_mapped attribute of an object a is accessed a.is_mapped

Per Base Coverage
It is easy to get the total depth of reads at a given position, the chromosome_coverage function is used. This differs from the previous functions in that a start position and length (rather than end position) are passed to the function. An array of coverages is returned, the first position in the array gives the depth of coverage at the given start position in the genome, the last position in the array gives the depth of coverage at the given start position plus the length given

Average Coverage In A Region
Similarly, average (arithmetic mean) of coverage can be retrieved, also with start and length parameters

Getting Pileup Information
Pileup format represents the coverage of reads over a single base in the reference. Getting a Pileup over a region is very easy. Note that this is done with mpileup and NOT the now deprecated SAMTools pileup function. Calling the mpileup method creates an iterator that yields a Pileup object for each base.

Pileup options
The mpileup function takes a range of parameters to allow SAMTools level filtering of reads and alignments. They are specified as key =&gt; value pairs eg

Not all the options SAMTools allows you to pass to mpileup will return a Pileup object, those that cause mpileup to return BCF/VCF will be ignored. Specifically these are g,u,e,h,I,L,o,p. The table below lists the SAMTools flags supported and the symbols you can use to call them in the mpileup command.

VCF
There is an 'experimental' function, mpileup_plus, that can return a Bio::DB::Vcf object when g,u,e,h,I,L,o,p options are passed. The list below shows the symbols you can use to invoke this behaviour:


 * genotype_calling, :g
 * uncompressed_bcf, :u
 * extension_sequencing_probability, :e
 * homopolymer_error_coefficient, :h
 * no_indels, :I
 * skip_indel_over_average_depth, :L
 * gap_open_sequencing_error_probability,:o
 * platforms, :P

Tests
The easiest way to run the built-in unit tests is to change to the bio-samtools installation directory (discoverable by typing 'gem which bio-samtools' at the command line) and running the separate test files individually.

Each test file tests different aspects of the gem.

test_basic.rb - tests the general functionality of the gem, such as opening and closing BAM, creating indices, fetching regions, checks the correct Pileup format is returned when requested etc.  test_pileup.rb - tests the Pileup class, making sure attributes are set correctly when Pileup data are passed test_vcf.rb - tests the Vcf class, making sure attributes are set correctly when Vcf data are passed

Documentation
Further RDoc is available here: http://rubydoc.info/gems/bio-samtools/0.5.2/frames