Google Summer of Code

Introduction
From 2009 to 2012 the BioRuby project successfully participated in the Google Summer of Code: the first year under the wings of NESCent, and from 2010 to 2012 as part of the Open Bioinformatics Foundation (OBF). Sadly, in 2013 the OBF was not accepted as a mentoring organisation. The SciRuby organisation, however, will merge in our project ideas! Please go to that page to become a GSoC student. BioRuby mentors will be there.

We are still looking for students who want to work on one of the projects below over the summer. Feel free to inquire on the mailing list, or directly with the mentors named below!

In earlier years a number of BioRuby projects were accepted under the OBF umbrella (see below). Two others were accepted under the NESCent mentoring organisation.

Please read the GSoC page at the Open Bioinformatics Foundation and the main Google Summer of Code page for more details about the program.

Mentor List
BioRuby panel members who currently volunteer to act as a mentor to a GSoC student.


 * Raoul J.P. Bonnal - r -at- bioruby -dot- org
 * Francesco Strozzi
 * Pjotr Prins
 * Naohisa Goto
 * Toshiaki Katayama

An ultra-fast scalable RESTful API to query large numbers of genomic variations

 * Rationale


 * VCF files are the typical output of genome resequencing projects (http://www.1000genomes.org/node/101). They store the information on all the mutations and variations (SNPs and InDels) that are found by comparing the outputs of an NGS platform with a reference genome. These files are not incredibly large (a typical uncompressed VCF file is a few gigabytes) but they are full of information on millions of positions in the genome where mutations are found. Large resequencing projects can produce hundreds or thousands of these files, one for each sample sequenced.
 * Existing tools (such as VCFTools or BCFTools) offer a convenient way to access these files and extract or convert the information present, but are limited in functionality and speed when more complex queries need to be performed on these data. With existing tools it is very complicated, if not impossible, to retrieve information when working on many VCF files and samples together, for instance to compare the variations found in 100 samples and extract all the mutations that are present in 50 samples but not present in the other 50.
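The cross-sample query described above can be sketched in plain Ruby. This is only a minimal illustration, not the API the project would deliver: variants are reduced to hypothetical "chrom:pos:ref>alt" keys held in memory, and the helper name `variants_exclusive_to` and the sample data are invented for the example.

```ruby
require 'set'

# Hypothetical in-memory index: sample name => set of variant keys.
# A real implementation would keep this in a database engine.
calls = {
  'sample_a' => Set['1:1000:A>G', '1:2000:C>T', '2:500:G>A'],
  'sample_b' => Set['1:1000:A>G', '2:500:G>A'],
  'sample_c' => Set['1:2000:C>T', '2:500:G>A'],
  'sample_d' => Set['2:500:G>A', '3:42:T>C']
}

# Variants shared by every sample in +group+ and absent from
# every sample in +others+.
def variants_exclusive_to(calls, group, others)
  shared = group.map { |name| calls.fetch(name) }.reduce(:&)
  seen_elsewhere = others.map { |name| calls.fetch(name) }.reduce(Set.new, :|)
  shared - seen_elsewhere
end

exclusive = variants_exclusive_to(calls, %w[sample_a sample_b], %w[sample_c sample_d])
puts exclusive.to_a.inspect
```

Set intersection and union keep the query short, but they assume all variant keys fit in memory, which is exactly the limit a database-backed service would need to lift.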


 * Approach
 * The project should develop a RESTful API to address the issues described in the rationale and to allow users to manipulate and compare genomic variation information for hundreds of samples. A database engine will be required to store the information and to support the data mining. Unstructured database engines such as NoSQL databases or key-value stores can all be valid alternatives to combine high speed with data flexibility. The decision on the best database engine to be used will be discussed between the student and the mentors and within the OpenBio community. Given the large amount of information that will need to be processed by such an application, scalable and fast JVM-based languages such as Scala or JRuby will be a good choice. The project should also take care of the deployment of such an API, by creating a Ruby gem or a JAR that users can install and use right away with their datasets.


 * Difficulty and needed skills
 * The project is of average difficulty and is aimed at talented students who want to develop a fast API to address these problems.


 * The project requires
 * Knowledge of advanced programming languages. Some experience and knowledge of databases and data mining will help in managing the information of VCF files.


 * Mentors
 * Francesco Strozzi, Raoul J.P. Bonnal

2013
The BioRuby proposals for 2013 are listed here and linked from the Open Bioinformatics Foundation.

D3 based graphics package for Bioinformatics

 * Rationale
 * D3 is an incredible interactive data visualisation library written in JavaScript that runs in a browser. We want to port special visualisations for bioinformatics related to genome displays, phylogeny, QTL mapping, etc., as well as figures for statistics for the SciRuby library. Based on existing work in the Ruby bio-graphics and RubyD3 gems, R/qtl, and work done for genometools and JBrowse, we would like to create a graphics generator that allows for embedding interactive JavaScript hooks. The immediate task is to create zoomable interactive figures for a genetic map, pairwise recombination fractions, an image of genotype data, LOD curves, 2D scans and QTL effects.


 * Approach
 * Create a project that takes all the good ideas from other projects (such as Matplotlib, JBrowse, SequenceServer and Bio::Graphics), and build up a sustainable source base for current and future work. The generator should be written in Ruby and generate suitable D3 code in either JavaScript or CoffeeScript. Even though the generator is written in Ruby, the functionality should be easily accessible from other programming languages, perhaps by settling on an intermediate representation of data and code.
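One way to picture the intermediate representation mentioned above is a plain JSON document that any language can produce and a D3 front end can consume. The sketch below is an assumption for illustration only: the field names (`type`, `chromosome`, `x`, `y`) and the helper `lod_curve_spec` are hypothetical, not an agreed format.

```ruby
require 'json'

# Build a language-neutral description of a LOD curve figure.
# The layout of this hash is a made-up example of an intermediate format.
def lod_curve_spec(chromosome, positions, lod_scores)
  unless positions.size == lod_scores.size
    raise ArgumentError, 'positions and scores differ in length'
  end

  {
    'type' => 'lod_curve',
    'chromosome' => chromosome,
    'x' => { 'label' => 'Position (cM)', 'values' => positions },
    'y' => { 'label' => 'LOD score', 'values' => lod_scores }
  }
end

spec = lod_curve_spec('1', [0.0, 10.5, 21.0], [0.3, 2.8, 1.1])
json = JSON.generate(spec)
puts json
```

Keeping the data description separate from the drawing code would let generators in Python, R or Perl emit the same JSON without depending on the Ruby package.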


 * Difficulty and needed skills
 * Affinity for design and graphics, accessibility of information, web programming in Ruby and JavaScript/CoffeeScript.


 * The project requires
 * Some Ruby experience, interest in web design and scientific graphics. Creating a useful package will be a real challenge.


 * Mentors
 * Raoul J.P. Bonnal, Rob Syme, Karl Broman, Rob Buels (confirm)

Semantic web/RDF support for Bioinformatics

 * Rationale
 * The bioinformatics community is doing a lot of work integrating different data repositories through RDF, for example Bio2RDF and SADI. A list of activities can be found here. BioRuby and biogems contain a wide range of parsers and formatters which could be extended to support reading and writing RDF. Having such functionality would make it easy for bioinformaticians to incorporate and expose RDF for flexible data queries.


 * Approach
 * We will visit all existing parsers and formatters and decide which ones are most useful for RDF import/export. The student will tackle one transformer at a time, writing tests and adding a SPARQL end point for others to use. The student will also add SADI service discovery.
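As a rough idea of what a single "transformer" could look like, the sketch below writes a parsed record out as N-Triples lines. It is an assumption-laden illustration: the `http://example.org/bio/` namespace, the predicate naming, and the helpers `ntriples_literal` and `record_to_ntriples` are all hypothetical, and a real exporter would reuse established vocabularies rather than invent predicates.

```ruby
# Escape a value for use as a plain N-Triples string literal.
def ntriples_literal(value)
  escaped = value.to_s
                 .gsub('\\') { '\\\\' }
                 .gsub('"') { '\\"' }
                 .gsub("\n") { '\\n' }
  "\"#{escaped}\""
end

# Turn one record (an id plus a hash of attributes) into N-Triples lines.
# +base+ is a placeholder namespace, not a real vocabulary.
def record_to_ntriples(id, attributes, base: 'http://example.org/bio/')
  subject = "<#{base}record/#{id}>"
  attributes.map do |key, value|
    "#{subject} <#{base}vocab##{key}> #{ntriples_literal(value)} ."
  end
end

triples = record_to_ntriples('P12345', { 'name' => 'Example protein', 'organism' => 'Homo sapiens' })
puts triples
```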


 * Difficulty and needed skills
 * Average difficulty


 * The project requires
 * The student will need to have an affinity with the semantic web and reach a decent level of Ruby programming, which probably includes meta-programming.


 * Mentors
 * Toshiaki Katayama, Mark Wilkinson (confirm), Jerven Bolleman

Machine Learning & Data Mining Algorithms for Ruby

 * Rationale
 * Machine learning and data mining algorithms are widely employed for the statistical analyses performed on biological datasets. Many Java libraries currently exist that implement the most commonly used algorithms in bioinformatics (such as clustering methods and simple classifiers), but the usability of these tools is restricted by the limited supply of APIs and user-friendly implementations for languages other than Java.


 * Approach
 * The goal of this project would be to implement a system to easily access this set of tools using JRuby and to develop a basic framework that integrates the different sources. The Java libraries that could be primarily used would be taken from Weka (http://www.cs.waikato.ac.nz/ml/weka/) and RapidMiner (http://rapid-i.com/content/view/181/190/). This approach could be subsequently extended to develop a visualization scheme based on D3.


 * Difficulty and needed skills
 * Medium/Hard depending on the topic selected and the scope of the project
 * Basic statistical knowledge is required as well as programming in Ruby, JRuby and Java


 * The project requires
 * Basic statistical knowledge, Ruby, JRuby, Java


 * Mentors
 * Raoul J.P. Bonnal, Francesco Strozzi

Create a dynamic and social web portal for Bioinformatics packages

 * Rationale
 * The http://biogems.info/ website is an aggregator of Ruby gems and Debian/Biolinux package information. We are looking at building up similar aggregators for bioinformatics packages from other resources, including BioPerl, Biopython, R and BioJava, which may get their own base domain names. Also we wish to create dynamic news feeds based on github commit updates, software releases, testing information etc., so as to create a one-stop resource for bioinformatics software. Also we want to push information to Twitter and Facebook.


 * Approach
 * We want to build on the current biogems.info functionality, with Ruby as a site generator and HAML/SASS template handler. In the browser we want to use CoffeeScript to add interactive features to the site, as well as to fetch live commit information from GitHub, for example.


 * Difficulty and needed skills
 * Affinity for web design, accessibility of information, web programming in Ruby and JavaScript/CoffeeScript.


 * The project requires
 * Some Ruby experience, interest in web design and social networking


 * Mentors
 * Members of the BioRuby panel

A parallelized framework for processing large numbers of VCF files using Scala Actors and JRuby

 * Rationale


 * VCF files are the typical output of whole genome resequencing projects (http://www.1000genomes.org/node/101). They hold the information on all the mutations and variations (SNPs and InDels) that are found by comparing the outputs of an NGS platform with a reference genome. These files are not incredibly large (a typical uncompressed VCF file is a few gigabytes) but they are full of information on millions of positions in the genome where mutations are found. Large resequencing projects can produce hundreds or thousands of these files, one for each sample sequenced.
 * Existing tools (such as VCFTools or BCFTools) let you manipulate, convert and access the information stored in VCF files but are limited in functionality and speed when there is a need to work with many files together, for example to compare the variations found in 100 samples and identify common mutation sites among sub-groups of samples, or to extract all the mutations that are present in 50 samples but not present in the other 50.


 * Approach
 * The project will develop a framework (a single utility or a set of utilities) to address the issues described in the rationale and to allow users to manipulate and compare hundreds of VCF files. Given the large amount of information that will need to be processed, the JVM and the Scala language will be the preferred choice, using the Akka actors library to develop a high performance, highly parallelized framework to process VCF files. A database would be required to support the information processing and mining; traditional RDBMS, NoSQL or semantic databases are all valid choices. The decision on the best database engine to be used will be discussed between the student and the mentors and within the Bio projects community.
 * The JRuby language could then be used to create a nice interface around the framework to run the different tasks and to easily distribute it as a BioRuby gem (http://www.biogems.info).
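The "one worker per file, then merge" pattern behind this approach can be sketched in Ruby. Note the substitution: plain Ruby threads stand in here for the Akka actors the proposal has in mind, and the helpers `parse_vcf_line` and `count_by_chromosome` are hypothetical names. Under JRuby such threads can run in parallel on the JVM; under the standard C interpreter they are interleaved rather than truly parallel.

```ruby
# Parse the first five fixed columns of one tab-separated VCF data line.
def parse_vcf_line(line)
  chrom, pos, id, ref, alt = line.chomp.split("\t", 6)
  { chrom: chrom, pos: Integer(pos), id: id, ref: ref, alt: alt.split(',') }
end

# Count variants per chromosome across several VCF documents,
# using one worker thread per document and merging the partial counts.
def count_by_chromosome(vcf_texts)
  workers = vcf_texts.map do |text|
    Thread.new do
      counts = Hash.new(0)
      text.each_line do |line|
        next if line.start_with?('#') || line.strip.empty?
        counts[parse_vcf_line(line)[:chrom]] += 1
      end
      counts
    end
  end
  # Thread#value waits for the worker and returns its block result.
  workers.map(&:value).reduce(Hash.new(0)) do |total, counts|
    total.merge(counts) { |_chrom, a, b| a + b }
  end
end

# Tiny made-up inputs, built with explicit tabs.
header = "##fileformat=VCFv4.1\n#" + %w[CHROM POS ID REF ALT QUAL FILTER INFO].join("\t") + "\n"
row = ->(*fields) { fields.join("\t") + "\n" }
sample_one = header + row.call('1', '1000', '.', 'A', 'G', '50', 'PASS', '.') +
             row.call('2', '500', '.', 'G', 'A,T', '99', 'PASS', '.')
sample_two = header + row.call('1', '2000', '.', 'C', 'T', '60', 'PASS', '.')

totals = count_by_chromosome([sample_one, sample_two])
puts totals.inspect
```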


 * Difficulty and needed skills
 * The project is mid / high difficulty, aimed at talented students. Previous knowledge of Scala or Ruby is not necessary but a background in advanced programming languages (like C++, Java) is essential to develop the project.


 * The project requires
 * Knowledge of advanced programming languages. Some experience and knowledge of databases and data mining will help in managing the information of VCF files.


 * Mentors
 * Francesco Strozzi (author of bioruby-grid, bioruby-ngs etc.), Raoul J.P. Bonnal (bioruby-samtools, bioruby-ngs, biogems etc.)

Add more project ideas here
Use the template of the other project ideas. Make sure this is finalised before student submissions start.

2013

 * See above: pending approval by Google Summer of Code (in April 2013)

GSoC:HPC-GFF3 Write the world's fastest parallelized GFF3/GTF parser in D, for Ruby FFI

 * Rationale
 * GFF3/GTF parsers are used by genome browsers and next-gen sequencing tools. Current parsers are slow and use a lot of memory. A fast low-memory parser would be beneficial to many bio-medical projects.


 * Approach
 * Based on the existing implementation we can design a fast parser using the D programming language. D provides capabilities for hand-crafting high-performance parsers. If required, parallelization of records can be introduced by using Actors. D can compile libraries which can be bound to Ruby using a C-style interface. This means the GFF3/GTF parser can be used from Ruby. The design will focus on iterating records and feeding them back to the Ruby environment. The library will also be useful for Python, Perl and the JVM.
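To make "iterating records and feeding them back to the Ruby environment" concrete, here is a pure-Ruby sketch of what the record interface could look like from the caller's side. It is not the proposed D parser and makes no performance claim; the method name `each_gff3_record` and the hash layout are assumptions.

```ruby
require 'stringio'

# Yield one hash per GFF3 feature line, skipping comments and blank lines.
# Attribute values are left as-is (no URL-unescaping) to keep the sketch short.
def each_gff3_record(io)
  return enum_for(:each_gff3_record, io) unless block_given?

  io.each_line do |line|
    next if line.start_with?('#') || line.strip.empty?
    seqid, source, type, start, stop, score, strand, phase, attrs = line.chomp.split("\t", 9)
    attributes = attrs.to_s.split(';').each_with_object({}) do |pair, hash|
      key, value = pair.split('=', 2)
      hash[key] = value if value
    end
    yield({ seqid: seqid, source: source, type: type,
            start: Integer(start), stop: Integer(stop),
            score: score, strand: strand, phase: phase,
            attributes: attributes })
  end
end

# Small made-up input, built with explicit tabs.
gff_text = "##gff-version 3\n" +
           %w[chr1 . gene 1000 9000 . + . ID=gene0001;Name=EDEN].join("\t") + "\n" +
           %w[chr1 . mRNA 1050 9000 . + . ID=mrna0001;Parent=gene0001].join("\t") + "\n"

records = each_gff3_record(StringIO.new(gff_text)).to_a
records.each { |r| puts "#{r[:type]} #{r[:start]}-#{r[:stop]} #{r[:attributes]['ID']}" }
```

Because the method yields one record at a time, a caller can stop early or stream a large file without holding every feature in memory.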


 * Difficulty and needed skills
 * This is a challenging project. Advanced programming concepts, concurrency, foreign language bindings.


 * The project requires
 * An interest in high performance computing. Some affinity with coding in C and one or more interpreted languages.


 * Mentors
 * Pjotr Prins (author of bio-gff3), Raoul Bonnal


 * Other interested parties
 * Naohisa Goto (author of BioRuby's GFF3 parser), Brad Chapman (author of Biopython's GFF3 parser), Peter Cock (Biopython) and Chris Fields (BioPerl).

GSoC:MAF Extend bio-alignment plug-in with Multiple Alignment Format -MAF- parser

 * Rationale
 * The multiple alignment format stores a series of multiple alignments in a format that is easy to parse and relatively easy to read. This format stores multiple alignments at the DNA level between entire genomes. Previously used formats are suitable for multiple alignments of single proteins or regions of DNA without rearrangements, but would require considerable extension to cope with genomic issues such as forward and reverse strand directions, multiple pieces to the alignment, and so forth.


 * Approach
 * Create a native Ruby parser similar to the Biopython API http://biopython.org/wiki/Multiple_Alignment_Format, because it has an interesting indexing system. Another approach consists of using FFI to bind native C libraries like http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/ and http://compgen.bscb.cornell.edu/phast/index.php. A lazy parser is preferred over an eager one.
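The difference between a lazy and an eager parser can be illustrated with a small Ruby sketch that only reads as many alignment blocks as the caller asks for. It handles just the "a" (block header) and "s" (sequence) line types, and the method name `maf_blocks` and the hash layout are hypothetical; indexing, as in the Biopython API, is not covered.

```ruby
require 'stringio'

# Return a lazy enumerator over MAF alignment blocks read from +io+.
# Only "a" and "s" lines are interpreted; other line types are ignored.
def maf_blocks(io)
  Enumerator.new do |out|
    block = nil
    io.each_line do |line|
      case line
      when /\Aa(\s|$)/
        out << block if block
        block = { score: line[/score=(\S+)/, 1]&.to_f, sequences: [] }
      when /\As\s/
        next unless block
        _, src, start, size, strand, src_size, text = line.split
        block[:sequences] << { src: src, start: start.to_i, size: size.to_i,
                               strand: strand, src_size: src_size.to_i, text: text }
      when /\A\s*\z/
        # A blank line ends the current block.
        if block
          out << block
          block = nil
        end
      end
    end
    out << block if block
  end.lazy
end

maf_text = <<~'MAF'
  ##maf version=1 scoring=tba.v8

  a score=23262.0
  s hg18.chr7    27578828 38 + 158545518 AAA-GGGAATGTTAACCAAATGA---ATTGTCTCTTACGGTG
  s panTro1.chr6 28741140 38 + 161576975 AAA-GGGAATGTTAACCAAATGA---ATTGTCTCTTACGGTG

  a score=5062.0
  s hg18.chr7    27699739 6 + 158545518 TAAAGA
  s panTro1.chr6 28862317 6 + 161576975 TAAAGA
MAF

# Only the first block is parsed here; the rest of the input is not consumed.
first_block = maf_blocks(StringIO.new(maf_text)).first
puts first_block[:sequences].map { |s| s[:src] }.inspect
```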


 * Difficulty and needed skills
 * Medium, foreign language bindings


 * The project requires
 * Knowledge of Ruby and, if the FFI approach is chosen, of C bindings


 * Mentors
 * Pjotr Prins, Raoul Bonnal

GSoC:Bam Robust and fast parallel BAM parser in D for binding against dynamic languages

 * Difficulty and needed skills
 * This is a challenging project. Advanced programming concepts, concurrency, foreign language bindings.


 * The project requires
 * An interest in high performance computing. Some affinity with coding in C and one or more interpreted languages


 * Mentors
 * Pjotr Prins, Francesco Strozzi, Raoul Bonnal

Update bio-images, a plugin to represent bio-objects with cool images

 * Rationale
 * After GSoC 2011 the main library used to plot the charts was discontinued and a new one was introduced. We want to update this Biogem (BioRuby plugin). Most of the time, after a bioinformatics analysis, the resulting data needs to be re-processed into a graphical form since we, as human beings, are more comfortable accessing results and data visually than browsing a huge table with interconnected information. Very often it is also difficult to extrapolate the real biological meaning from raw datasets. The main idea of this proposal is to define and attach graphical functions to BioRuby objects and consequently to the results computed from a generic process or pipeline. With this solution, it would be possible to explore them more naturally but also to export and integrate the information into a web environment, for sharing the knowledge and the results. For example, different objects storing alignment results could share the same interface and display their data in a common way. The same is true also for other kinds of objects or computational procedures.


 * Approach
 * Study the new library http://mbostock.github.com/d3/ and update the code already developed, then improve the number of charts and objects supported.


 * Difficulty and needed skills
 * Medium/Hard. The student will need to define a graphical API and integrate the new code with the existing BioRuby modules. High level coding skills will be required to create a clean API with clear documentation.


 * The project requires
 * Very good knowledge of Ruby (1.9)
 * Basic concepts of graphics/visualization
 * Ruby on Rails basic knowledge


 * Mentors
 * Raoul J.P. Bonnal, ...

Testing framework for biogem plugins

 * Rationale
 * Biogems are Ruby gems that are created by independent authors, see the publication. Designing an automated testing framework for different versions of Ruby is critical for the successful use of Ruby in bioinformatics. Gems should be tested on release, but also when tagged on GitHub.


 * Approach
 * We integrate facilities of http://biogems.info/, http://rubygems.org/ and http://github.com/ to automatically test gems that get released in the public domain. We can make use of open-bio's testing framework to test individual gems, or crowd sourcing setups, such as http://test.rubygems.org/, to test gems in different setups. Next we program the http://biogems.info/ website to show the test results in an easy way.


 * Difficulty and needed skills
 * Ruby scripting, and affinity for web integration, some web programming including Ruby, HTML (HAML), CSS and Javascript (Coffee)


 * Mentors
 * Pjotr Prins, Raoul Bonnal, Peter Cock (confirm)

Adding social networking functionality to BioRuby.org

 * Rationale
 * BioRuby.org is a portal for getting appropriate information on Ruby-related software development to bioinformatics software developers. The current portal discourages both experienced and inexperienced software developers from digging deeper, and finding solutions to typical bioinformatics problems. We are looking at ways to motivate new developers, students and teachers to dive into (Bio)Ruby. This implies building out the community with functionality from twitter feeds all the way to biogem github updates.


 * Approach
 * We want to restyle the portal so it becomes an interactive environment, encourages people to participate and put information in, and information gets easier to find. The restyling is about web design, and programming an interactive website in Ruby, using Ruby on Rails and other tools, such as markdown, haml, sass, staticmatic, etc. Also the idea is to use existing web services, such as GitHub gists and rubydoc.info, and embed them into the site rather than recreating all these services from scratch. We would like to create a collection of code snippets and documentation that is easy to navigate and add to. It should be even less effort than maintaining a Wiki. Also code snippets should be able to run online and prove they are correct. The total design should also be useful for other Bio* projects, such as BioPerl. We are currently defining the features we want from such a web presence, see features. It is even possible to get a scientific paper out of this work.


 * Difficulty and needed skills
 * Affinity for web design, accessibility of information, web programming in Ruby and JavaScript/CoffeeScript.


 * The project requires
 * Some Ruby experience, interest in web design and social networking


 * Mentors
 * The BioRuby panel: Raoul Bonnal, Pjotr Prins, Francesco Strozzi, Naohisa Goto, Toshiaki Katayama (confirm)

Update to the Ruby Ensembl API

 * Rationale
 * The Ruby Ensembl API was published in 2011 (http://bioinformatics.oxfordjournals.org/content/early/2011/01/28/bioinformatics.btr050) and allows users to programmatically access the Ensembl Database with Ruby. The API was modeled on the Ensembl Perl API to give users the same methods they already know and are familiar with, but it also provides a slightly different approach with easier and more powerful methods to access the different Ensembl databases and retrieve biological data. From the developer's side, the project is based mainly on ActiveRecord classes that map tables in the MySQL databases, plus other libraries that define higher-level methods to combine data from different tables and provide connections to the Ensembl databases. So far the Ruby API covers the Ensembl Core, Ensembl Variation and Ensembl Genomes databases, which are updated every 2-3 months by the EBI teams. The API uses Ruby metaprogramming to adapt to a database schema and the code needs to be updated only if significant changes occur to the databases. We want to push the project further and define a library that can adapt itself at every new Ensembl release, even when minor or major changes occur to the schemas.


 * Approach
 * As part of the Ruby Ensembl API, a utility could be created that can be run periodically at every change in the Ensembl database schemas, to generate the necessary ActiveRecord classes and define relations among classes using the fields and foreign keys present in the MySQL tables. A testing suite could also be generated according to the new classes and methods defined.
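A minimal sketch of such a generator is shown below. It only emits Ruby source text and does not load ActiveRecord or connect to MySQL: the schema hash is a hand-written stand-in for what a real utility would read from the database, and the helper names `camelize` and `model_source` are hypothetical (not part of the existing Ruby Ensembl API).

```ruby
# Stand-in schema description: table => { foreign-key column => referenced table }.
# A real utility would derive this from the MySQL tables instead.
schema = {
  'gene'       => { 'seq_region_id' => 'seq_region' },
  'transcript' => { 'gene_id' => 'gene', 'seq_region_id' => 'seq_region' },
  'seq_region' => {}
}

# 'seq_region' -> 'SeqRegion'
def camelize(name)
  name.split('_').map(&:capitalize).join
end

# Emit the source of one ActiveRecord model class for +table+.
def model_source(table, foreign_keys)
  lines = ["class #{camelize(table)} < ActiveRecord::Base"]
  lines << "  self.table_name = '#{table}'"
  foreign_keys.each do |column, target|
    lines << "  belongs_to :#{target}, foreign_key: '#{column}', class_name: '#{camelize(target)}'"
  end
  lines << 'end'
  lines.join("\n")
end

generated = schema.map { |table, keys| model_source(table, keys) }
puts generated.join("\n\n")
```

Generating source files (rather than defining classes at run time) makes each Ensembl release easy to diff and review, at the cost of a regeneration step.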


 * Difficulty and needed skills
 * Medium/Hard. The student needs to know Ruby quite well and will make heavy use of advanced Ruby metaprogramming. He/she will also need to understand how the Ensembl databases work and how the biological data are organized.


 * The project requires
 * Very good knowledge of Ruby (1.9)
 * Good knowledge of database schemas, MySQL in particular.
 * Good knowledge of Ruby metaprogramming.
 * Basic knowledge of Ensembl website and/or Ensembl databases, even if not mandatory, will help.


 * Mentors
 * Francesco Strozzi


 * Current BioGem available at
 * https://github.com/fstrozzi/bioruby-ensembl

Represent bio-objects and related information with images (ACCEPTED)

 * Rationale
 * Most of the time, after a bioinformatics analysis, the resulting data needs to be re-processed into a graphical form since we, as human beings, are more comfortable accessing results and data visually than browsing a huge table with interconnected information. Very often it is also difficult to extrapolate the real biological meaning from raw datasets. The main idea of this proposal is to define and attach graphical functions to BioRuby objects and consequently to the results computed from a generic process or pipeline. With this solution, it would be possible to explore them more naturally but also to export and integrate the information into a web environment, for sharing the knowledge and the results. For example, different objects storing alignment results could share the same interface and display their data in a common way. The same is true also for other kinds of objects or computational procedures.


 * Approach
 * The student and the mentor will define together a minimum set of features that need to be shared by the BioRuby objects and that could be visualized. Then the student will create a library/module to implement these graphical features within the BioRuby project. He/she will gain experience with Rubyvis as the graphical API and with Ruby on Rails for web visualization.


 * Difficulty and needed skills
 * Medium/Hard. The student will need to define a graphical API and integrate the new code with the existing BioRuby modules. High level coding skills will be required to create a clean API with clear documentation.


 * The project requires:
 * Very good knowledge of Ruby (1.9) and design patterns
 * Basic concepts of graphics/visualization
 * Ruby on Rails basic knowledge


 * Mentors
 * Raoul J.P. Bonnal, Christian Zmasek, Claudio Bustos (confirm)

Support Next Generation Sequencing (NGS) in BioRuby (proposed)

 * Rationale
 * The processing and analysis of NGS data is challenging for a variety of reasons, in particular because the data sets are usually very large and contain a vast amount of information and a large amount of unknown data. Furthermore there are many different approaches to performing NGS analyses, and several software tools need to be integrated to produce reliable results. Since this topic is so important for the BioRuby community we started a sub-project, bioruby-ngs, for analyzing NGS data. The project is in an early stage of development but notable results have been quickly gained. Many topics still need to be addressed, in particular:
 * data and results reporting
 * workflow management
 * DSL for describing experimental designs
 * YALIMS (Yet Another LIMS), a simple web-based LIMS for raw dataset processing, with reporting and monitoring


 * Approach
 * Due to the open nature of the project the student will choose which feature he/she wants to develop and focus on. The student will learn the basic concepts of NGS data analysis and will work closely with a mentor to produce a working library that will be integrated into the BioRuby NGS project.


 * Difficulty and needed skills
 * Medium to Hard depending on the topic selected.


 * The project requires:
 * Ruby
 * Bash programming and knowledge of the Linux environment
 * Ruby on Rails 3.x


 * Mentors
 * Raoul J.P. Bonnal, Francesco Strozzi


 * Project overview and updates
 * [1]


 * Source code
 * https://github.com/helios/bioruby-ngs

BioRuby Wrapper for Command line application (proposed)

 * Rationale
 * The main reason for this project is the need to support different stand-alone applications critical for Next Generation Sequencing analyses. Direct binding to existing C/C++ source code or rewriting all the applications is impractical and a waste of resources. A quick solution is to use stand-alone applications directly, integrating them into the BioRuby API. Some work has already been done in the BioRuby NGS project with this wrapper, but better support for demanding I/O processes is required. Following this design pattern, it will also be possible to improve the support for other bioinformatics suites, like EMBOSS, which is outdated in BioRuby at the time of this proposal.


 * Approach
 * The student will become familiar with advanced meta-programming concepts in Ruby and will contribute to the definition of a DSL for this wrapping library. He/she will also build a parser to automatically define additional wrappers for the EMBOSS suite starting from the ACD configuration files.
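The flavour of such a DSL can be sketched as follows. This is an illustrative assumption, not the wrapper that exists in the bioruby-ngs branch: the `CommandWrapper` class and its `program` and `option` class methods are hypothetical, and the `Needle` declarations are written by hand here rather than generated from an ACD file. The sketch only builds the command line; it does not run anything.

```ruby
require 'shellwords'

# Hypothetical base class: subclasses declare the program and its options.
class CommandWrapper
  class << self
    attr_reader :program_name

    def program(name)
      @program_name = name
    end

    def options
      @options ||= {}
    end

    # Declare an option and generate a reader/writer pair for it.
    # type: :value takes an argument, type: :switch is a bare flag.
    def option(name, flag:, type: :value)
      options[name] = { flag: flag, type: type }
      attr_accessor name
    end
  end

  # Assemble a shell-escaped command line from the options that were set.
  def command_line
    args = self.class.options.flat_map do |name, spec|
      value = public_send(name)
      next [] if value.nil? || value == false
      spec[:type] == :switch ? [spec[:flag]] : [spec[:flag], value.to_s]
    end
    Shellwords.join([self.class.program_name, *args])
  end
end

# Hand-written wrapper for a few options of the EMBOSS needle program.
class Needle < CommandWrapper
  program 'needle'
  option :asequence, flag: '-asequence'
  option :bsequence, flag: '-bsequence'
  option :gapopen, flag: '-gapopen'
  option :auto, flag: '-auto', type: :switch
end

needle = Needle.new
needle.asequence = 'a.fasta'
needle.bsequence = 'b.fasta'
needle.gapopen = 10.0
needle.auto = true
puts needle.command_line
```

A parser for ACD files would then only need to emit `option` declarations like the ones above, one per parameter.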


 * Difficulty and needed skills
 * Medium. Good Ruby knowledge and experience with meta-programming are required to achieve the goals.


 * The project requires:
 * Ruby 1.9
 * Ruby Metaprogramming


 * Mentors
 * Raoul J.P. Bonnal, Francesco Strozzi


 * Source code
 * https://github.com/helios/bioruby-ngs, wrapper branch

Modular annotation knowledge base for BioRuby (proposed)

 * Rationale
 * Handling data sets coming from platforms for gene expression analysis or real time PCR requires accessing the corresponding gene annotations several times during the measurements. This kind of information is normally stored in remote databases that provide the required knowledge and data. Problems arise when the available databases do not support a specific version of the data of interest or when huge queries need to be submitted. A BioRuby knowledge base, designed to be modular and expandable through time, could solve these problems. A good compromise between performance and portability could be achieved using embedded databases and accessing the data through a clean API.


 * Approach
 * The student and the mentor will decide which platforms should be supported, based on their popularity. Then the student will retrieve the essential annotation and will design a simple database schema to support all the relevant non-redundant information. The schema will be flexible enough to allow interconnecting the dataset with external databases or resources for subsequent analyses. After this phase of discovery and design, the student will build the database using SQLite and will write a Ruby library to access the data using the ActiveRecord ORM.


 * Difficulty and needed skills
 * Medium. The student will need to define the core data to be included in the database and how this information will be organized and accessed by the end-user. The Ruby library will be created using the powerful ActiveRecord paradigms, but good coding skills will be required to design an efficient API with clear documentation.


 * The project requires:
 * Minimal SQL dialect
 * Good knowledge of Ruby
 * Experience in querying biological databases
 * Experience with annotation data


 * Mentors
 * Raoul J.P. Bonnal, Francesco Strozzi

Ruby 1.9.2 support of BioRuby (accepted)

 * Rationale
 * The new stable Ruby version 1.9.2 is now under development and will soon be released. It has many improvements and some incompatible changes. The goal of the project is to run almost all functions of BioRuby correctly in both Ruby 1.8.x and Ruby 1.9.2.


 * Approach
 * First, implement unit tests to guarantee no behavior changes. Next, modify the existing code.


 * Difficulty and needed skills
 * Medium.


 * The project requires:
 * Ruby programming skill
 * bioinformatics skill, or motivation to learn bioinformatics

In addition, the student should also have interest in the differences between Ruby 1.8 and 1.9.


 * Mentors
 * Naohisa Goto


 * Student
 * Kazuhiro Hayashi


 * Project overview and updates
 * project blog for the Ruby 1.9.2 support of BioRuby


 * Source code
 * http://github.com/GSoC2010KH/bioruby

Implementation of algorithm to infer gene duplications in BioRuby (accepted)

 * Rationale
 * Gene duplications are an important concept in biomedical research. They are of particular importance in the study of molecular evolution, since they are believed to be a major driver in the evolution of new protein functions. Furthermore, the inference of gene duplications is almost always necessary for accurate sequence function prediction, as sequences related by gene duplications (paralogs) are more likely to exhibit differences on a functional level than sequences related by speciations (orthologs). Gene duplications can be inferred by calculating an evolutionary tree of the molecular sequences being analyzed, and then comparing this gene tree with a species tree (the 'tree of life'). For this purpose, we developed a simple and fast algorithm, named SDI, for speciation duplication inference (reference: Zmasek and Eddy, 2001, "A simple algorithm to infer gene duplication and speciation events on a gene tree", Bioinformatics, 17, 821-828). Implementing this algorithm in the increasingly popular Ruby programming language, as part of the BioRuby open source bioinformatics project, would give a large number of biologists and software developers immediate access to a useful tool, particularly in light of the ever increasing number of sequenced genomes and the associated increase in comparative functional genomics studies.


 * Approach
 * Development of unit tests followed by the implementation of the algorithm and necessary data structures. Since BioRuby supports phyloXML, the basic infrastructures needed to implement the SDI algorithm are already present (such as data structures to store species and gene information, input and output of phylogenetic trees), making the implementation relatively straightforward. Dependent on student interest and aptitude for computer science and algorithm development, this project might also entail extending the algorithm itself. Currently, it is only defined for binary trees. A very useful extension would be to allow non-binary species trees, and, possibly, non-binary gene trees. Relevant references on the theories behind this proposal can be found here.
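The core idea of comparing a gene tree with a species tree can be sketched with the usual mapping criterion: each gene-tree node is mapped to the lowest common ancestor, in the species tree, of the species found below it, and a node is called a duplication when it maps to the same species-tree node as one of its children. The code below is a naive illustration of that criterion for binary trees, not the SDI implementation from the paper (which computes the mapping more efficiently); the tree encodings, the toy data and the method names are assumptions.

```ruby
# Species tree as child => parent; the root maps to nil. Toy data.
species_parent = {
  'human' => 'primates', 'chimp' => 'primates',
  'primates' => 'mammals', 'mouse' => 'mammals',
  'mammals' => nil
}

# Path from a species-tree node up to the root, starting with the node itself.
def ancestors(node, parent)
  path = []
  while node
    path << node
    node = parent[node]
  end
  path
end

# Lowest common ancestor of two species-tree nodes (naive upward walk).
def lca(a, b, parent)
  seen = ancestors(a, parent)
  ancestors(b, parent).find { |node| seen.include?(node) }
end

# Gene tree: leaves are { species: ... }, internal nodes { children: [left, right] }.
# Annotates every node with :mapping and internal nodes with :event.
def infer_events(node, parent)
  unless node[:children]
    node[:mapping] = node[:species]
    return node[:mapping]
  end
  left, right = node[:children].map { |child| infer_events(child, parent) }
  node[:mapping] = lca(left, right, parent)
  node[:event] = [left, right].include?(node[:mapping]) ? :duplication : :speciation
  node[:mapping]
end

# ((human, chimp), (human, mouse)): the root must be a duplication,
# because human genes appear on both sides of it.
primate_clade = { children: [{ species: 'human' }, { species: 'chimp' }] }
mixed_clade = { children: [{ species: 'human' }, { species: 'mouse' }] }
gene_tree = { children: [primate_clade, mixed_clade] }

infer_events(gene_tree, species_parent)
puts "root: #{gene_tree[:event]} at #{gene_tree[:mapping]}"
puts "primate clade: #{primate_clade[:event]} at #{primate_clade[:mapping]}"
```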


 * Difficulty and needed skills
 * Medium. The project requires Ruby programming skills and some experience with algorithms. Knowledge about evolutionary biology is advantageous but not mandatory. The (optional) extension of the algorithm for non-binary trees requires a solid background in computer science.


 * Mentors
 * Christian Zmasek, Diana Jaunzeikare


 * Student
 * Sara Rayburn


 * Project overview, timeline, and updates
 * Implementing SDI Project Updates


 * Source code
 * http://github.com/srayburn/bioruby/

Related Projects
As part of NESCent's Phyloinformatics GSoC


 * Develop an API for NeXML I/O, and, RDF triples for BioRuby

This is an application for the Evolution and the semantic web in Ruby: NeXML I/O for BioRuby project idea.

Abstract: Add NeXML parsing and serializing support, and an RDF triples API to BioRuby.

Student: Anurag Priyam

Mentor(s): Rutger Vos (primary), Jan Aerts

Project Homepage: Develop an API for NeXML and RDF triples for BioRuby

Project blog: My Weblog (phylosoc label)

Source Code: Github

Past Mentor List
BioRuby developers who volunteered to act as a mentor to a GSoC student.


 * Naohisa Goto - 1 student
 * Christian Zmasek - 1 student

Implementing phyloXML support in BioRuby (accepted)
As part of NESCent's Phyloinformatics GSoC


 * Diana Jaunzeikare
 * Project page: PhyloSoC:PhyloXML_support_in_BioRuby
 * The code is included in the BioRuby 1.4.0 release.