Project

bio-velvet

No commit activity in last 3 years
No release in over 3 years
Parser to work with some file formats used in the velvet DNA assembler
 Project Readme

bio-velvet

bio-velvet is a biogem for interacting with the velvet sequence assembler. It includes both a wrapper for the velvet executable and a parser for the 'LastGraph' format files that velvet creates. The parser gives access to the underlying assembly graph created by velvet.

Installation

To install bio-velvet and its rubygem dependencies:

gem install bio-velvet

Usage

To run velvet with a kmer length of 87 on a set of single-ended reads in /path/to/reads.fa:

require 'bio-velvet'

velvet_result = Bio::Velvet::Runner.new.velvet(87, '-short /path/to/reads.fa') #=> Bio::Velvet::Result object

contigs_file = velvet_result.contigs_path #=> path to contigs file as a String
lastgraph_file = velvet_result.last_graph_path #=> path to last graph file as a String

Bio::Velvet::Runner.new.binary_version #=> e.g. "1.2.08"

By default, the velvet method passes no parameters to velvetg other than the velvet directory created by velveth. This directory is a temporary directory by default, but it can also be set explicitly. For instance, to run velvet with a -cov_cutoff parameter in the velvet_dir directory:

velvet_result = Bio::Velvet::Runner.new.velvet(87,
  '-short /path/to/reads.fa',
  '-cov_cutoff 3.5', 
  :output_assembly_path => 'velvet_dir')
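
As a rough illustration of what such a call amounts to, the sketch below builds the two command lines of a velvet run (velveth followed by velvetg) without executing them. The helper velvet_commands and its output_dir keyword are hypothetical names invented for this example and are not part of bio-velvet; the gem's own wrapper may assemble its commands differently.

```ruby
require 'shellwords'

# Hypothetical helper (not part of bio-velvet): returns the velveth and
# velvetg command lines for one assembly, without running anything.
def velvet_commands(kmer_length, velveth_options, velvetg_options = '', output_dir: 'velvet_dir')
  dir = Shellwords.escape(output_dir)
  # velveth takes the output directory, the kmer (hash) length, then read options
  velveth = "velveth #{dir} #{kmer_length} #{velveth_options}"
  # velvetg always receives the directory; extra options are appended only if given
  velvetg = ['velvetg', dir, velvetg_options].reject(&:empty?).join(' ')
  [velveth, velvetg]
end

velvet_commands(87, '-short /path/to/reads.fa', '-cov_cutoff 3.5').each do |command|
  puts command
end
```

With no third argument, the velvetg command consists of the directory alone, which mirrors the default behaviour described above.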

The graph file can be parsed from a velvet_result:

graph = velvet_result.last_graph #=> Bio::Velvet::Graph object

In my experience (mostly on complex metagenomes), the graph object itself does not take as much RAM as initially expected. Most of the hard work has already been done by velvet itself, particularly if -cov_cutoff has been set. However, parsing the graph can take many minutes or even hours if the LastGraph file is large (>500MB). The slowest part is parsing the positions of reads, which are present when velvet is run with the -read_trkg yes option. To speed this up, parsing can be restricted to a set of read IDs, e.g.

velvet_result.last_graph(:interesting_read_ids => Set.new([1,2,3]))

This parses the positions of only the first 3 reads (IDs 1, 2 and 3).
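
To show why restricting the read IDs helps, here is a minimal sketch of Set-based filtering over read-tracking lines. The toy block layout (an "NR" header followed by one line per read giving read ID, offset and start coordinate) is a simplified assumption for illustration, and tracked_reads is a hypothetical helper, not the gem's parser.

```ruby
require 'set'

# Toy read-tracking block. The layout is an assumption for illustration:
# an "NR" header line, then one line per read (read ID, offset, start coordinate).
TOY_NR_BLOCK = <<~TEXT
  NR 1 4
  1 0 0
  2 5 0
  7 9 3
  9 12 0
TEXT

# Hypothetical helper: collects read positions, skipping any read whose ID
# is not in interesting_read_ids (when a Set of IDs is given).
def tracked_reads(text, interesting_read_ids = nil)
  reads = []
  text.each_line do |line|
    fields = line.split
    next if fields.empty? || fields[0] == 'NR'
    read_id = fields[0].to_i
    # Set#include? is a hash lookup, so uninteresting lines are rejected cheaply
    next if interesting_read_ids && !interesting_read_ids.include?(read_id)
    reads << { read_id: read_id, offset: fields[1].to_i, start_coord: fields[2].to_i }
  end
  reads
end

puts tracked_reads(TOY_NR_BLOCK, Set.new([1, 2, 3])).inspect
```

Only the lines for reads 1 and 2 are turned into objects here; reads 7 and 9 are skipped before any further work is done on them.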

With a parsed graph (a Bio::Velvet::Graph object) you can interact with the graph e.g.

graph.kmer_length #=> 87
graph.nodes #=> Bio::Velvet::Graph::NodeArray object
graph.nodes[3] #=> Bio::Velvet::Graph::Node object with node ID 3
graph.get_arcs_by_node_id(1, 3) #=> an array of arcs between nodes 1 and 3 (Bio::Velvet::Graph::Arc objects)
graph.nodes[5].noded_reads #=> array of Bio::Velvet::Graph::NodedRead objects, for read tracking

There is much more that can be done to interact with the graph object and its components - see the rubydoc.

Parsers for Sequences and CnyUnifiedSeq.names files

With default parameters velvet generates a Sequences file that includes read ID information and the sequences themselves.

seqs = Bio::Velvet::Sequences.parse_from_file(File.join(velvet_result.result_directory, 'Sequences'))
seqs[1] #=> 'AAAATTGTCAGACTAGCTATCAGCATATCAGCGCGCATCTCAGACGAGCACTATC'
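
For intuition about what a read-ID lookup like this involves, the sketch below indexes a FASTA-like text by read ID. It assumes each header has the form ">name<TAB>read_id<TAB>category"; that layout and the helper sequences_by_read_id are assumptions made for this example, not a description of the gem's implementation.

```ruby
# Toy data in an assumed layout: FASTA-like records whose header line is
# ">name<TAB>read_id<TAB>category", with the sequence possibly split over lines.
TOY_SEQUENCES = ">read1\t1\t0\nAAAATTGTCAG\nACTAGC\n>read2\t2\t0\nGGGCCC\n"

# Hypothetical helper: returns a Hash of read ID (Integer) to sequence (String).
def sequences_by_read_id(text)
  seqs = {}
  current_id = nil
  text.each_line do |line|
    line = line.chomp
    if line.start_with?('>')
      # The second tab-separated header field is taken to be the read ID
      current_id = line.split("\t")[1].to_i
      seqs[current_id] = String.new
    elsif current_id
      # Sequence lines are concatenated until the next header
      seqs[current_id] << line
    end
  end
  seqs
end

puts sequences_by_read_id(TOY_SEQUENCES)[1]
```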

If the -create_binary flag is set when running velveth, a names file is generated that encodes the read names and IDs.

entries = Bio::Velvet::CnyUnifiedSeqNamesFile.extract_entries(
  File.join(velvet_result.result_directory, 'CnyUnifiedSeq.names'),
  ['read1','read2']
  ) #=> Hash of read name to Array of CnyUnifiedSeqNamesFileEntry objects
entries['read1'] #=> Array of CnyUnifiedSeqNamesFileEntry objects
entries['read1'][0].read_id #=> 1 (i.e. '1'.to_i)

When speed is required, grep can come to the rescue (at the cost of some portability):

entries = Bio::Velvet::CnyUnifiedSeqNamesFile.extract_entries_using_grep_hack(
  File.join(velvet_result.result_directory, 'CnyUnifiedSeq.names'),
  ['read1','read2']
  ) #=> same returned object as above

The sequences themselves are stored in a separate file when -create_binary is used - an interface for this is included in the bio-velvet_underground biogem.

Project home page

For information on the source tree, documentation, examples, issues and how to contribute, see

http://github.com/wwood/bioruby-velvet

The BioRuby community is on IRC server: irc.freenode.org, channel: #bioruby.

Cite

This code is currently unpublished.

Biogems.info

This Biogem is listed at biogems.info

Copyright

Copyright (c) 2013 Ben J Woodcroft. See LICENSE.txt for further details.