ParseFasta
So you want to parse a fasta file...
Installation
Add this line to your application's Gemfile:
gem 'parse_fasta'
And then execute:
$ bundle
Or install it yourself as:
$ gem install parse_fasta
JRuby
ParseFasta doesn't work with JRuby for now D:
Overview
Provides nice, programmatic access to fasta and fastq files. It's faster and more lightweight than BioRuby. And more fun!
It takes care of a lot of wacky edge cases, like parsing multi-blob gzipped files, and it is strict about formatting by default.
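To make "multi-blob gzipped file" concrete: it is a file made of several gzip members concatenated back to back, which is what you get from something like `cat a.fa.gz b.fa.gz > both.fa.gz`. The sketch below uses only Ruby's standard `zlib` library to build such a blob in memory and read every member back. The helper name `gunzip_all_members` is made up for illustration; this is not ParseFasta's own code, just one way the edge case can be handled.

```ruby
require "zlib"
require "stringio"

# Two independently gzipped FASTA chunks glued together, like `cat a.gz b.gz`.
blob = Zlib.gzip(">seq1\nACTG\n") + Zlib.gzip(">seq2\nGGCC\n")

# Hypothetical helper (not part of ParseFasta): decompress every gzip
# member in the data, in order, and return the concatenated text.
def gunzip_all_members(data)
  io = StringIO.new(data)
  out = String.new
  loop do
    gz = Zlib::GzipReader.new(io)
    out << gz.read
    unused = gz.unused        # bytes read past the end of this member, or nil
    gz.finish                 # finish this reader without closing io
    break if unused.nil?      # nothing left, so that was the last member
    io.pos -= unused.length   # rewind to the start of the next member
  end
  out
end

puts gunzip_all_members(blob)
```

A reader that stops after the first member would silently drop every record in the later members, which is why this case is worth handling explicitly.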
Documentation
Check out the parse_fasta docs for the full API documentation.
Usage
Here are some examples of using ParseFasta. Don't forget to require "parse_fasta" at the top of your program!
Print header and length of each record.
ParseFasta::SeqFile.open(ARGV[0]).each_record do |rec|
  puts [rec.header, rec.seq.length].join "\t"
end
You can parse fastQ files in exactly the same way.
ParseFasta::SeqFile.open(ARGV[0]).each_record do |rec|
  printf "Header: %s, Sequence: %s, Description: %s, Quality: %s\n",
         rec.header,
         rec.seq,
         rec.desc,
         rec.qual
end
Record#desc and Record#qual will be nil if the file you are parsing is a fastA file.
ParseFasta::SeqFile.open(ARGV[0]).each_record do |rec|
  if rec.qual
    # it's a fastQ record
  else
    # it's a fastA record
  end
end
You can also check this with Record#fastq?
ParseFasta::SeqFile.open(ARGV[0]).each_record do |rec|
  if rec.fastq?
    # it's a fastQ record
  else
    # it's a fastA record
  end
end
And there is a nice #to_s method that does what it should whether the record is fastA-like or fastQ-like. Check out the docs for info on the fancy #to_fasta and #to_fastq methods!
ParseFasta::SeqFile.open(ARGV[0]).each_record do |rec|
  puts rec.to_s
end
But of course, since it is a #to_s override... you don't even have to call it directly!
ParseFasta::SeqFile.open(ARGV[0]).each_record do |rec|
  puts rec
end
Sometimes your fasta file might have record separators (>) within the "sequence". For example, CD-HIT's .clstr files have headers within what would be the sequence part of the record. ParseFasta is really strict about formatting and will raise an error when trying to read these types of files. If you would like to parse them, use the check_fasta_seq: false flag like so:
ParseFasta::SeqFile.open(ARGV[0], check_fasta_seq: false).each_record do |rec|
  puts rec
end
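To show what "a record separator within the sequence" looks like and why a strict parser rejects it, here is a small standalone sketch. The function `parse_fasta_text` and its `check_seq:` keyword are hypothetical names invented for this example; they are not ParseFasta's API or implementation, and the sample line only approximates the shape of a CD-HIT .clstr entry.

```ruby
# Hypothetical helper, not part of ParseFasta: splits FASTA-like text into
# [header, sequence] pairs. With check_seq: true it raises if a '>' shows
# up inside the sequence part of a record.
def parse_fasta_text(text, check_seq: true)
  records = []
  text.each_line(chomp: true) do |line|
    if line.start_with?(">")
      records << [line[1..-1], String.new]  # new record: header, empty seq
    elsif records.empty?
      raise ArgumentError, "sequence data found before the first header"
    else
      records.last[1] << line               # append to the current sequence
    end
  end

  if check_seq
    records.each do |header, seq|
      if seq.include?(">")
        raise ArgumentError, "'>' found in the sequence of record '#{header}'"
      end
    end
  end

  records
end

# Roughly the shape of a .clstr entry: the member line contains a '>'.
clstr = ">Cluster 0\n0\t10aa, >seq1... *\n"

begin
  parse_fasta_text(clstr)
rescue ArgumentError => e
  puts "strict mode rejected it: #{e.message}"
end

p parse_fasta_text(clstr, check_seq: false)
```

Being strict by default means a malformed file fails loudly instead of producing quietly wrong sequences; turning the check off is an explicit choice for formats like .clstr.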