
Bling Fire Ruby

Bling Fire - high speed text tokenization - for Ruby

Installation

Add this line to your application’s Gemfile:

gem "blingfire"

Getting Started

Create a model

model = BlingFire::Model.new

Tokenize words

model.text_to_words(text)

Tokenize sentences

model.text_to_sentences(text)
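
The two calls above compose naturally when words are needed per sentence. The following is a minimal sketch, assuming `text_to_sentences` and `text_to_words` each return an array of strings. `words_by_sentence` is a hypothetical helper, not part of the gem, and it accepts any object that responds to those two methods.

```ruby
# Hypothetical helper (not part of blingfire): groups word tokens by sentence.
# `model` is any object that responds to text_to_sentences and text_to_words,
# such as a BlingFire::Model instance.
def words_by_sentence(model, text)
  model.text_to_sentences(text).map do |sentence|
    model.text_to_words(sentence)
  end
end

# Usage with the gem installed (assumed, not run here):
#   require "blingfire"
#   model = BlingFire::Model.new
#   words_by_sentence(model, "Hello world. How are you?")
```

Tokenizing sentence by sentence may give slightly different word boundaries than tokenizing the whole text at once, so treat this as a convenience rather than an equivalent of a single `text_to_words` call.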

Get offsets for words

words, start_offsets, end_offsets = model.text_to_words_with_offsets(text)

Get offsets for sentences

sentences, start_offsets, end_offsets = model.text_to_sentences_with_offsets(text)
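
The offset methods return three parallel arrays, which can be awkward to pass around. As a minimal sketch, the hypothetical helper `pair_with_offsets` (not part of the gem) zips them into one record per token. The sample values are made up for illustration and are not actual Bling Fire output. This readme does not say whether offsets count bytes or characters, or whether the end offset is inclusive, so verify that before slicing the original text with them.

```ruby
# Hypothetical helper (not part of blingfire): turns the three parallel arrays
# into one array of hashes so each token travels with its own offsets.
def pair_with_offsets(tokens, start_offsets, end_offsets)
  unless tokens.length == start_offsets.length && tokens.length == end_offsets.length
    raise ArgumentError, "tokens and offsets must have the same length"
  end

  tokens.zip(start_offsets, end_offsets).map do |token, first, last|
    { token: token, start_offset: first, end_offset: last }
  end
end

# Made-up sample values for illustration only, not actual Bling Fire output.
pairs = pair_with_offsets(["Hello", "world"], [0, 6], [4, 10])
pairs.each { |pair| puts "#{pair[:token]}: #{pair[:start_offset]}..#{pair[:end_offset]}" }
```

With the gem installed, the same helper could be called as `pair_with_offsets(*model.text_to_words_with_offsets(text))`.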

Pre-trained Models

Bling Fire comes with a default model that follows the tokenization logic of NLTK with a few changes. Other pre-trained models can also be downloaded and loaded from a file.

Load a model

model = BlingFire.load_model("bert_base_tok.bin")

Convert text to ids

model.text_to_ids(text)

Get offsets for ids

ids, start_offsets, end_offsets = model.text_to_ids_with_offsets(text)

Disable prefix space

model = BlingFire.load_model("roberta.bin", prefix: false)

Ids to Text

Load a model

model = BlingFire.load_model("bert_base_tok.i2w")

Convert ids to text

model.ids_to_text(ids)
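
Encoding and decoding use two separate model files, so a round trip needs both loaded. The sketch below assumes one model that responds to `text_to_ids` and another that responds to `ids_to_text`. `round_trip` is a hypothetical helper, not part of the gem. The decoded text is not guaranteed to match the input exactly, since tokenization may normalize it.

```ruby
# Hypothetical helper (not part of blingfire): encodes text to ids with one
# model and decodes those ids back to text with another.
# Returns both values so the caller can compare the result with the input.
def round_trip(text, encoder:, decoder:)
  ids = encoder.text_to_ids(text)
  [ids, decoder.ids_to_text(ids)]
end

# Usage with the gem and both model files available (assumed, not run here):
#   encoder = BlingFire.load_model("bert_base_tok.bin")
#   decoder = BlingFire.load_model("bert_base_tok.i2w")
#   ids, decoded = round_trip("hello world", encoder: encoder, decoder: decoder)
```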

History

View the changelog

Contributing

Everyone is encouraged to help improve this project.

To get started with development:

git clone https://github.com/ankane/blingfire-ruby.git
cd blingfire-ruby
bundle install
bundle exec rake vendor:all download:models
bundle exec rake test