splitta

Implementation of Splitta in Ruby. See https://code.google.com/archive/p/splitta/
Ruby Splitta

Description

Splitta includes proper tokenization and models for very high accuracy sentence boundary detection (English only for now). The models are trained on Wall Street Journal news combined with the Brown Corpus, which is intended to be widely representative of written English. Error rates on test news data are near 0.25%.
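
To illustrate why sentence boundary detection needs more than punctuation matching, here is a minimal sketch of a naive splitter. It is not Splitta's method and is not part of the gem; `naive_sentences` is a hypothetical helper written only for this comparison. It breaks after every `.`, `!` or `?` followed by whitespace, so abbreviations such as "U.S." and "Dr." produce false boundaries.

```ruby
# Naive baseline for illustration only (NOT how Splitta works):
# split wherever ., ! or ? is followed by whitespace.
# `naive_sentences` is a hypothetical helper, not part of the splitta gem.
def naive_sentences(text)
  text.split(/(?<=[.!?])\s+/)
end

text = "The U.S. economy grew in 2009. Dr. Smith was not surprised."

# The text contains two sentences, but the naive rule yields four pieces,
# because it treats the periods in "U.S." and "Dr." as sentence ends.
naive_sentences(text).each { |piece| puts piece }
```

A trained model can use the surrounding tokens to decide whether a period ends a sentence, which is the kind of ambiguity the description above refers to.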

Installation

gem install splitta

Requirements

  • Ruby 2.5.1 or higher

Usage

require 'splitta'

Splitta.sentences("Some text goes here.")

License

MIT. See the LICENSE file.

References

Dan Gillick, “Sentence Boundary Detection and the Problem with the U.S.”, NAACL 2009. http://dgillick.com/resource/sbd_naacl_2009.pdf