TJNGram

0.01
No commit activity in last 3 years
No release in over 3 years
It's common for Chinese, Japanese, and Korean articles to contain some English, but n-gram libraries that can parse such mixed-language text are rare. TJNGram was made to solve this problem.

Install

gem install tjngram

Usage

require 'tjngram'

text = <<eos
這是一個範例。
This is an example.
これは例です。

這裡有一個蘋果。
There is an apple.
これはリンゴです。
eos

puts text, "=========="

TJNGram.process(2, text) #=> {"一個"=>2, "これ"=>2, "is an"=>2, ...}
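
The sample output above suggests that CJK text is split per character while English text is split per word. The following is only a rough sketch of that idea, not TJNGram's actual implementation: the method name `mixed_ngrams` and the `RUNS` pattern are hypothetical, and the sketch assumes n-grams never cross a script boundary, punctuation, or a line break.

```ruby
# Hypothetical sketch; NOT TJNGram's real code.
# A "run" is either consecutive CJK characters or space-separated Latin words.
RUNS = /[\p{Han}\p{Hiragana}\p{Katakana}\p{Hangul}]+|[A-Za-z0-9']+(?: [A-Za-z0-9']+)*/

def mixed_ngrams(n, text)
  counts = Hash.new(0)
  text.scan(RUNS) do |run|
    if run.match?(/\A[A-Za-z0-9' ]+\z/)
      # Latin run: the unit is a word, and n-grams are joined with a space
      units, joiner = run.split(" "), " "
    else
      # CJK run: the unit is a single character, joined without a separator
      units, joiner = run.chars, ""
    end
    units.each_cons(n) { |gram| counts[gram.join(joiner)] += 1 }
  end
  counts
end

mixed_ngrams(2, "這裡有一個蘋果。There is an apple.")["is an"] #=> 1
```

Counting per run keeps a bigram such as "is an" from being merged with neighbouring CJK characters. The sketch preserves case and skips anything outside the two patterns, which may differ from what the gem does.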

Note

If your script file is UTF-8 encoded, run Ruby with the -Ku option:

ruby -Ku example.rb

It is strongly recommended that you save all your script files as UTF-8.