chawan
======

  A cup for chasen that provides an easy-to-use interface for extracting Japanese words by part of speech.


Methods
=======

  * Chawan.parse(text)
    parses the given text with the analyzer (the default analyzer is :mecab)

  * Chawan.analyzer(xxx)      (same as Chawan[xxx], Chawan.xxx)
    selects the analyzer used for parsing
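  To illustrate what parse has to do with the analyzer's output, here is a minimal sketch. It assumes MeCab's default IPADIC output format (one `surface<TAB>comma-separated features` line per word, ending with `EOS`) and Ruby 2.7 or later. The method name `parse_mecab_output` and the [category, word, attributes] triple are hypothetical and are not taken from chawan's source; the attribute keys are IPADIC's field names.

```ruby
# Hypothetical sketch (not chawan's code): convert MeCab's default IPADIC output
# into [category, word, attributes] triples.
FEATURE_KEYS = %w[品詞 品詞細分類1 品詞細分類2 品詞細分類3 活用型 活用形 原形 読み 発音].freeze

def parse_mecab_output(output)
  output.each_line.filter_map do |line|
    word, feature_csv = line.chomp.split("\t", 2)
    next if feature_csv.nil? # skips the "EOS" marker and blank lines

    features = feature_csv.split(',')
    # Unknown words may carry fewer features, so only pair keys that have values.
    attributes = FEATURE_KEYS.first(features.size).zip(features).to_h
    [features.first, word, attributes] # the first feature is the part of speech
  end
end

# A hand-written line in MeCab's format, used only as sample input.
sample = "登録\t名詞,サ変接続,*,*,*,*,登録,トウロク,トーロク\nEOS\n"
parse_mecab_output(sample).each do |category, word, attributes|
  puts "#{category}: #{word} (原形: #{attributes['原形']})"
end
```

  In this sketch the part of speech plays the role of a node's category, the surface form its word, and the remaining features its attributes hash.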


Class
=====

  * Chawan::Nodes (Chawan.parse returns a Chawan::Nodes)
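      #noun    : selects nodes whose category is a noun (名詞)
      #verb    : selects nodes whose category contains 動詞 (verbs and auxiliary verbs)
      #grep    : selects nodes whose category matches the given pattern (a Regexp may match part of the category; a String must match it exactly)
      #compact : merges runs of consecutive nodes that share a category (or, given a pattern, whose categories match it)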

  * Chawan::Node (Chawan::Nodes has many Chawan::Node(s))
      #category   : part of speech
      #word       : surface text of the node
      #attributes : hash of attribute keys and values
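  The relationship between the two classes can be sketched as a minimal model. This is not the gem's implementation: `Node` is a plain Struct, `Nodes` wraps an Array and mixes in Enumerable, and the noun/verb filters are assumed to be category matches against /名詞/ and /動詞/, which reproduces the example outputs in the next section. Each filter returns a new `Nodes`, which is what makes the calls chainable. `compact` is left out here.

```ruby
# Hypothetical, minimal model of Chawan::Node / Chawan::Nodes (not the gem's code).
Node = Struct.new(:category, :word, :attributes) do
  # Mirrors the README's display format, e.g. <名詞: '登録'>
  def inspect
    "<#{category}: '#{word}'>"
  end
end

class Nodes
  include Enumerable

  def initialize(nodes)
    @nodes = nodes
  end

  def each(&block)
    return enum_for(:each) unless block_given?

    @nodes.each(&block)
    self
  end

  # Keeps nodes whose category matches the pattern via ===, so a Regexp may
  # match part of the category while a String must equal it.
  def grep(pattern)
    Nodes.new(@nodes.select { |node| pattern === node.category })
  end

  def noun
    grep(/名詞/)
  end

  def verb
    grep(/動詞/) # also matches 助動詞 (auxiliary verbs)
  end

  def inspect
    @nodes.inspect
  end
end

nodes = Nodes.new([
  Node.new('名詞', '登録', {}), Node.new('動詞', 'さ', {}), Node.new('動詞', 'れ', {}),
  Node.new('助動詞', 'た', {}), Node.new('名詞', '利用', {}), Node.new('名詞', '者', {})
])
nodes.noun.map(&:word)         # => ["登録", "利用", "者"]
nodes.grep('動詞').map(&:word) # => ["さ", "れ"]
nodes.noun.verb.to_a           # => []
```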


Example
=======

  text = '登録された利用者'

  # 'parse' returns a Chawan::Nodes
  Chawan.parse(text)
  => [<名詞: '登録'>, <動詞: 'さ'>, <動詞: 'れ'>, <助動詞: 'た'>, <名詞: '利用'>, <名詞: '者'>]

  # Chawan::Nodes is enumerable
  Chawan.parse(text).select{|node| node.category == '名詞'}
  => [<名詞: '登録'>, <名詞: '利用'>, <名詞: '者'>]

  # gateway interface: noun
  Chawan.parse(text).noun
  => [<名詞: '登録'>, <名詞: '利用'>, <名詞: '者'>]

  # gateway interface: verb
  Chawan.parse(text).verb
  => [<動詞: 'さ'>, <動詞: 'れ'>, <助動詞: 'た'>]

  # gateway interface: grep
  Chawan.parse(text).grep(/動詞/)
  => [<動詞: 'さ'>, <動詞: 'れ'>, <助動詞: 'た'>]
  Chawan.parse(text).grep('動詞')
  => [<動詞: 'さ'>, <動詞: 'れ'>]

  # gateway interface: compact
  Chawan.parse(text).compact
  => [<名詞: '登録'>, <動詞: 'され'>, <助動詞: 'た'>, <名詞: '利用者'>]
  Chawan.parse(text).compact(/動詞/)
  => [<名詞: '登録'>, <動詞: 'された'>, <名詞: '利用'>, <名詞: '者'>]

  # gateway interface is chainable
  Chawan.parse(text).noun.verb
  => []

  # chainable is fun!
  Chawan.parse(text).noun
  => [<名詞: '登録'>, <名詞: '利用'>, <名詞: '者'>]
  Chawan.parse(text).compact.noun
  => [<名詞: '登録'>, <名詞: '利用者'>]
  Chawan.parse(text).noun.compact
  => [<名詞: '登録利用者'>]
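  The merge rule behind compact, as the examples above suggest it works, can be sketched as follows. This is an assumption inferred from the outputs, not the gem's code: consecutive nodes are merged when they share a category or, when a pattern is given, when both categories match it, and the merged node keeps the category of the first node in the run. Nodes are plain [category, word] pairs to keep the sketch self-contained, and `compact_nodes` is a hypothetical name.

```ruby
# Hypothetical sketch of the merge rule behind Nodes#compact (not the gem's code).
def compact_nodes(nodes, pattern = nil)
  nodes.each_with_object([]) do |(category, word), result|
    previous = result.last
    same_run =
      if previous.nil?
        false
      elsif pattern
        pattern === previous[0] && pattern === category # both must match the pattern
      else
        previous[0] == category # same part of speech
      end

    if same_run
      previous[1] += word # extend the run; the first node's category is kept
    else
      result << [category, word]
    end
  end
end

parsed = [%w[名詞 登録], %w[動詞 さ], %w[動詞 れ], %w[助動詞 た], %w[名詞 利用], %w[名詞 者]]
nouns  = ->(list) { list.select { |category, _| category == '名詞' } }

compact_nodes(parsed)         # => [["名詞", "登録"], ["動詞", "され"], ["助動詞", "た"], ["名詞", "利用者"]]
compact_nodes(parsed, /動詞/) # => [["名詞", "登録"], ["動詞", "された"], ["名詞", "利用"], ["名詞", "者"]]
nouns.(compact_nodes(parsed)) # => [["名詞", "登録"], ["名詞", "利用者"]]
compact_nodes(nouns.(parsed)) # => [["名詞", "登録利用者"]]
```

  The last two lines show why chaining order matters: filtering to nouns first makes 登録, 利用 and 者 adjacent, so compacting afterwards joins all three.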

  
Analyzer
========

  The parsing engine is called an 'analyzer'.
  Available analyzers are:

    * mecab : (default)
    * chasen
    
  Chawan[:mecab].parse('test')
  => [<名詞: 'test'>]

  # same as
  #   Chawan.mecab.parse('test')
  #   Chawan.analyzer(:mecab).parse('test')
  #   Chawan.parse('test')  # default analyzer is :mecab

  Chawan[:chasen].parse('test')
  => [<記号: 't'>, <記号: 'e'>, <記号: 's'>, <記号: 't'>]
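  One way the three equivalent spellings could be wired up is sketched below, assuming a registry of analyzer objects keyed by symbol. `MiniChawan`, `Analyzer` and `ANALYZERS` are hypothetical names; the real gem may resolve analyzers differently, and the sketch leaves out the actual parsing.

```ruby
# Hypothetical sketch of the analyzer lookup aliases (not the gem's code).
module MiniChawan
  Analyzer = Struct.new(:name)

  ANALYZERS = { mecab: Analyzer.new(:mecab), chasen: Analyzer.new(:chasen) }.freeze
  DEFAULT = :mecab

  # Chawan.analyzer(:mecab); with no argument, the default analyzer
  def self.analyzer(name = DEFAULT)
    ANALYZERS.fetch(name.to_sym) { raise ArgumentError, "unknown analyzer: #{name}" }
  end

  # Chawan[:mecab]
  def self.[](name)
    analyzer(name)
  end

  # Chawan.mecab, Chawan.chasen
  def self.method_missing(name, *args, &block)
    ANALYZERS.key?(name) ? analyzer(name) : super
  end

  def self.respond_to_missing?(name, include_private = false)
    ANALYZERS.key?(name) || super
  end
end

MiniChawan[:mecab].equal?(MiniChawan.mecab) # => true
MiniChawan.analyzer.name                    # => :mecab
```

  Defining `respond_to_missing?` alongside `method_missing` keeps `respond_to?(:chasen)` consistent with the dynamic `.chasen` style of call.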


Required
========

  * UTF-8
  * the 'mecab' Unix command, available on PATH


Todo
====

  * use Open3 rather than backquotes to execute Unix commands
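  A minimal sketch of that change uses Ruby's standard `Open3.capture2` and passes the input text on stdin, so it never goes through a shell. The helper name `run_analyzer` is hypothetical, and forcing the output to UTF-8 reflects the README's UTF-8 requirement rather than anything confirmed in the gem.

```ruby
require 'open3'

# Hypothetical helper (not part of chawan): run an analyzer command such as
# ['mecab'] with the text on stdin, without building a backquoted shell string.
def run_analyzer(command, text)
  output, status = Open3.capture2(*command, stdin_data: text)
  raise "#{command.first} exited with status #{status.exitstatus}" unless status.success?

  output.force_encoding(Encoding::UTF_8) # the README requires UTF-8
end

# Usage (requires the mecab command on PATH):
#   run_analyzer(['mecab'], '登録された利用者')
```

  Because the command is given as an argument list, a missing executable raises `Errno::ENOENT` instead of silently returning empty output, and a non-zero exit status is reported as an error.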


Author
======

  maiha@wota.jp