Project

wikipedia

0.01
No commit activity in last 3 years
No release in over 3 years
tool for extracting plain text from wikipedia articles

Installing:

A gem is available, so fire up your terminal:

$ gem install wikipedia

Usage:

It's easy:

irb(main):001:0> require 'wikipedia'
irb(main):002:0> connor = Wikipedia::article 'John Connor'
irb(main):003:0> connor.first     # just the first paragraph
=> "John Connor is a fictional character and the main protagonist of the Terminator franchise.
Created by writer and director James Cameron, the character is first referred to in the 1984 film The Terminator 
and first appears, portrayed by teenage actor Edward Furlong, in its 1991 sequel Terminator 2: Judgment Day.
The character is subsequently portrayed by 23-year-old Nick Stahl in the 2003 film Terminator 3: Rise of the Machines
and by 19-year-old Thomas Dekker in the 2007 television series Terminator: The Sarah Connor Chronicles.
English actor Christian Bale portrays Connor in the film series' fourth installment, Terminator Salvation."
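
The gem's internals are not shown in this README, so the following is only a rough sketch of what "first paragraph" could mean once an article has been reduced to plain text. The helper name `first_paragraph` and the blank-line paragraph convention are assumptions made for illustration, not part of the gem.

```ruby
# Hypothetical helper, not part of the gem: returns the first non-empty
# paragraph of plain text, treating blank lines as paragraph separators.
def first_paragraph(text)
  text.split(/\n\s*\n/).map(&:strip).reject(&:empty?).first
end

# Hand-written sample text, used only to exercise the helper.
article_text = <<~TEXT
  John Connor is a fictional character.

  Created by writer and director James Cameron.
TEXT

puts first_paragraph(article_text)
```

For empty input the helper returns `nil`, so callers would need to handle that case.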

There's a simple method for checking a term's ambiguity; an array of the alternative terms will be provided in a future release.

A good example is 'apple', which may refer to the company, the fruit, and so on.

irb(main):001:0> require 'wikipedia'
irb(main):002:0> apple = Wikipedia::article 'apple'
irb(main):003:0> apple.ambiguous?
=> true
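
How `ambiguous?` decides is not documented here. One plausible heuristic, sketched below purely as an assumption, relies on the fact that English Wikipedia disambiguation pages commonly open with a phrase like "Apple may refer to:". The names `DISAMBIGUATION_HINT` and `looks_ambiguous?` are hypothetical and do not come from the gem.

```ruby
# Hypothetical heuristic, not the gem's implementation: English Wikipedia
# disambiguation pages commonly open with "<term> may refer to:".
DISAMBIGUATION_HINT = /\bmay (also )?refer to\b/i

# Checks only the first few lines, where the phrase normally appears.
def looks_ambiguous?(text)
  intro = text.lines.first(3).join
  DISAMBIGUATION_HINT.match?(intro)
end

puts looks_ambiguous?("Apple may refer to:\nApple Inc.\nApple (fruit)")
puts looks_ambiguous?("John Connor is a fictional character.")
```

A text-based check like this is language-specific and can misfire on ordinary articles that happen to use the phrase early on, so it is a starting point rather than a reliable test.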

TODO

  • Integrate it with the [Opensearch API](http://www.mediawiki.org/wiki/API%3aOpensearch).
  • Provide a method for classifying text based on context (using data from Wikipedia's disambiguation pages).
  • Switch to Nokogiri or provide support for both Nokogiri and Hpricot?
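
As a rough idea of what the Opensearch item might involve, here is a minimal sketch using only Ruby's standard library. The helper names `opensearch_uri` and `parse_opensearch`, the English Wikipedia endpoint, and the default limit are assumptions for illustration. The sample body is hand-written in the `[query, [titles], [descriptions], [urls]]` shape that the opensearch action returns; it is not a captured response.

```ruby
require 'json'
require 'uri'

# Builds a request URI for MediaWiki's opensearch action.
# The endpoint and the default limit are assumptions for this sketch.
def opensearch_uri(term, limit: 5, endpoint: 'https://en.wikipedia.org/w/api.php')
  uri = URI(endpoint)
  uri.query = URI.encode_www_form(action: 'opensearch', search: term,
                                  limit: limit, format: 'json')
  uri
end

# Parses an opensearch JSON body of the form
# [query, [titles], [descriptions], [urls]] into an array of hashes.
def parse_opensearch(body)
  _query, titles, _descriptions, urls = JSON.parse(body)
  titles.zip(urls).map { |title, url| { title: title, url: url } }
end

# Hand-written sample body, for illustration only.
sample = '["apple",["Apple","Apple Inc."],["",""],' \
         '["https://en.wikipedia.org/wiki/Apple","https://en.wikipedia.org/wiki/Apple_Inc."]]'

puts opensearch_uri('apple')
parse_opensearch(sample).each { |hit| puts "#{hit[:title]} -> #{hit[:url]}" }
```

In a real integration the body would come from an HTTP request, for example `Net::HTTP.get(opensearch_uri('apple'))` after `require 'net/http'`; the suggested titles could then feed the planned array of alternative terms.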

Disclaimer

[Hpricot](https://github.com/whymirror/hpricot) was used as a tribute to [whytheluckystiff](http://en.wikipedia.org/wiki/Why_the_lucky_stiff).

License

MIT