Project

excite

0.01
Repository is archived
No commit activity in last 3 years
No release in over 3 years
Parse citations
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
 Dependencies

Development

>= 0
>= 0
>= 0

Runtime

 Project Readme

Excite

Provides a simple Ruby API for parsing citations from plain text strings or HTML.

Usage

  require 'excite'

  Excite.parse_string("Wilcox, Rhonda V. 1991. Shifting roles and synthetic women in Star trek: The next generation. Studies in Popular Culture 13 (June): 53-65.")

  Excite.parse_html("<span>Devine, PG, & Sherman, SJ</span><span>(1992)</span><strong>Intuitive versus rational judgment and the role of stereotyping in the human condition: Kirk or Spock?</strong><em>Psychological Inquiry</em><span>3(2), 153-159</span>")

History and Credits

Derived from FreeCite, minus Rails and all UI elements. The most up-to-date fork of FreeCite of which I am aware is rsinger's. FreeCite in turn is inspired by ParsCit.

The main changes are:

  • No UI, just a gem;
  • New model for parsing HTML;
  • Tokenization and part-of-speech features from EngTagger.

Credit is due to the authors of all the linked projects, as well as Laura Durkay who marked up the HTML training data.

Install required packages

From source

wget http://crfpp.googlecode.com/files/CRF%2B%2B-0.57.tar.gz
tar xvzf CRF++-0.57.tar.gz
cd CRF++-0.57
./configure 
make
sudo make install

On Ubuntu

sudo apt-add-repository 'deb http://cl.naist.jp/~eric-n/ubuntu-nlp oneiric all'
sudo apt-get update
sudo apt-get install libcrf++

On OS X with Homebrew

brew install crf++