Collect template information from MediaWiki markup

Wptemplates

A gem for collecting template information from MediaWiki markup.

It helps you extract machine-readable data from Wikipedia articles, since a lot of useful information there is encoded as templates.

Currently only templates and links are parsed, all other markup is ignored.

Installation

Add this line to your application's Gemfile:

gem 'wptemplates'

And then execute:

$ bundle

Or install it yourself as:

$ gem install wptemplates

Usage

To parse a piece of markup simply call:

ast = Wptemplates.parse("{{foo | bar | x = 3 }} baz [[bam (2003)|]]y")

You will get an instance of Wptemplates::Soup, which is an array of Wptemplates::Template, Wptemplates::Link, and Wptemplates::Text objects. You can explore the AST with these methods:

ast.templates.is_a?(Array) && ast.templates.length # => 1
ast.text # => " baz bamy"

To find template data:

ast[0].name # => :foo
ast[0].params[0].text # => " bar "
ast[0].params[:x].text # => "3"
ast.all_templates_of(:foo).map { |t| t.params[:x].text } # => ["3"]
ast.navigate(:foo, :x) { |p| p.text } # => "3"
ast.navigate(:foo, :y) { |p| p.text } # => nil
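
Note that in the results above the positional parameter keeps its surrounding whitespace (" bar "), while the named parameter is trimmed ("3"). The following is a minimal sketch of that convention in plain Ruby. It is not the gem's implementation: `naive_template_parts` is a hypothetical helper, and it only handles a flat template with no nested templates or links.

```ruby
# Hypothetical helper, NOT part of wptemplates: splits one flat template call
# (no nested templates or links) into its name and parameters.
def naive_template_parts(markup)
  # Strip the surrounding {{ and }}.
  inner = markup.strip.sub(/\A\{\{/, "").sub(/\}\}\z/, "")
  name, *raw_params = inner.split("|", -1)

  params = {}
  position = 0
  raw_params.each do |raw|
    if raw.include?("=")
      key, value = raw.split("=", 2)
      params[key.strip.to_sym] = value.strip # named: whitespace is trimmed
    else
      params[position] = raw                 # positional: kept verbatim
      position += 1
    end
  end

  { name: name.strip.to_sym, params: params }
end

parts = naive_template_parts("{{foo | bar | x = 3 }}")
puts parts.inspect
```

Splitting on every pipe like this is exactly what breaks once a parameter value contains a link or an image, which is why a real parser has to track nesting.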

You can access the links via:

ast.links.length # => 1
ast.links[0].text # => "bamy"
ast.all_links.map { |l| l.link } # => ["Bam (2003)"]
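
The link in this example uses two MediaWiki conventions: the pipe trick (an empty label after the pipe drops the trailing parenthetical from the target) and the link trail (letters directly after the closing brackets become part of the link text). The sketch below reproduces both for simple cases in plain Ruby. It is an illustration only, not the gem's code: `pipe_trick_text`, `naive_link` and `LINK` are hypothetical names, and the namespace and comma rules of the pipe trick are deliberately left out.

```ruby
# Hypothetical pattern: target, optional "|label", closing brackets, link trail.
LINK = /\[\[([^\[\]|]+)(\|([^\[\]]*))?\]\]([a-z]*)/

# Hypothetical helper: the display text the pipe trick derives from a target.
# Simplified: only a trailing parenthetical is removed.
def pipe_trick_text(target)
  target.sub(/\s*\([^()]*\)\z/, "")
end

# Hypothetical helper: returns the normalized target and the visible text
# of the first simple link found in the markup, or nil if there is none.
def naive_link(markup)
  m = LINK.match(markup)
  return nil unless m

  target, pipe, label, trail = m[1], m[2], m[3], m[4]
  text =
    if pipe.nil?
      target                  # [[target]]
    elsif label.empty?
      pipe_trick_text(target) # [[target|]] uses the pipe trick
    else
      label                   # [[target|label]]
    end

  # The first letter of the target is capitalized; the trail joins the text.
  { link: target[0].upcase + target[1..-1], text: text + trail }
end

result = naive_link("[[bam (2003)|]]y")
puts result.inspect
```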

Developing

Here's some useful info if you want to improve/customize this gem.

Getting Started

Check out the project, run bundle, and then run rake to see whether the tests pass. Run rake -T to list the available rake tasks.

Markup

MediaWiki markup is not trivial to parse, and compatibility issues can always arise. There's a useful help page about templates and a markup spec. For links, there is a page about links and one about the pipe trick. There is also a page with the BNF grammar for links.

Known Issues

  • If a template contains an image, each pipe inside the image markup starts a new parameter
  • Namespaced links are not recognized
  • Templates inside links are not recognized
  • Link contents are not HTML-decoded
  • nowiki, pre and math blocks might cause problems
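
The first issue can be illustrated in plain Ruby. The sketch below contrasts a split on every pipe with a split that only considers pipes outside nested [[...]] and {{...}} pairs. It is one possible way to approach the problem, not how the gem works or a proposed patch: `split_top_level` is a hypothetical helper and the sample markup is made up for the example.

```ruby
# Hypothetical helper: splits a template body on pipes, but only on pipes
# that sit outside nested [[...]] links and {{...}} templates.
def split_top_level(body)
  parts = [String.new]
  depth = 0
  i = 0
  while i < body.length
    pair = body[i, 2]
    if pair == "[[" || pair == "{{"
      depth += 1
      parts.last << pair
      i += 2
    elsif (pair == "]]" || pair == "}}") && depth > 0
      depth -= 1
      parts.last << pair
      i += 2
    elsif body[i] == "|" && depth.zero?
      parts << String.new # a top-level pipe starts the next parameter
      i += 1
    else
      parts.last << body[i]
      i += 1
    end
  end
  parts
end

# Made-up sample: a template body whose "image" parameter contains pipes.
body = "infobox | image = [[File:Foo.png|thumb|A caption]] | year = 2003"

naive = body.split("|")       # the image markup is cut into pieces
aware = split_top_level(body) # the image markup stays in one parameter

puts naive.length
puts aware.length
```

With the naive split, "thumb" and "A caption]] " show up as separate parameters, which matches the symptom described in the list above.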

Contributing

  1. Fork it
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create new Pull Request