MicroMicro

A Ruby gem for extracting microformats2-encoded data from HTML documents.

Key Features

Note: MicroMicro does not parse Classic Microformats (referred to in the parsing specification as "backcompat root classes" and "backcompat properties" and in vocabulary specifications in the "Parser Compatibility" sections [e.g. h-entry]). To parse documents marked up with Classic Microformats, consider using the official microformats-ruby parser.

¹ …with some exceptions until this pull request is merged.

Getting Started

Before installing and using MicroMicro, you'll want to have Ruby 2.5 (or newer) installed. It's recommended that you use a Ruby version management tool like rbenv, chruby, or rvm.

MicroMicro is developed using Ruby 2.5.9 and is additionally tested against Ruby 2.6, 2.7, and 3.0 using CircleCI.

Installation

If you're using Bundler, add MicroMicro to your project's Gemfile:

source 'https://rubygems.org'

gem 'micromicro'

…and hop over to your command prompt and run…

$ bundle install

Usage

Basic Usage

MicroMicro's parse method accepts two arguments: a String of markup and a String representing the URL associated with that markup.

The markup (typically HTML) can be retrieved from the Web using a library of your choosing or provided inline as a simple String (e.g. <div class="h-card">Jason Garber</div>). The URL provided is used to resolve relative URLs in accordance with the document's language rules.
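
To illustrate what resolving relative URLs against a base URL means, here is a minimal sketch using Ruby's standard URI library. It is not MicroMicro's internal implementation, and the base URL path and href values are sample data chosen for illustration:

```ruby
require 'uri'

# Sample base URL; the path segment is an assumption for illustration
base_url = 'https://sixtwothree.org/posts/hello'

# A mix of relative and absolute references as they might appear in markup
hrefs = ['photo.jpg', '/about', '../feed', 'https://example.com/x']

# Resolve each reference against the base URL
resolved = hrefs.map { |href| URI.join(base_url, href).to_s }
#=> ["https://sixtwothree.org/posts/photo.jpg", "https://sixtwothree.org/about", "https://sixtwothree.org/feed", "https://example.com/x"]
```

Absolute references pass through unchanged, which is why supplying an accurate URL matters mostly for documents that use relative links.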

An example using a simple String of HTML as input:

require 'micromicro'

doc = MicroMicro.parse('<div class="h-card">Jason Garber</div>', 'https://sixtwothree.org')
#=> #<MicroMicro::Document items: #<MicroMicro::Collections::ItemsCollection count: 1, members: [#<MicroMicro::Item types: ["h-card"], properties: 1, children: 0>]>, relationships: #<MicroMicro::Collections::RelationshipsCollection count: 0, members: []>>

doc.to_h
#=> { :items => [{ :type => ["h-card"], :properties => { :name => ["Jason Garber"] } }], :rels => {}, :"rel-urls" => {} }

The Hash produced by calling doc.to_h may be converted to JSON (e.g. doc.to_h.to_json) for storage, additional manipulation, or use with other tools.
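
As a rough sketch of that conversion, the example below uses Ruby's standard json library. The `parsed` Hash is a hand-written stand-in for the result of doc.to_h shown above, so the snippet runs without fetching or parsing anything:

```ruby
require 'json'

# Stand-in for the Hash returned by doc.to_h in the example above
parsed = {
  items: [{ type: ['h-card'], properties: { name: ['Jason Garber'] } }],
  rels: {},
  'rel-urls': {}
}

# Serialize for storage or for handing off to other tools
json = JSON.pretty_generate(parsed)

# Reading the JSON back yields String keys unless symbolize_names is set
restored = JSON.parse(json, symbolize_names: true)
restored[:items].first[:properties][:name]
#=> ["Jason Garber"]
```

Note that Symbol keys become Strings in JSON, so code that reads the stored data back should either pass symbolize_names: true or index with String keys.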

Another example pulling the source HTML from Tantek's website:

require 'net/http'
require 'micromicro'

url = 'https://tantek.com'
rsp = Net::HTTP.get(URI.parse(url))

doc = MicroMicro.parse(rsp, url)
#=> #<MicroMicro::Document items: #<MicroMicro::Collections::ItemsCollection count: 1, members: […]>, relationships: #<MicroMicro::Collections::RelationshipsCollection count: 31, members: […]>>

doc.to_h
#=> { :items => [{ :type => ["h-card"], :properties => {…}, :children => […]}], :rels => {…}, :'rel-urls' => {…} }
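
Because items in the Hash may nest further items under :children, walking the structure is naturally recursive. The following is a minimal sketch under stated assumptions: `parsed` is a hand-written stand-in for doc.to_h with hypothetical sample values, and `collect_types` is a helper name introduced here, not part of MicroMicro:

```ruby
# Stand-in for doc.to_h; the nested item and its values are hypothetical
parsed = {
  items: [
    {
      type: ['h-card'],
      properties: { name: ['Example Person'] },
      children: [
        { type: ['h-entry'], properties: { name: ['Example Note'] } }
      ]
    }
  ],
  rels: {},
  'rel-urls': {}
}

# Hypothetical helper: gather every item type, descending into :children
def collect_types(items)
  items.flat_map do |item|
    item[:type] + collect_types(item.fetch(:children, []))
  end
end

collect_types(parsed[:items])
#=> ["h-card", "h-entry"]
```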

Advanced Usage

Building on the example above, a MicroMicro-parsed document is navigable and manipulable using a familiar Enumerable-esque interface.

Items

doc.items.first
#=> #<MicroMicro::Item types: ["h-card"], properties: 42, children: 6>

# 🆕 in v1.0.0
doc.items.types
#=> ["h-card"]

doc.items.first.children
#=> #<MicroMicro::Collections::ItemsCollection count: 6, members: […]>

Properties

doc.items.first.properties
#=> #<MicroMicro::Collections::PropertiesCollection count: 42, members: […]>

# 🆕 in v1.0.0
doc.items.first.plain_text_properties
#=> #<MicroMicro::Collections::PropertiesCollection count: 34, members: […]>

# 🆕 in v1.0.0
doc.items.first.url_properties
#=> #<MicroMicro::Collections::PropertiesCollection count: 11, members: […]>

# 🆕 in v1.0.0
doc.items.first.properties.names
#=> ["category", "name", "note", "org", "photo", "pronoun", "pronouns", "role", "uid", "url"]

# 🆕 in v1.0.0
doc.items.first.properties.values
#=> [{:value=>"https://tantek.com/photo.jpg", :alt=>""}, "https://tantek.com/", "Tantek Çelik", "Inventor, writer, teacher, runner, coder, more.", "Inventor", "writer", "teacher", "runner", "coder", …]

doc.items.first.properties[7]
#=> #<MicroMicro::Property name: "category", prefix: "p", value: "teacher">

doc.items.first.properties.take(5).map { |property| [property.name, property.value] }
#=> [["photo", { :value => "https://tantek.com/photo.jpg", :alt => "" }], ["url", "https://tantek.com/"], ["uid", "https://tantek.com/"], ["name", "Tantek Çelik"], ["role", "Inventor, writer, teacher, runner, coder, more."]]
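
Since each property exposes a name and a value, properties can be regrouped into the name-to-values shape that doc.to_h uses. The sketch below assumes the collection supports Enumerable methods such as group_by. It uses a plain Array and a hypothetical Struct standing in for MicroMicro::Property so that it runs on its own:

```ruby
# Hypothetical stand-in for MicroMicro::Property, exposing only name and value
Property = Struct.new(:name, :value)

# Sample data modeled on the output above
properties = [
  Property.new('url', 'https://tantek.com/'),
  Property.new('uid', 'https://tantek.com/'),
  Property.new('category', 'writer'),
  Property.new('category', 'teacher')
]

# Group values under their property name, preserving document order
grouped = properties
          .group_by(&:name)
          .transform_values { |members| members.map(&:value) }
#=> {"url"=>["https://tantek.com/"], "uid"=>["https://tantek.com/"], "category"=>["writer", "teacher"]}
```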

Relationships

doc.relationships.first
#=> #<MicroMicro::Relationship href: "https://tantek.com/", rels: ["canonical"]>

# 🆕 in v1.0.0
doc.relationships.rels
#=> ["alternate", "apple-touch-icon-precomposed", "author", "authorization_endpoint", "bookmark", "canonical", "hub", "icon", "me", "microsub", …]

# 🆕 in v1.0.0
doc.relationships.urls
#=> ["http://dribbble.com/tantek/", "http://last.fm/user/tantekc", "https://aperture.p3k.io/microsub/277", "https://en.wikipedia.org/wiki/User:Tantek", "https://github.com/tantek", "https://indieauth.com/auth", "https://indieauth.com/openid", "https://micro.blog/t", "https://pubsubhubbub.superfeedr.com/", "https://tantek.com/", …]

doc.relationships.find { |relationship| relationship.rels.include?('webmention') }
#=> #<MicroMicro::Relationship href: "https://webmention.io/tantek.com/webmention", rels: ["webmention"]>
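
The same filtering pattern extends to other rel values. As a hedged sketch, the example below uses a hypothetical Struct in place of MicroMicro::Relationship (only href and rels are modeled) and sample data whose rel assignments are assumed for illustration:

```ruby
# Hypothetical stand-in for MicroMicro::Relationship
Relationship = Struct.new(:href, :rels)

# Sample data; which rels belong to which URL is assumed here
relationships = [
  Relationship.new('https://tantek.com/', ['canonical']),
  Relationship.new('https://github.com/tantek', ['me']),
  Relationship.new('https://micro.blog/t', ['me']),
  Relationship.new('https://webmention.io/tantek.com/webmention', ['webmention'])
]

# Collect every URL marked with rel="me"
me_urls = relationships.select { |relationship| relationship.rels.include?('me') }.map(&:href)
#=> ["https://github.com/tantek", "https://micro.blog/t"]

# Look up a single endpoint, tolerating its absence with safe navigation
endpoint = relationships.find { |relationship| relationship.rels.include?('webmention') }&.href
#=> "https://webmention.io/tantek.com/webmention"
```

Using &. keeps the lookup from raising NoMethodError when a document declares no matching relationship.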

Contributing

Interested in helping improve MicroMicro? Awesome! Your help is greatly appreciated. See CONTRIBUTING.md for details.

Acknowledgments

MicroMicro wouldn't exist without the hard work of everyone involved in the microformats community. Additionally, the comprehensive microformats test suite was invaluable in the development of this Ruby gem.

MicroMicro is written and maintained by Jason Garber.

License

MicroMicro is freely available under the MIT License. Use it, learn from it, fork it, improve it, change it, tailor it to your needs.