WebAuthor
WebAuthor is a Ruby gem that extracts author information from web pages using multiple strategies. It can detect authors from both meta tags and JSON-LD schema, providing a reliable way to identify content creators.
Features
- Extract author information from HTML meta tags
- Extract author information from JSON-LD schema (schema.org)
- Support for multiple authors in a single page
- Fallback strategy - tries different methods until an author is found
- Clean, type-safe code with Sorbet
Installation
Add this line to your application's Gemfile:
gem 'web-author'
And then execute:
$ bundle install
Or install it yourself as:
$ gem install web-author
Usage
Basic Usage
require 'web_author'
# Create a new Page object with a URL
page = WebAuthor::Page.new(url: 'https://example.com/article')
# Get the author of the page
author = page.author
# => "John Doe"
WebAuthor will first try to find author information in JSON-LD schema data, then fall back to meta tags if needed.
Handling Multiple Authors
If a page has multiple authors in the JSON-LD schema, WebAuthor returns them as a comma-separated string:
page = WebAuthor::Page.new(url: 'https://example.com/collaboration-article')
authors = page.author
# => "Jane Smith, Bob Johnson"
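Because multiple authors come back as one comma-separated string, callers that need individual names have to split it themselves. The helper below is a minimal sketch of that; `split_authors` is a hypothetical name, not part of the gem, and it assumes no author name contains a comma (a name such as "Smith, Jr." would be split in two).

```ruby
# Hypothetical helper (not part of WebAuthor): turn the comma-separated
# string returned by Page#author into an array of individual names.
def split_authors(author_string)
  # Page#author returns nil when no author is found.
  return [] if author_string.nil?

  author_string.split(',').map(&:strip).reject(&:empty?)
end

split_authors('Jane Smith, Bob Johnson')
# => ["Jane Smith", "Bob Johnson"]
```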
Error Handling
WebAuthor raises WebAuthor::Error when it encounters problems fetching the page:
begin
  page = WebAuthor::Page.new(url: 'https://example.com/article')
  author = page.author
rescue WebAuthor::Error => e
  puts "Failed to get author: #{e.message}"
end
How It Works
WebAuthor tries each extraction strategy in order until one returns an author:
- First, it looks for author information in JSON-LD schema data (typically found in <script type="application/ld+json"> tags)
- If no author is found in JSON-LD, it looks for a meta tag named "author" (<meta name="author" content="Author Name">)
- If no strategy finds an author, it returns nil
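The fallback order described above can be sketched in plain Ruby. This is an illustration only, not the gem's implementation: the method names are hypothetical, the regexes are crude stand-ins for the Nokogiri parsing the gem relies on, and the JSON-LD step here only handles a single author object.

```ruby
require 'json'

# Crude regexes keep this sketch dependency-free; they are an assumption
# made for illustration and are not how the gem parses HTML.
JSON_LD_SCRIPT = %r{<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>}mi
META_AUTHOR = /<meta\s+name="author"\s+content="([^"]*)"/i

# Strategy 1 (hypothetical): read the "name" of a single JSON-LD author object.
def author_from_json_ld(html)
  match = html.match(JSON_LD_SCRIPT)
  return nil unless match

  data = JSON.parse(match[1])
  author = data.is_a?(Hash) ? data['author'] : nil
  author.is_a?(Hash) ? author['name'] : nil
rescue JSON::ParserError
  nil
end

# Strategy 2 (hypothetical): read <meta name="author" content="...">.
def author_from_meta(html)
  match = html.match(META_AUTHOR)
  match && match[1]
end

# Try each strategy in order; return the first non-blank author, or nil.
def first_author(html, strategies = [method(:author_from_json_ld), method(:author_from_meta)])
  strategies.each do |strategy|
    author = strategy.call(html)
    return author unless author.nil? || author.strip.empty?
  end
  nil
end
```

Keeping each strategy as a separate callable means the order of the list is the only thing that encodes priority, so a new source of author data can be added without touching the existing ones.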
Supported Author Formats
Meta Tags
<meta name="author" content="Author Name" />
JSON-LD Schema
Single author:
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "author": {
    "@type": "Person",
    "name": "Author Name"
  }
}
</script>
Multiple authors:
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "author": [
    {
      "@type": "Person",
      "name": "First Author"
    },
    {
      "@type": "Person",
      "name": "Second Author"
    }
  ]
}
</script>
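Both JSON-LD shapes can be handled by normalizing the "author" field to a list before reading names. The sketch below shows one way to do that with Ruby's standard JSON library; `author_names_from_json_ld` is a hypothetical helper, not the gem's code, and it takes the JSON text from inside the script tag rather than a whole page.

```ruby
require 'json'

# Hypothetical helper: accept the JSON text of a JSON-LD block and return
# the author name(s) as a comma-separated string, or nil if none are present.
def author_names_from_json_ld(json_text)
  data = JSON.parse(json_text)
  return nil unless data.is_a?(Hash)

  # "author" may be a single object or an array of objects; wrap and flatten
  # so both shapes become a flat list of entries.
  entries = [data['author']].flatten.compact
  names = entries.map { |entry| entry.is_a?(Hash) ? entry['name'] : entry.to_s }.compact
  names.empty? ? nil : names.join(', ')
rescue JSON::ParserError
  nil
end

multiple = '{"@type": "Article", "author": [{"name": "First Author"}, {"name": "Second Author"}]}'
author_names_from_json_ld(multiple)
# => "First Author, Second Author"
```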
Requirements
- Ruby 3.4 or higher
- Nokogiri
- Sorbet Runtime
- Zeitwerk
Development
After checking out the repo, run bin/setup to install dependencies. Then, run rake test to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run bundle exec rake install.
Development Workflow
Type Checking with Sorbet
This project uses Sorbet for static type checking. To run the type checker:
$ bin/type-check
or directly:
$ bundle exec srb tc
Running Tests
Run all tests using:
$ bundle exec rake test
Run a specific test file:
$ bundle exec ruby -Ilib:test test/web_author/page_test.rb
Code Style and Linting
This project follows Ruby style guidelines enforced by RuboCop. Run the linter with:
$ bundle exec rubocop
Auto-fix issues when possible:
$ bundle exec rubocop -A
Running All Checks
The default Rake task runs both tests and RuboCop:
$ bundle exec rake
Working with Sorbet
WebAuthor uses Sorbet for static type checking. When adding new code:
- Add this comment at the top of the file: # typed: strict
- Add type signatures to methods using sig blocks
- Run bin/type-check to verify type safety
Example of typed code:
extend T::Sig

sig { params(url: String).void }
def initialize(url:)
  @url = T.let(url, String)
  @content = T.let(nil, T.nilable(String))
end

sig { returns(T.nilable(String)) }
def author
  # method implementation
end
Adding a New Strategy
To add a new strategy, create a class that inherits from WebAuthor::Strategy and implement the author method.
Every strategy receives the parsed document through its initializer. This is a Nokogiri::XML::Document object.
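As a rough sketch of what such a class could look like, the example below reads a `<meta property="article:author">` tag, which some sites fill with a name and others with a profile URL. Several things here are assumptions rather than facts about the gem: the stand-in base class exists only so the snippet runs on its own, the positional initializer argument and the `@document` instance variable are guesses at how the real WebAuthor::Strategy exposes the document, and `ArticleAuthorStrategy` is a hypothetical name. How a new strategy is registered with WebAuthor::Page is not covered here.

```ruby
# Stand-in for the gem's base class so this sketch runs by itself.
# ASSUMPTION: the real WebAuthor::Strategy may use a different initializer
# signature or expose the document through an accessor instead.
unless defined?(WebAuthor::Strategy)
  module WebAuthor
    class Strategy
      def initialize(document)
        @document = document
      end
    end
  end
end

# Hypothetical strategy: read the author from <meta property="article:author">.
class ArticleAuthorStrategy < WebAuthor::Strategy
  def author
    # at_css returns the first matching node, or nil when nothing matches.
    node = @document.at_css('meta[property="article:author"]')
    return nil if node.nil?

    content = node['content'].to_s.strip
    content.empty? ? nil : content
  end
end

# With a Nokogiri document, usage might look like:
#   doc = Nokogiri::HTML(html)
#   ArticleAuthorStrategy.new(doc).author
```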
Contributing
- Fork it
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create a new Pull Request
Bug reports and pull requests are welcome on GitHub at https://github.com/lucianghinda/web_author.
License
The gem is available as open source under the terms of the MIT License.