html2rss is a Ruby gem that generates RSS 2.0 feeds from websites by scraping HTML or JSON content with CSS selectors or auto-detection.
This gem is the core of the html2rss-web application.
Community & Resources
| Resource | Description | Link |
|---|---|---|
| Documentation & Feed Directory | Complete guides, tutorials, and 100+ pre-built feeds to browse | html2rss.github.io |
| Community Discussions | Get help, share ideas, and connect with other users | GitHub Discussions |
| Project Board | Track development progress and upcoming features | View Project Board |
| Support Development | Help fund ongoing development and maintenance | Sponsor on GitHub |
Quick Start Options:
- New to RSS? → Start with the web application
- Ruby Developer? → Check out the Ruby gem documentation
- Need a specific feed? → Browse the feed directory
- Want to contribute? → See our contributing guide
Features
- CSS Selector Support - Extract content using familiar CSS selectors
- Auto-Detection - Automatically detect content using Schema.org and semantic HTML
- Multiple Request Strategies - Faraday for static sites, Browserless for JS-heavy sites
- Post-Processing - Template rendering, HTML sanitization, time parsing, and more
- Comprehensive Testing - 95%+ test coverage with RSpec
- Full Documentation - YARD documentation and comprehensive guides
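To give a feel for the CSS selector feature, here is a minimal sketch of what a selector-based feed configuration might look like as a plain Ruby hash. The key names (`channel`, `selectors`, `items`, `extractor`) and the `Html2rss.feed` call in the comment are assumptions based on the project documentation, and the URL and selectors are placeholders; check the project website for the options your installed version accepts.

```ruby
# Sketch of a selector-based feed configuration as a plain Ruby hash.
# ASSUMPTION: key names are modelled on the project documentation and may
# differ between gem versions. The URL and selectors are placeholders.
FEED_CONFIG = {
  channel: { url: 'https://example.com/blog' },  # page to scrape
  selectors: {
    items: { selector: 'article' },              # one match per feed item
    title: { selector: 'h2' },                   # text of the heading
    link: { selector: 'a', extractor: 'href' }   # read the href attribute
  }
}.freeze

# With the gem installed, the hash would presumably be used like this:
#   require 'html2rss'
#   puts Html2rss.feed(FEED_CONFIG)
```

Keeping the configuration as plain data means the same structure can be written in YAML and loaded at runtime.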
Quick Start
For installation and usage instructions, please visit the project website.
Try in Browser
You can develop html2rss directly in your browser using GitHub Codespaces.
The Codespace comes pre-configured with Ruby 3.4, all dependencies, and VS Code extensions.
Documentation
The full documentation for the html2rss gem is available on the project website.
Contributing
Please see the contributing guide for details on how to contribute.
Architecture
Core Components
- Config - Loads and validates configuration (YAML/hash)
- RequestService - Fetches pages using Faraday or Browserless
- Selectors - Extracts content via CSS selectors with extractors/post-processors
- AutoSource - Auto-detects content using Schema.org, semantic HTML, and structural patterns
- RssBuilder - Assembles Article objects and renders RSS 2.0
Data Flow
Config -> Request -> Extraction -> Processing -> Building -> Output
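The data flow above can be illustrated with a small, self-contained sketch in plain Ruby. It is not the gem's implementation: every method and constant name is hypothetical, a canned HTML string replaces the HTTP request, and a regular expression stands in for real CSS selector extraction.

```ruby
require 'time'
require 'uri'

# ASSUMPTION: all names below are hypothetical stand-ins for the stages
# in the data flow; none of them come from the html2rss codebase.
Article = Struct.new(:title, :url, :published_at, keyword_init: true)

# Request: return a canned page instead of performing an HTTP fetch.
def fetch(_url)
  <<~HTML
    <article><a href="/posts/1">First post</a><time datetime="2024-01-02T10:00:00Z"></time></article>
    <article><a href="/posts/2">Second post</a><time datetime="2024-01-03T12:30:00Z"></time></article>
  HTML
end

# Extraction: a regular expression stands in for CSS selectors here.
ITEM_PATTERN = %r{<article><a href="([^"]+)">([^<]+)</a><time datetime="([^"]+)">}

def extract(html)
  html.scan(ITEM_PATTERN).map { |href, title, time| { href: href, title: title, time: time } }
end

# Processing: make links absolute and parse timestamps.
def process(raw, base_url)
  Article.new(
    title: raw[:title].strip,
    url: URI.join(base_url, raw[:href]).to_s,
    published_at: Time.parse(raw[:time])
  )
end

# Building: render a bare-bones RSS 2.0 document, escaping text for XML.
def build(config, articles)
  items = articles.map do |article|
    "<item><title>#{article.title.encode(xml: :text)}</title>" \
      "<link>#{article.url.encode(xml: :text)}</link>" \
      "<pubDate>#{article.published_at.rfc2822}</pubDate></item>"
  end
  '<?xml version="1.0" encoding="UTF-8"?>' \
    '<rss version="2.0"><channel>' \
    "<title>#{config[:title].encode(xml: :text)}</title>" \
    "<link>#{config[:url].encode(xml: :text)}</link>" \
    "<description>#{config[:title].encode(xml: :text)}</description>" \
    "#{items.join}</channel></rss>"
end

# Config -> Request -> Extraction -> Processing -> Building -> Output
def run_pipeline(config)
  html = fetch(config[:url])
  articles = extract(html).map { |raw| process(raw, config[:url]) }
  build(config, articles)
end

puts run_pipeline({ title: 'Example blog', url: 'https://example.com/' })
```

Each stage takes the previous stage's output as plain data, so a stage can be swapped out without touching the others, for example a different request strategy.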
Testing
- RSpec for comprehensive testing
- 95%+ code coverage with SimpleCov
- VCR for HTTP interaction testing
- RuboCop for code style enforcement
- Reek for code smell detection
Development Tools
- Ruby LSP for IntelliSense and language features
- Debug for modern debugging and exploration
- YARD for documentation generation
- GitHub Actions for CI/CD
License
This project is licensed under the MIT License - see the LICENSE file for details.
Sponsoring
If you find html2rss useful, please consider sponsoring the project.