WebInspector
Ruby gem to inspect web pages. It scrapes a given URL and returns its title, description, meta tags, links, images, and more.
Installation
Add this line to your application's Gemfile:
gem 'webinspector'And then execute:
$ bundle
Or install it yourself as:
$ gem install webinspector
Usage
Initialize a WebInspector instance
page = WebInspector.new('http://example.com')With options
page = WebInspector.new('http://example.com', {
timeout: 30, # Request timeout in seconds (default: 30)
retries: 3, # Number of retries (default: 3)
headers: {'User-Agent': 'Custom UA'} # Custom HTTP headers
})Accessing response status and headers
page.response.status # 200
page.response.headers # { "server"=>"apache", "content-type"=>"text/html; charset=utf-8", ... }
page.status_code # 200
page.success? # true if the page was loaded successfully
page.error_message # returns the error message if anyAccessing page data
page.url # URL of the page
page.scheme # Scheme of the page (http, https)
page.host # Hostname of the page (like, example.com, without the scheme)
page.port # Port of the page
page.title # title of the page from the head section
page.description # description of the page
page.links # array of all links found on the page (absolute URLs)
page.images # array of all images found on the page (absolute URLs)
page.meta # meta tags of the page
page.favicon # favicon URL if availableWorking with meta tags
page.meta # all meta tags
page.meta['description'] # meta description
page.meta['keywords'] # meta keywords
page.meta['og:title'] # OpenGraph titleFiltering links and images by domain
page.domain_links('example.com') # returns only links pointing to example.com
page.domain_images('example.com') # returns only images hosted on example.comSearching for words
page.find(["ruby", "rails"]) # returns [{"ruby"=>3}, {"rails"=>1}]JavaScript and Stylesheets
page.javascripts # array of all JavaScript files (absolute URLs)
page.stylesheets # array of all CSS stylesheets (absolute URLs)Language Detection
page.language # detected language code (e.g., "en", "es", "fr")Structured Data
page.structured_data # array of JSON-LD structured data objects
page.microdata # array of microdata items
page.json_ld # alias for structured_dataSecurity Information
page.security_info # hash with security details: { secure: true, hsts: true, ... }Performance Metrics
page.load_time # page load time in seconds
page.size # page size in bytesContent Type
page.content_type # content type header (e.g., "text/html; charset=utf-8")Technology Detection
page.technologies # hash of detected technologies: { jquery: true, react: true, ... }HTML Tag Statistics
page.tag_count # hash with counts of each HTML tag: { "div" => 45, "p" => 12, ... }RSS/Atom Feeds
page.feeds # array of RSS/Atom feed URLs found on the pageSocial Media Links
page.social_links # hash of social media profiles: { facebook: "url", twitter: "url", ... }Robots.txt and Sitemap
page.robots_txt_url # URL to robots.txt
page.sitemap_url # array of sitemap URLsCMS Detection
page.cms_info # hash with CMS details: { name: "WordPress", version: "6.0", themes: [...], plugins: [...] }Accessibility Score
page.accessibility_score # hash with score (0-100) and details: { score: 85, details: [...] }Mobile-Friendly Check
page.mobile_friendly? # true if the page has viewport meta tag and responsive CSSExport all data to JSON
page.to_hash # returns a hash with all page dataChangelog
Version 1.2.0
New Features:
- RSS/Atom feed detection with
feedsmethod - Social media profile extraction with
social_linksmethod - CMS detection and information with
cms_infomethod (WordPress, Drupal, Joomla, Shopify, Wix, Squarespace) - Accessibility scoring with
accessibility_scoremethod - Mobile-friendly detection with
mobile_friendly?method - Robots.txt and sitemap URL detection with
robots_txt_urlandsitemap_urlmethods
Improvements:
- Enhanced
Requestmodule withvalid?andssl?methods for better URL validation - Improved
Metamodule with author and publisher extraction - Better error handling across all modules
- Performance improvements with internal caching
Contributors
- Steven Shelby (@stevenshelby)
- Sam Nissen (@samnissen)
License
The WebInspector gem is released under the MIT License.
Contributing
- Fork it ( https://github.com/davidesantangelo/webinspector/fork )
- Create your feature branch (
git checkout -b my-new-feature) - Commit your changes (
git commit -am 'Add some feature') - Push to the branch (
git push origin my-new-feature) - Create a new Pull Request