OrganicSitemap

OrganicSitemap is a gem that gives you a structure to manage your sitemap with healthy URLs. It adds a middleware that saves, in a Redis set with an expiry, every URL served by your server that meets the following criteria:

  • Status 200
  • HTML pages only
  • GET method
  • Doesn't match any pattern in the skipped_urls array
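
For illustration, these checks could be implemented roughly like the Rack middleware sketched below. This is a minimal sketch only; the class and method names are hypothetical and are not the gem's actual internals.

require 'rack'

# Hypothetical sketch of the URL-filtering rules listed above.
class SitemapCaptureSketch
  def initialize(app, skipped_urls: [])
    @app = app
    @skipped_urls = skipped_urls # strings or regexps, as in config.skipped_urls
  end

  def call(env)
    status, headers, body = @app.call(env)
    request = Rack::Request.new(env)
    store(request.path) if storable?(request, status, headers)
    [status, headers, body]
  end

  private

  def storable?(request, status, headers)
    status.to_i == 200 &&
      headers['Content-Type'].to_s.include?('text/html') &&
      request.get? &&
      @skipped_urls.none? { |pattern| request.path.match?(Regexp.new(pattern)) }
  end

  def store(path)
    # The real gem writes the path into Redis; omitted in this sketch.
  end
end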

Requirements

OrganicSitemap adds a middleware to your Rails project.

It uses a Redis connection to store the sitemap URLs.

Installation

Put this line in your Gemfile:

gem 'organic-sitemap'

Run bundle install

Configuration

To set up this gem, add your configuration:

OrganicSitemap.configure do |config|

  # Add here any regexps for URLs you don't want in your sitemap
  # config.skipped_urls = [ "^/sitemap",
  #                         ".txt$",
  #                         ".rss$" ]

  # OrganicSitemap ignores query params when identifying URLs. You can add your allowed params
  # config.allowed_params = [...]

  # By default, all URLs are saved on Redis.new(url: 'redis://127.0.0.1:6379'),
  # but you can set your own connection
  # config.redis_connection = Your redis connection

  # URLs are saved in a Redis set called "sitemap-urls", but you can change the key if you want
  # config.storage_key = your key

  # By default every URL has an expiry time of 7 days;
  # after this time, if no one loads the page, it is removed from the set.
  # To change it (number of days):
  # config.expiry_time = X

  # The crawler needs a domain to build the URLs it will visit
  config.crawler_domain = "http://mi.domain.com"

  # By default crawler_delay is 5 seconds. This is the delay between fetching each URL.
  # To change it (seconds of delay):
  # config.crawler_delay = x

  # By default, nothing is done with URLs that don't return 200.
  # To automatically remove URLs returning 301 from Redis:
  # config.clean_redirects = true
  # To automatically remove URLs returning 404 from Redis:
  # config.clean_not_found = true
end
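
As a concrete starting point, a minimal initializer might look like the sketch below. The values shown are placeholder assumptions, not defaults shipped with the gem.

# config/initializers/organic_sitemap.rb (example values only)
OrganicSitemap.configure do |config|
  config.skipped_urls     = ["^/sitemap", "\\.txt$", "\\.rss$"]
  config.redis_connection = Redis.new(url: ENV.fetch("REDIS_URL", "redis://127.0.0.1:6379"))
  config.expiry_time      = 7                       # days before an unvisited URL is dropped
  config.crawler_domain   = "http://mi.domain.com"  # domain used to build URLs to crawl
  config.crawler_delay    = 5                       # seconds between crawler requests
end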

Crawler and Cache Services

If you have a front cache service, these services let you use your scored sitemap to warm up expired URLs.

When a page is loaded, its score is updated with the current time, which records when it was last cached. Using this, we can get all expired URLs and visit them to warm them up.
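
Conceptually, this maps onto a Redis sorted set whose scores are timestamps. The snippet below illustrates that mechanism; it is an assumption about the internals for illustration only, not the gem's actual code, and the key and threshold values are just examples.

require 'redis'

redis = Redis.new(url: 'redis://127.0.0.1:6379')
key   = 'sitemap-urls' # the default storage_key

# On each qualifying request, the URL's score is refreshed to "now".
redis.zadd(key, Time.now.to_i, '/some/page')

# URLs whose score is older than a threshold count as expired / uncached.
threshold = Time.now.to_i - 3 * 3600 # e.g. a 3-hour front-cache TTL
expired_urls = redis.zrangebyscore(key, '-inf', threshold)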

To configure it:

In the initializer, define:

  # The crawler needs a domain to build the URLs it will visit
  config.crawler_domain = "http://mi.domain.com"

  # By default crawler_delay is 5 seconds. This is the delay between fetching each URL.
  # To change it (seconds of delay):
  # config.crawler_delay = x

With OrganicSitemap::CacheManager.uncached_urls(expiration_time: CacheExpirationTime, url_pattern: PATTERN) we get all URLs that have not been hit within that time (i.e. all expired URLs).

Examples

# Return URLs not visited between 1.week.ago (set via config.expiry_time) and 3.hours.ago
OrganicSitemap::CacheManager.uncached_urls(expiration_time: 3.hours)

# Return URLs not visited between 1.week.ago (set via config.expiry_time) and 3.hours.ago that contain the "/test/" string
OrganicSitemap::CacheManager.uncached_urls(expiration_time: 3.hours, url_pattern: "/test/")

# Return URLs not visited between 1.week.ago (set via config.expiry_time) and 3.hours.ago that match the ^\/test\/ regexp
OrganicSitemap::CacheManager.uncached_urls(expiration_time: 3.hours, url_pattern: /^\/test\//)

Then, with OrganicSitemap::CrawlerManager.warmup(urls, opts = {}), we visit all of those URLs. A delay between each page load can be set in the configuration file. When a URL is visited, RedisManager updates its score, so it will not be visited again until its cache time expires.

Examples

# For a 3-hour page cache, fetch each page with user-agent 'Ruby'
OrganicSitemap::CrawlerManager.warmup(OrganicSitemap::CacheManager.uncached_urls(expiration_time: 3.hours))

# Fetch '/test' with user-agent 'Crawler-bot'
OrganicSitemap::CrawlerManager.warmup('/test', { headers: { "User-Agent" => 'Crawler-bot' } })
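
Putting both services together, a recurring task could warm up everything that has gone stale. The rake task below is a hypothetical sketch, not something shipped with the gem:

# lib/tasks/sitemap_warmup.rake (hypothetical example)
namespace :organic_sitemap do
  desc 'Re-visit every URL whose cache is older than 3 hours'
  task warmup: :environment do
    urls = OrganicSitemap::CacheManager.uncached_urls(expiration_time: 3.hours)
    OrganicSitemap::CrawlerManager.warmup(urls, { headers: { 'User-Agent' => 'Crawler-bot' } })
  end
end

Scheduled via cron or a job runner, this keeps the front cache warm between organic visits.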

Rails config generator

Copy the base config file into your Rails app with:

$ rails generate organic_sitemap:config

Contribution

If you have a cool idea for improving this gem, or any bug fix, just open a pull request and I'll be glad to take a look and merge it if it seems fine.

License

This project rocks and uses MIT-LICENSE.