SiteMaps
SiteMaps is a gem that helps you to generate sitemaps for your Rails application.
Installation
Add this line to your application's Gemfile:
gem 'site_maps'
And then execute:
bundle install
Or install it yourself as:
gem install site_maps
Usage
Create a configuration file where you will define the sitemap logic. You can use the following DSL to define the sitemap generation. Below is the minimum configuration required to generate a sitemap:
# config/sitemap.rb
SiteMaps.use(:file_system) do
  configure do |config|
    config.url = "https://example.com/sitemaps/sitemap.xml.gz" # Location of main sitemap index file
    config.directory = "/home/www/public"
  end
  process do |s|
    s.add('/', priority: 1.0, changefreq: "daily")
    s.add('/about', priority: 0.9, changefreq: "weekly")
  end
end
After creating the configuration file, you can run the following command to generate the sitemap:
SiteMaps.generate(config_file: "config/sitemap.rb")
  .enqueue_all
  .run
Or you can use the CLI to generate the sitemap:
bundle exec site_maps generate --config-file config/sitemap.rb
Configuration
Configuration can be defined using the configure block or by passing the configuration options to the use method. Each adapter may have specific configuration options, but the following options are common to all adapters:
- url - URL of the main sitemap index file. This URL must end with .xml or .xml.gz.
- directory - Directory where the sitemap files will be stored.
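The URL rule above can be checked before a generation run. The helper below is a minimal sketch in plain Ruby; `valid_sitemap_url?` is a hypothetical name and not part of the gem's API, and the requirement that the URL be an absolute http(s) URL is an assumption of this sketch. The gem may apply its own validation.

```ruby
require "uri"

# Hypothetical helper (not part of site_maps): true when the URL is an
# absolute http(s) URL whose path ends with .xml or .xml.gz.
def valid_sitemap_url?(url)
  uri = URI.parse(url)
  return false unless uri.is_a?(URI::HTTP) # URI::HTTPS is a subclass of URI::HTTP

  uri.path.match?(/\.xml(\.gz)?\z/)
rescue URI::InvalidURIError
  false
end

puts valid_sitemap_url?("https://example.com/sitemaps/sitemap.xml.gz") # true
puts valid_sitemap_url?("https://example.com/sitemaps/sitemap.txt")    # false
```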
Configuration using the #configure block
SiteMaps.use(:file_system) do
  configure do |config|
    config.url = "https://example.com/sitemaps/sitemap.xml.gz"
    config.directory = "/home/www/public"
  end
  # define sitemap processes..
end
Configuration using #config method
SiteMaps.use(:file_system) do
  config.url = "https://example.com/sitemaps/sitemap.xml.gz"
  config.directory = "/home/www/public"
  # define sitemap processes..
end
Configuration passing options to the #use method
SiteMaps.use(:file_system, url: "https://example.com/sitemaps/sitemap.xml.gz", directory: "/home/www/public") do
  # define sitemap processes..
end
Refer to the adapter documentation to see the specific configuration options.
Gzip Compression
The sitemap files can be automatically compressed using the gzip algorithm. To enable the gzip compression, just pass the sitemap url with the .gz extension.
# config/sitemap.rb
SiteMaps.use(:file_system) do
  configure do |config|
    config.url = "https://example.com/sitemaps/sitemap.xml.gz" # Location of main sitemap index file
    config.directory = "/home/www/public"
  end
  process do |s|
    # Add sitemap links
  end
end
Sitemap Index
For small websites, a single sitemap file can store all the links. For large websites with thousands of links, you should use a sitemap index file that points to multiple sitemap files. This library automatically generates the sitemap index file if you define multiple processes or if a sitemap exceeds the maximum number of links or the maximum file size.
Criteria to generate the sitemap index file:
- Multiple processes are defined in the configuration file.
- The number of links exceeds the limit of 50,000 links.
- The number of news links exceeds the limit of 1,000 links.
- The uncompressed file size exceeds the limit of 50MB.
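The criteria above amount to a simple "any of these" check. The following is a minimal sketch of that decision in plain Ruby; `sitemap_index_needed?` and the constant names are hypothetical and do not come from the gem, and treating 50MB as 50 * 1024 * 1024 bytes is an assumption.

```ruby
# Limits taken from the criteria listed above.
MAX_LINKS      = 50_000
MAX_NEWS_LINKS = 1_000
MAX_FILE_SIZE  = 50 * 1024 * 1024 # 50MB uncompressed (assumed byte count)

# Hypothetical helper, not the gem's implementation: an index is needed
# as soon as any single criterion is met.
def sitemap_index_needed?(process_count:, links:, news_links: 0, bytes: 0)
  process_count > 1 ||
    links > MAX_LINKS ||
    news_links > MAX_NEWS_LINKS ||
    bytes > MAX_FILE_SIZE
end

puts sitemap_index_needed?(process_count: 1, links: 1_200)  # false
puts sitemap_index_needed?(process_count: 1, links: 50_001) # true
puts sitemap_index_needed?(process_count: 3, links: 10)     # true
```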
Static and Dynamic Processes
Sitemap links are defined in the process block because the gem is designed to generate sitemaps for large websites in parallel. It means that each process will be executed in a separate thread, which will improve the performance of the sitemap generation.
Each process can have a unique name and a unique sitemap file location. By omitting the name and the file location, the process will use the :default value.
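To illustrate the threading model described above, here is a minimal sketch of running named jobs on a bounded number of threads using only Ruby's standard `Queue`, `Mutex`, and `Thread`. It is not the gem's runner: `run_in_threads` and the lambdas standing in for sitemap processes are hypothetical.

```ruby
# Hypothetical stand-ins for named sitemap processes; each returns its links.
processes = {
  default:    -> { ["/", "/about"] },
  categories: -> { ["/categories/1", "/categories/2"] },
  posts:      -> { ["/posts/1"] }
}

# Runs every job on at most max_threads worker threads and collects the
# links per process name.
def run_in_threads(processes, max_threads: 2)
  queue = Queue.new
  processes.each { |name, job| queue << [name, job] }
  queue.close # pop returns nil once the closed queue is drained

  results = {}
  lock = Mutex.new
  worker_count = [max_threads, processes.size].min

  workers = Array.new(worker_count) do
    Thread.new do
      while (item = queue.pop)
        name, job = item
        links = job.call
        lock.synchronize { results[name] = links } # guard the shared hash
      end
    end
  end

  workers.each(&:join)
  results
end

run_in_threads(processes, max_threads: 2).each do |name, links|
  puts "#{name}: #{links.size} links"
end
```

Because the workers run concurrently, the order in which processes finish is not guaranteed.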
Below is an example of a configuration file with multiple processes:
# config/sitemap.rb
SiteMaps.use(:file_system) do
  configure do |config|
    config.url = "https://example.com/sitemaps/sitemap.xml" # Location of main sitemap index file
    config.directory = "/home/www/public"
  end
  # Static Processes
  process do |s|
    s.add('/', priority: 1.0, changefreq: "daily")
    s.add('/about', priority: 0.9, changefreq: "weekly")
  end
  process :categories, "categories/sitemap.xml" do |s|
    Category.find_each do |category|
      s.add(category_path(category), priority: 0.7)
    end
  end
  # Dynamic Processes
  process :posts, "posts/%{year}-%{month}/sitemap.xml", year: Date.today.year, month: Date.today.month do |s, year:, month:|
    Post.where(year: year.to_i, month: month.to_i).find_each do |post|
      s.add(post_path(post), priority: 0.8)
    end
  end
end
Dynamic processes are defined by passing a process name, a location, and a list of extra arguments that will be dynamically replaced by the values given to the enqueue method.
The location can contain placeholders that will be replaced by the values passed to the process block (the %{year} and %{month} in the example above). Both relative and absolute paths are supported. Note that when using relative paths, the base directory of the main sitemap index file is used as the root directory.
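The `%{name}` placeholder syntax is the same one Ruby's `Kernel#format` understands. The sketch below shows how such a template expands and how a relative location could resolve against the main sitemap index URL. Whether the gem uses `format` or `URI.join` internally is not stated here, so treat both as assumptions for illustration.

```ruby
require "uri"

template = "posts/%{year}-%{month}/sitemap.xml"

# Kernel#format fills %{name} references from a hash with symbol keys.
relative = format(template, year: "2021", month: "01")
puts relative # posts/2021-01/sitemap.xml

# Assumed resolution of the relative location against the directory of
# the main sitemap index URL.
index_url = "https://example.com/sitemaps/sitemap.xml"
puts URI.join(index_url, relative) # https://example.com/sitemaps/posts/2021-01/sitemap.xml
```

With `format`, a placeholder that has no matching key raises `KeyError`, which is one reason to give every placeholder a default value.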
A dynamic process can be enqueued multiple times with different values:
SiteMaps.generate(config_file: "config/sitemap.rb")
  .enqueue(:posts, year: "2021", month: "01")
  .enqueue(:posts, year: "2021", month: "02")
  .enqueue_remaining # Enqueue all remaining processes (default and categories)
  .run
Important Considerations:
- The values of the extra arguments may be strings when they are coming from the CLI or other sources.
- By omitting the extra arguments, the process will be enqueued with the default values defined in the configuration file. So make sure you define default values or properly add nil checks in the process block to avoid errors.
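Both considerations can be handled in one normalization step at the top of a process block. The helper below is a minimal sketch; `normalize_period` is a hypothetical name, not part of the gem. It accepts integers, strings, or nil, falls back to the current date, and parses strings in base 10 so that a value such as "08" is not rejected as an invalid octal literal.

```ruby
require "date"

# Hypothetical helper: coerces context values that may arrive as strings
# (for example from the CLI) or be missing entirely.
def normalize_period(year: nil, month: nil, today: Date.today)
  y = Integer((year || today.year).to_s, 10)
  m = Integer((month || today.month).to_s, 10)
  raise ArgumentError, "month out of range: #{m}" unless (1..12).cover?(m)

  [y, m]
end

p normalize_period(year: "2021", month: "08")    # [2021, 8]
p normalize_period(today: Date.new(2024, 3, 15)) # [2024, 3]
```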
Sitemap Extensions
The sitemap builder supports the following sitemap extensions:
- Image
- Video
- PageMap
- News
- Alternates
- Mobile
You can add the sitemap links with the extensions by passing a hash with the extension name as the key and the extension attributes as the value.
Image
Images can be added to the sitemap links by passing images attributes to the add method. The images attribute should be an array of hashes with the image attributes.
Check out the Google specification here.
config = { ... }
SiteMaps.use(:file_system, **config) do
  process do |s|
    s.add(
      '/',
      priority: 1.0,
      changefreq: "daily",
      images: [
        { loc: "https://example.com/image.jpg" }
      ],
    )
  end
end
Supported attributes:
- loc - URL of the image.
- caption - Image caption.
- geo_location - Image geo location.
- title - Image title.
- license - Image license.
Video
Videos can be added to the sitemap links by passing videos attributes to the add method. The videos attribute should be an array of hashes with the video attributes.
Check out the Google specification here.
config = { ... }
SiteMaps.use(:file_system, **config) do
  process do |s|
    s.add(
      '/',
      priority: 1.0,
      changefreq: "daily",
      videos: [
        {
          thumbnail_loc: "https://example.com/thumbnail.jpg",
          title: "Video Title",
          description: "Video Description",
          content_loc: "https://example.com/video.mp4",
          player_loc: "https://example.com/player.swf",
          allow_embed: "yes",
          autoplay: "ap=1",
          # ...
        }
      ],
    )
  end
end
Supported attributes:
- thumbnail_loc - URL of the thumbnail image.
- title - Title of the video.
- description - Description of the video.
- content_loc - URL of the video content.
- player_loc - URL of the video player.
- allow_embed - Allow embed attribute of the player location.
- autoplay - Autoplay attribute of the player location.
- duration - Duration of the video in seconds.
- expiration_date - Expiration date of the video.
- rating - Rating of the video.
- view_count - View count of the video.
- publication_date - Publication date of the video.
- tags - Tags of the video.
- tag - Single tag of the video.
- category - Category of the video.
- family_friendly - Family friendly attribute of the video.
- gallery_loc - URL of the video gallery.
- gallery_title - Title of the video gallery.
- uploader - Uploader of the video.
- uploader_info - Uploader info of the video.
- price - Price of the video.
- price_currency - Currency of the video price.
- price_type - Type of the video price.
- price_resolution - Resolution of the video price.
- live - Live attribute of the video.
- requires_subscription - Requires subscription attribute of the video.
PageMap
PageMap sitemaps can be added to the sitemap links by passing pagemap attributes to the add method. The pagemap attribute should be a hash with the pagemap attributes.
Check out the Google specification here.
config = { ... }
SiteMaps.use(:file_system, **config) do
  process do |s|
    s.add(
      '/',
      priority: 1.0,
      changefreq: "daily",
      pagemap: {
        dataobjects: [
          {
            type: "document",
            id: "1",
            attributes: [
              { name: "title", value: "Page Title" },
              { name: "description", value: "Page Description" },
              { name: "url", value: "https://example.com" },
            ]
          }
        ]
      }
    )
  end
end
Supported attributes:
- dataobjects - Array of hashes with the data objects.
  - type - Type of the object.
  - id - ID of the object.
  - attributes - Array of hashes with the attributes.
    - name - Name of the attribute.
    - value - Value of the attribute.
News
News sitemaps can be added to the sitemap links by passing news attributes to the add method. The news attribute should be a hash with the news attributes.
Check out the Google specification here.
config = { ... }
SiteMaps.use(:file_system, **config) do
  process do |s|
    s.add(
      '/',
      priority: 1.0,
      changefreq: "daily",
      news: {
        publication_name: "Publication Name",
        publication_language: "en",
        publication_date: Time.now,
        genres: "PressRelease",
        access: "Subscription",
        title: "News Title",
        keywords: "News Keywords",
        stock_tickers: "NASDAQ:GOOG",
      }
    )
  end
end
Supported attributes:
- publication_name - Name of the publication.
- publication_language - Language of the publication.
- publication_date - Publication date of the news.
- genres - Genres of the news.
- access - Access of the news.
- title - Title of the news.
- keywords - Keywords of the news.
- stock_tickers - Stock tickers of the news.
Alternates
You can add alternate links to the sitemap links by passing alternates attributes to the add method. The alternates attribute should be an array of hashes with the alternate attributes.
Check out the Google specification here.
config = { ... }
SiteMaps.use(:file_system, **config) do
  process do |s|
    s.add(
      '/',
      priority: 1.0,
      changefreq: "daily",
      alternates: [
        { href: "https://example.com/en", lang: "en" },
        { href: "https://example.com/es", lang: "es" },
      ],
    )
  end
end
Supported attributes:
- href - URL of the alternate link. (Required)
- lang - Language of the alternate link. (Optional)
- nofollow - Nofollow attribute of the alternate link. (Optional)
- media - Media targets for responsive design pages. (Optional)
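For orientation, Google's format for localized pages expresses each alternate as an `xhtml:link` element with `rel="alternate"`, `hreflang`, and `href`. The sketch below builds such elements from the same hash shape used above. It is an illustration in plain Ruby, not the gem's serializer; `alternate_links` is a hypothetical name, and the `nofollow` and `media` attributes are left out because their rendering is not described here.

```ruby
# Hypothetical helper: renders alternates as <xhtml:link> elements.
# String#encode(xml: :attr) escapes the value and wraps it in double quotes.
def alternate_links(alternates)
  alternates.map do |alt|
    attrs = { rel: "alternate", hreflang: alt[:lang], href: alt.fetch(:href) }
    pairs = attrs.compact.map { |key, value| "#{key}=#{value.to_s.encode(xml: :attr)}" }
    "<xhtml:link #{pairs.join(' ')}/>"
  end
end

puts alternate_links([
  { href: "https://example.com/en", lang: "en" },
  { href: "https://example.com/es", lang: "es" }
])
```

`fetch(:href)` raises `KeyError` when the required `href` is missing, and `compact` drops `hreflang` when no `lang` is given.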
Mobile
Mobile sitemaps include a specific <mobile:mobile/> tag.
Check out the Google specification here.
config = { ... }
SiteMaps.use(:file_system, **config) do
  process do |s|
    s.add('/', mobile: true)
  end
end
Supported attributes:
- mobile - Mobile attribute of the sitemap link.
Adapters
The gem provides adapters to store the sitemaps in different locations. The following adapters are available:
- File System
- AWS S3
File System
You can use the file system adapter to store the sitemaps in the file system. The configuration is simple: you just need to provide the directory where the sitemaps will be stored.
SiteMaps.use(:file_system) do
  configure do |config|
    config.url = "https://example.com/sitemaps/sitemap.xml.gz"
    config.directory = "/home/www/public"
  end
  process do |s|
    # Add sitemap links
  end
end
AWS S3
You can use the AWS S3 adapter to store the sitemaps in an S3 bucket. The configuration is similar to the file system adapter, but you need to provide the AWS SDK options.
SiteMaps.use(:aws_sdk) do
  configure do |config|
    config.url = "https://my-bucket.s3.amazonaws.com/sitemaps/sitemap.xml"
    config.directory = "/tmp" # Local directory to store the sitemaps before uploading to S3
    # AWS S3 specific options
    config.bucket = "my-bucket"
    config.region = "us-east-1"
    config.aws_access_key = ENV["AWS_ACCESS_KEY_ID"]
    config.aws_secret_key = ENV["AWS_SECRET_ACCESS_KEY"]
    # Optional parameters (default values)
    config.acl = "public-read"
    config.cache_control = "private, max-age=0, no-cache"
  end
  process do |s|
    # Add sitemap links
  end
end
If you want your Rails application to act as a proxy for the sitemap files, you can create a controller that serves them from the S3 bucket.
# config/routes.rb
get "sitemaps/*relative_path", to: "sitemaps#show", as: :sitemap
# app/controllers/sitemaps_controller.rb
class SitemapsController < ApplicationController
  def show
    location = params.permit("relative_path", "format").to_h.values.join(".")
    unless location =~ /\.xml(\.gz)?$/ # You may want to add more validations here
      raise ActionController::RoutingError, "Not found"
    end
    data, meta = SiteMaps.current_adapter.read(File.join("sitemaps", location))
    if location.ends_with?(".xml")
      render xml: data
    else
      send_data(data, disposition: "attachment", type: meta[:content_type])
    end
  rescue SiteMaps::FileNotFoundError
    raise ActionController::RoutingError, "Not found"
  end
end
Make sure the sitemap config is loaded in an initializer. You may want to add some caching to avoid hitting the S3 bucket on every request.
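One way to add that caching is a small read-through cache with a time-to-live in front of the adapter. The following is a minimal sketch in plain Ruby: `CachedReader` and `FakeAdapter` are hypothetical names, and the only assumption about the adapter is that `read` returns the data and its metadata, as in the controller above.

```ruby
# Hypothetical read-through cache; wraps any object responding to #read.
class CachedReader
  def initialize(source, ttl: 300, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @source = source
    @ttl = ttl       # seconds an entry stays fresh
    @clock = clock   # injectable so expiry can be tested without sleeping
    @entries = {}
    @lock = Mutex.new
  end

  def read(location)
    @lock.synchronize do
      entry = @entries[location]
      return entry[:value] if entry && @clock.call - entry[:at] < @ttl

      value = @source.read(location)
      @entries[location] = { value: value, at: @clock.call }
      value
    end
  end
end

# Stand-in for a storage adapter that counts how often it is hit.
class FakeAdapter
  attr_reader :calls

  def initialize
    @calls = 0
  end

  def read(_location)
    @calls += 1
    ["<urlset/>", { content_type: "application/xml" }]
  end
end

adapter = FakeAdapter.new
reader = CachedReader.new(adapter, ttl: 60)
2.times { reader.read("sitemaps/sitemap.xml") }
puts adapter.calls # 1
```

This cache lives in one process only. In a Rails application, `Rails.cache.fetch` with `expires_in` is the more usual choice because it can be shared across processes.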
Custom Adapters
You can create custom adapters to store the sitemaps in different locations. You just need to create a class that implements the SiteMaps::Adapters::Adapter interface. The adapter should implement the following methods:
- write(url, raw_data, **extra) - Write the sitemap data to the storage.
- read(url) - Read the sitemap data from the storage.
- delete(url) - Delete the sitemap data from the storage.
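To make the three methods concrete, here is a minimal in-memory store with the same method names. It is a standalone sketch so that it runs without the gem: a real adapter would inherit from SiteMaps::Adapters::Adapter, and the `[data, metadata]` return shape of `read` and the `NotFound` error are assumptions of this example.

```ruby
# Standalone sketch of the write/read/delete contract using a Hash.
class MemoryStore
  NotFound = Class.new(StandardError) # stand-in for the gem's own error

  def initialize
    @files = {}
  end

  # Stores the raw data together with any extra options as metadata.
  def write(url, raw_data, **extra)
    @files[url] = { data: raw_data.dup, meta: extra }
    true
  end

  # Returns [data, metadata]; raises NotFound for unknown URLs.
  def read(url)
    entry = @files.fetch(url) { raise NotFound, url }
    [entry[:data], entry[:meta]]
  end

  # Returns true when something was removed, false otherwise.
  def delete(url)
    !@files.delete(url).nil?
  end
end

store = MemoryStore.new
store.write("https://example.com/sitemap.xml", "<urlset/>", content_type: "application/xml")
data, meta = store.read("https://example.com/sitemap.xml")
puts data                # <urlset/>
puts meta[:content_type] # application/xml
```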
class MyAdapter < SiteMaps::Adapters::Adapter
  def write(url, raw_data, **extra)
    # Write the sitemap data to the storage
  end
  def read(url)
    # Read the sitemap data from the storage
  end
  def delete(url)
    # Delete the sitemap data from the storage
  end
end
SiteMaps.use(MyAdapter, **config) do
  process do |s|
    # Add sitemap links
  end
end
Adapter Configuration
If your adapter requires additional configuration, you can define an <adapter class>::Config class that inherits from SiteMaps::Configuration and declares the required configuration options.
class MyAdapter::Config < SiteMaps::Configuration
  attribute :api_key, default: -> { ENV["MY_API_KEY"] }
end
During the adapter initialization, it will automatically detect the configuration class and use it to load the configuration options.
SiteMaps.use(MyAdapter) do
  configure do |config|
    # ...
    config.api_key = "my-api-key"
  end
  process do |s|
    # Add sitemap links
  end
end
CLI
You can use the CLI to generate the sitemap. The CLI will load the configuration file and run the sitemap generation.
bundle exec site_maps generate --config-file config/sitemap.rb
To enqueue dynamic processes, you can pass the process name with the context values.
bundle exec site_maps generate monthly_posts \
  --config-file config/sitemap.rb \
  --context=year:2021 month:1
Enqueue dynamic + remaining processes:
bundle exec site_maps generate monthly_posts \
  --config-file config/sitemap.rb \
  --context=year:2021 month:1 \
  --enqueue-remaining
Pass max threads to run the processes in parallel:
bundle exec site_maps generate \
  --config-file config/sitemap.rb \
  --max-threads 10
Notification
You can subscribe to the internal events to receive notifications about the sitemap generation. The following events are available:
- sitemaps.enqueue_process - Triggered when a process is enqueued.
- sitemaps.before_process_execution - Triggered before a process starts execution.
- sitemaps.process_execution - Triggered when a process finishes execution.
- sitemaps.finalize_urlset - Triggered when the sitemap builder finishes the URL set.
You can subscribe to the events using the following code:
SiteMaps::Notification.subscribe("sitemaps.enqueue_process") do |event|
  puts "Enqueueing process #{event.payload[:name]}"
end
We have the standard event handler SiteMaps::Runner::EventListener that will print the events to the standard output. You can use it to view the progress of the sitemap generation.
SiteMaps::Notification.subscribe(SiteMaps::Runner::EventListener)
SiteMaps.generate(config_file: "config/sitemap.rb")
  .enqueue_all
  .run
Mixins
You can use mixins to extend the sitemap builder with additional methods. The mixins can be used to define common methods that will be used in multiple processes.
module MyMixin
  def repository
    Repository.new
  end
  def post_path(post)
    "/posts/#{post.slug}"
  end
end
SiteMaps.use(:file_system) do
  extend_processes_with(MyMixin)
  process do |s|
    s.repository.posts.each do |post|
      s.add(s.post_path(post), priority: 0.8)
    end
  end
end
We already have a built-in mixin for Rails applications that provides the URL helpers through the route method.
SiteMaps.use(:file_system) do
  process do |s|
    s.add(s.route.root_path, priority: 1.0)
    s.add(s.route.about_path, priority: 0.9)
  end
end
Development
After checking out the repo, run bin/setup to install dependencies. You can also run bin/console for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/marcosgz/site_maps.
License
The gem is available as open source under the terms of the MIT License.