Jekyll-AEO
Answer Engine Optimization for Jekyll. Generates clean markdown copies of every page and produces llms.txt / llms-full.txt index files, following the llms.txt spec.
Installation
Add to your Jekyll site's Gemfile:
gem "jekyll-aeo"
Then run:
bundle install
Quick Start
That's it. With zero configuration, jekyll-aeo will:
- Generate a .md companion for every HTML page (e.g., /about/index.html → /about.md)
- Generate /llms.txt with an index of all pages
- Generate /llms-full.txt with all page content concatenated
- Inject <link rel="alternate" type="text/markdown"> tags into every HTML page
Run bundle exec jekyll build and check your output directory.
Configuration
All settings are optional. Add them to _config.yml under the jekyll_aeo key. The configuration uses a strict schema: only keys defined in the plugin are accepted, and typos or unknown keys are silently dropped.
jekyll_aeo:
  enabled: true # master switch; when false, all generation stops (default: true)
  exclude: # URL prefixes to skip
    - /privacy/
    - /error/
  dotmd:
    link_tag: "auto" # "auto", "data", or omit to disable (default: "auto")
    include_last_modified: true # add "Last updated" date to .md files (default: true)
    dotmd_metadata: false # add YAML front matter block to .md files (default: false)
    md2dotmd: # source markdown → .md settings
      strip_block_tags: true # strip comment/capture block content (default: true)
      protect_indented_code: false # protect 4-space indented code blocks (default: false)
    html2dotmd: # rendered HTML → .md settings (for plugin-generated pages)
      enabled: false # convert rendered HTML to markdown (default: false)
      selector: null # CSS selector (default: null, which auto-detects main > article > body)
  llms_txt:
    enabled: true # generate llms.txt (default: true)
    description: "" # override site description in llms.txt
    include_descriptions: true # include page descriptions in llms.txt entries (default: true)
    front_matter_keys: [] # (reserved, not implemented)
    show_lastmod: false # (reserved, not implemented)
    sections: # custom sections (auto-generated if omitted)
      - title: "Pages"
        collection: "pages"
      - title: "Products"
        collection: "products"
      - title: "Blog Posts"
        collection: "posts"
      - title: "Optional"
        collection: "profiles"
  llms_full_txt:
    enabled: true # generate llms-full.txt (default: true)
    description: "" # override description in llms-full.txt header
    full_txt_mode: "all" # "all" or "linked" (default: "all")
  url_map:
    enabled: false # generate URL map markdown table (default: false)
    output_filepath: "docs/Url-Map.md" # output path relative to project root (default: "docs/Url-Map.md")
    show_created_at: true # show generation timestamp in document header (default: true)
    columns: # columns to include in the table
      - layout
      - url
      - url_dotmd
      - dotmd_mode
      - excluded
      - path
      - page_id
      - lang
      - redirects
  robots_txt:
    enabled: false # generate robots.txt with crawler policy (default: false)
    allow: # search/retrieval bots to allow
      - Googlebot
      - Bingbot
      - OAI-SearchBot
      - ChatGPT-User
      - Claude-SearchBot
      - Claude-User
      - PerplexityBot
      - Applebot-Extended
    disallow: # training bots to block
      - GPTBot
      - ClaudeBot
      - Google-Extended
      - Meta-ExternalAgent
      - Amazonbot
    include_sitemap: true # add Sitemap: directive (default: true)
    include_llms_txt: true # add Llms-txt: directive (default: true)
    custom_rules: [] # additional bot-specific rules
  domain_profile:
    enabled: false # generate /.well-known/domain-profile.json (default: false)
    name: null # falls back to site.title or site.name
    description: null # falls back to site.description
    website: null # falls back to site.url
    contact: null # REQUIRED: email or URL, no fallback
    logo: null # URL to logo image
    entity_type: null # Organization, Person, Blog, NGO, Community, Project, etc.
    jsonld: null # custom JSON-LD hash to include
Per-Page Options
Control .md generation per page via dotmd_mode front matter:
---
title: Secret Page
dotmd_mode: disabled # skip .md generation for this page
---

---
title: Blog
dotmd_mode: html2dotmd # force HTML-to-markdown conversion
---
Available values: auto (default), md2dotmd, html2dotmd, disabled.
See lib/jekyll-aeo/generators/README.md for the full decision logic.
Pages with redirect_to in front matter are automatically skipped.
How It Works
Per-Page Markdown Generation
For every HTML page Jekyll renders, the plugin:
- Re-reads the original source file from disk
- Strips YAML front matter
- Strips Liquid tags ({% %} and {{ }}) outside fenced code blocks; content inside {% raw %}…{% endraw %} is preserved (tags stripped, inner content kept)
- Strips kramdown attribute annotations ({: .class}, {:width="300"})
- Prepends the page title as an H1 header (if not already present)
- Adds the page description as a blockquote (if present)
- Writes the result as a .md file alongside the HTML output (for the root index, also writes index.html.md with the same content)
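The steps above can be sketched in Ruby roughly as follows. This is a minimal illustration rather than the plugin's actual code: the method name clean_markdown and its regular expressions are assumptions, and it leaves out the {% raw %} handling and the case where a description must be placed after an existing H1.

```ruby
# Hypothetical helper; the name and regexes are illustrative, not jekyll-aeo's API.
def clean_markdown(source, title:, description: nil)
  # Strip YAML front matter (a leading block delimited by --- lines).
  body = source.sub(/\A---\s*\n.*?\n---\s*\n/m, "")

  in_fence = false
  cleaned = body.lines.map do |line|
    if line.match?(/\A\s*(`{3}|~{3})/)
      in_fence = !in_fence # toggle on every opening or closing fence
      next line
    end
    next line if in_fence # leave fenced code untouched

    line = line.gsub(/\{%.*?%\}/, "")   # Liquid tags
    line = line.gsub(/\{\{.*?\}\}/, "") # Liquid output
    line.gsub(/\{:.*?\}/, "")           # kramdown attribute annotations
  end
  body = cleaned.join.strip

  # Prepend the title as an H1 unless the body already starts with one.
  heading = body.start_with?("# ") ? "" : "# #{title}\n\n"
  # Add the description as a blockquote when one is given.
  quote = description.to_s.empty? ? "" : "> #{description}\n\n"
  "#{heading}#{quote}#{body}\n"
end
```

Processing line by line keeps the fence tracking simple, at the cost of missing Liquid tags that span several lines.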
html2dotmd (HTML to Markdown)
Pages generated by Jekyll plugins (e.g., jekyll-paginate, jekyll-archives) have no source file on disk and are normally skipped. With html2dotmd.enabled: true, the plugin converts their rendered HTML output to markdown instead:
jekyll_aeo:
  dotmd:
    html2dotmd:
      enabled: true
      selector: null # optional CSS selector
The converter automatically extracts the main content area (<main>, then <article>, then <body>) and strips layout chrome (<script>, <style>, <nav>, <header>, <footer>). Set selector to a CSS selector (e.g., ".content", "#main") to target a specific element.
Baseurl Support
If your Jekyll site runs under a subpath (e.g., baseurl: /docs), all links in llms.txt will include the prefix automatically: /docs/about.md instead of /about.md. No additional configuration needed.
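As an illustration of the prefixing described above, here is a small Ruby sketch. The helper name with_baseurl is hypothetical and the plugin may build its links differently.

```ruby
# Hypothetical helper: prefix a site-relative path with the configured baseurl.
def with_baseurl(baseurl, path)
  prefix = baseurl.to_s.chomp("/")   # nil or "" means no prefix
  relative = path.sub(%r{\A/+}, "")  # drop leading slashes to avoid "//"
  "#{prefix}/#{relative}"
end

with_baseurl("/docs", "/about.md") # => "/docs/about.md"
with_baseurl("", "/about.md")      # => "/about.md"
```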
Link Tag
By default, jekyll-aeo injects a <link> tag into the <head> of every HTML page pointing to its markdown copy:
<link rel="alternate" type="text/markdown" href="/about.md">
This helps AI crawlers discover the machine-readable version of each page.
The link_tag setting controls this behavior:
| Value | Behavior |
|---|---|
| "auto" (default) | Injects the <link> tag before </head> automatically |
| "data" | Sets page.md_url and page.md_link_tag in page data for use in templates |
| omitted / falsy | Disabled |
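To make the "auto" behavior concrete, the following Ruby sketch inserts the tag before the closing </head>. It is an assumption-laden illustration: inject_markdown_link is a made-up name, and the real plugin may hook into Jekyll's rendering differently.

```ruby
require "cgi"

# Hypothetical helper, not the plugin's actual implementation.
def inject_markdown_link(html, md_url)
  tag = %(<link rel="alternate" type="text/markdown" href="#{CGI.escapeHTML(md_url)}">)
  # Insert before the first </head>; a page without one is returned unchanged.
  html.sub(%r{</head>}i) { |closing| "#{tag}\n#{closing}" }
end
```

Using the block form of sub keeps the original casing of the closing tag and avoids backslash interpretation in the replacement string.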
With link_tag: "data", you can place the tag manually in your layout:
{{ page.md_link_tag }}
Or use the URL directly:
<link rel="alternate" type="text/markdown" href="{{ page.md_url }}">
llms.txt
Generated at the site root with:
- H1: site title
- Blockquote: site description
- Link to llms-full.txt for complete content
- H2 sections grouping pages by collection, with links to .md files
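The layout listed above could be produced by something like the Ruby sketch below. The method name render_llms_txt, the page hash keys, and the label used for the llms-full.txt link are all assumptions made for illustration.

```ruby
# Hypothetical renderer; sections is an array of [title, pages] pairs.
def render_llms_txt(title:, description:, sections:)
  out = +"# #{title}\n\n"
  out << "> #{description}\n\n" unless description.to_s.empty?
  out << "- [Full content](/llms-full.txt)\n\n" # link label is an assumption
  sections.each do |section_title, pages|
    out << "## #{section_title}\n\n"
    pages.each { |page| out << "- [#{page[:title]}](#{page[:md_url]})\n" }
    out << "\n"
  end
  out
end
```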
llms-full.txt
All individual .md file contents concatenated, separated by ---.
- full_txt_mode: "all" (default): includes every eligible page
- full_txt_mode: "linked": only includes pages that appear in llms.txt sections
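The difference between the two modes can be sketched in Ruby as below. The method build_llms_full, the page hash keys, and the exact spacing around the --- separator are assumptions for illustration.

```ruby
# Hypothetical builder: pages are hashes with :md_url and :content keys.
def build_llms_full(pages, linked_urls:, mode: "all")
  selected =
    case mode
    when "all" then pages
    when "linked" then pages.select { |page| linked_urls.include?(page[:md_url]) }
    else raise ArgumentError, "unknown full_txt_mode: #{mode}"
    end
  # Concatenate the markdown bodies, separated by a horizontal rule.
  selected.map { |page| page[:content].strip }.join("\n\n---\n\n") + "\n"
end
```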
URL Map
Generate a markdown table of all HTML pages with metadata. Disabled by default.
jekyll_aeo:
  url_map:
    enabled: true
This writes a docs/Url-Map.md file (configurable via output_filepath) relative to your project root (the directory containing _config.yml when running Jekyll, or the source directory otherwise). It is useful as a development reference that can be committed to version control.
The table is grouped by collection (Pages first, then alphabetically) with configurable columns:
| Column | Description |
|---|---|
| layout | Layout name |
| url | Page URL |
| url_dotmd | Path to the generated .md file |
| dotmd_mode | Converter used: html2dotmd or md2dotmd |
| excluded | Reason the page was excluded (if any) |
| path | Relative source file path |
| page_id | Value of page_id from front matter |
| lang | Value of lang from front matter |
| redirects | Values from redirect_from front matter |
Domain Profile
Generate a /.well-known/domain-profile.json file following the AI Domain Data spec (v0.1). This provides AI assistants with authoritative identity metadata about your site. Disabled by default.
jekyll_aeo:
  domain_profile:
    enabled: true
    contact: "hello@example.com" # required
    entity_type: "Organization" # optional
The contact field is required: generation is skipped with a warning if it is not set. The name, description, and website fields fall back to site.title/site.name, site.description, and site.url respectively.
Valid entity_type values: Organization, Person, Blog, NGO, Community, Project, CreativeWork, SoftwareApplication, Thing.
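The fallback rules can be illustrated with a short Ruby sketch. The method build_domain_profile is hypothetical, it works on plain hashes rather than Jekyll objects, and it omits fields of the real file that are not described here (such as spec).

```ruby
VALID_ENTITY_TYPES = %w[
  Organization Person Blog NGO Community Project
  CreativeWork SoftwareApplication Thing
].freeze

# Hypothetical builder: site and config are plain string-keyed hashes.
def build_domain_profile(site, config)
  contact = config["contact"]
  return nil if contact.to_s.empty? # no fallback: generation is skipped

  profile = {
    "name" => config["name"] || site["title"] || site["name"],
    "description" => config["description"] || site["description"],
    "website" => config["website"] || site["url"],
    "contact" => contact
  }
  type = config["entity_type"]
  profile["entity_type"] = type if VALID_ENTITY_TYPES.include?(type)
  profile
end
```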
You can include a custom JSON-LD object via the jsonld key:
jekyll_aeo:
  domain_profile:
    enabled: true
    contact: "hello@example.com"
    jsonld:
      "@type": "Organization"
      name: "Example Corp"
robots.txt
Generate a robots.txt that separates search bots (allowed) from training bots (blocked). Disabled by default.
jekyll_aeo:
  robots_txt:
    enabled: true
Default behavior: allows search bots (Googlebot, Bingbot, OAI-SearchBot, ChatGPT-User, Claude-SearchBot, Claude-User, PerplexityBot, Applebot-Extended) and blocks training bots (GPTBot, ClaudeBot, Google-Extended, Meta-ExternalAgent, Amazonbot). Includes Sitemap: and Llms-txt: directives automatically.
If you already have a robots.txt file in your source directory, the generator skips generation and your file is used as-is. The generated file works alongside jekyll-sitemap without conflicts.
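A rough Ruby sketch of the default policy follows. The method render_robots_txt and the exact layout of the output (one group per bot, blank line between groups) are assumptions; the file the plugin writes may be arranged differently.

```ruby
# Hypothetical renderer for the allow/disallow split described above.
def render_robots_txt(allow:, disallow:, sitemap_url: nil, llms_txt_url: nil)
  lines = []
  allow.each { |bot| lines << "User-agent: #{bot}" << "Allow: /" << "" }
  disallow.each { |bot| lines << "User-agent: #{bot}" << "Disallow: /" << "" }
  lines << "Sitemap: #{sitemap_url}" if sitemap_url
  lines << "Llms-txt: #{llms_txt_url}" if llms_txt_url
  lines.join("\n") + "\n"
end
```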
Add custom rules for specific bots:
jekyll_aeo:
  robots_txt:
    enabled: true
    custom_rules:
      - user_agent: "SpecialBot"
        allow: "/public/"
        disallow: "/private/"
JSON-LD Schema ({% aeo_json_ld %})
Add the {% aeo_json_ld %} Liquid tag to your layout to output structured data as <script type="application/ld+json"> blocks:
<head>
...
{% aeo_json_ld %}
</head>
The tag automatically renders JSON-LD for 6 schema types based on your page's front matter and site config:
| Schema | Trigger | Auto? |
|---|---|---|
| BreadcrumbList | URL path (every page except homepage) | Yes |
| Organization | Homepage, when site.title or site.name is set | Yes |
| FAQPage | faq: array in front matter | No (front matter) |
| HowTo | howto: object in front matter | No (front matter) |
| Speakable | speakable: true in front matter | No (front matter) |
| Article | Page has date and jekyll-seo-tag is NOT installed | Auto (skips when seo-tag present) |
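As an example of the URL-driven case in the table, this Ruby sketch derives a schema.org BreadcrumbList from a page URL. The method name breadcrumb_list and the way item names are derived from path segments are assumptions; the plugin may use page titles instead.

```ruby
require "json"

# Hypothetical builder: returns nil for the homepage, a hash otherwise.
def breadcrumb_list(site_url, page_url)
  segments = page_url.split("/").reject(&:empty?)
  return nil if segments.empty? # homepage: no breadcrumbs

  path = ""
  items = segments.each_with_index.map do |segment, index|
    path += "/#{segment}" # accumulate the URL of each ancestor
    {
      "@type" => "ListItem",
      "position" => index + 1,
      "name" => segment.tr("-", " ").capitalize,
      "item" => "#{site_url}#{path}/"
    }
  end
  { "@context" => "https://schema.org", "@type" => "BreadcrumbList", "itemListElement" => items }
end

puts JSON.pretty_generate(breadcrumb_list("https://example.com", "/docs/getting-started/"))
```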
FAQPage
Add a faq: array with q: and a: pairs to your front matter:
---
title: FAQ
faq:
  - q: "What is Jekyll-AEO?"
    a: "A Ruby gem for Answer Engine Optimization."
  - q: "Does it work with jekyll-seo-tag?"
    a: "Yes, they cover different layers and don't conflict."
---
HowTo
Add a howto: object with steps: to your front matter:
---
title: How to Install
howto:
  name: "Install Jekyll-AEO"
  description: "Steps to add Jekyll-AEO to your site"
  totalTime: "PT5M"
  steps:
    - name: "Add to Gemfile"
      text: "Add gem 'jekyll-aeo' to your Gemfile"
    - name: "Install"
      text: "Run bundle install"
    - name: "Build"
      text: "Run bundle exec jekyll build"
---
Speakable
Add speakable: true to mark a page's title and first paragraph as voice-assistant-friendly:
---
title: About Us
speakable: true
---
jekyll-seo-tag Compatibility
The Article schema automatically skips when jekyll-seo-tag is installed, since seo-tag already outputs BlogPosting (a subtype of Article). All other schema types (FAQPage, HowTo, BreadcrumbList, Organization, Speakable) are always safe — they output different types that don't conflict. Multiple <script type="application/ld+json"> blocks per page are valid per the JSON-LD spec.
strip_block_tags
Controls how Liquid comment and capture blocks are handled.
With strip_block_tags: true (default):
Source:
{% comment %}
Internal note for editors.
{% endcomment %}
Welcome to the page.
Output .md:
Welcome to the page.
Comment and capture blocks are fully removed (tags + content), since they contain developer metadata, not page content.
With strip_block_tags: false:
The same source would produce:
Internal note for editors.
Welcome to the page.
Note: {% if %}, {% for %}, {% unless %}, and {% case %} blocks always preserve their content regardless of this setting, since the content between them is real page content.
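The two behaviors can be reproduced with a pair of regular expressions, as in the Ruby sketch below. The names BLOCK_TAGS, ANY_TAG, and strip_liquid are assumptions, and the patterns ignore fenced code blocks and nested blocks for brevity.

```ruby
# Whole comment/capture blocks, including their content.
BLOCK_TAGS = /\{%-?\s*(comment|capture)\b.*?%\}.*?\{%-?\s*end\1\s*-?%\}\n?/m
# Any single Liquid tag, content between tags is left alone.
ANY_TAG = /\{%.*?%\}\n?/m

# Hypothetical helper mirroring the strip_block_tags setting.
def strip_liquid(text, strip_block_tags: true)
  text = text.gsub(BLOCK_TAGS, "") if strip_block_tags
  text.gsub(ANY_TAG, "")
end

source = "{% comment %}\nInternal note for editors.\n{% endcomment %}\nWelcome to the page.\n"
puts strip_liquid(source)                          # only the welcome line remains
puts strip_liquid(source, strip_block_tags: false) # the note is kept as well
```

Because {% if %} and similar tags are matched only by ANY_TAG, their inner content survives in both modes.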
protect_indented_code
By default, only fenced code blocks (``` and ~~~) are protected from Liquid/kramdown stripping. Enable protect_indented_code: true to also protect indented code blocks (4+ spaces after a blank line).
Recommendation: use fenced code blocks for code examples that contain Liquid syntax.
Custom Sections
By default, llms.txt auto-generates sections by grouping pages by their Jekyll collection. To customize:
jekyll_aeo:
  llms_txt:
    sections:
      - title: "Documentation"
        collection: "pages"
      - title: "Products"
        collection: "products"
      - title: "Optional" # LLMs can skip this section per spec
        collection: "profiles"
Use collection: null to match standalone pages (those not in any collection).
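Section matching can be pictured as a simple filter, sketched here in Ruby. The method group_into_sections and the page hash shape are assumptions; in particular, how the plugin treats the collection name "pages" is not modeled here, only exact matches and the null case.

```ruby
# Hypothetical grouping: pages carry a :collection key (nil when standalone).
def group_into_sections(pages, sections)
  sections.map do |section|
    # A null collection in the config (nil in Ruby) matches standalone pages.
    matching = pages.select { |page| page[:collection] == section["collection"] }
    [section["title"], matching]
  end
end
```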
Validation
After building your site, verify AEO output with:
bundle exec jekyll aeo:validate
This checks:
- llms.txt exists and starts with an H1 heading
- llms-full.txt (if present) is non-empty
- All .md files referenced in llms.txt exist in the destination directory
- domain-profile.json (if present): valid JSON, required fields (spec, name, description, website, contact), and valid entity_type (invalid values emit a warning, not an error)
Respects baseurl when resolving file paths.
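A simplified version of the first three checks might look like the Ruby sketch below. The method validate_aeo and its error messages are made up for illustration, it assumes site-relative links in llms.txt, and it leaves out the domain-profile.json checks.

```ruby
# Hypothetical validator: returns an array of error strings (empty when all is well).
def validate_aeo(dest, baseurl: "")
  llms = File.join(dest, "llms.txt")
  unless File.exist?(llms) && File.read(llms).start_with?("# ")
    return ["llms.txt is missing or does not start with an H1 heading"]
  end

  errors = []
  full = File.join(dest, "llms-full.txt")
  errors << "llms-full.txt is empty" if File.exist?(full) && File.zero?(full)

  # Every markdown link target ending in .md must exist in the destination.
  File.read(llms).scan(/\]\(([^)\s]+\.md)\)/).flatten.each do |url|
    relative = url.delete_prefix(baseurl) # strip baseurl before resolving on disk
    errors << "missing #{url}" unless File.exist?(File.join(dest, relative))
  end
  errors
end
```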
Skipped Content
The following are automatically skipped (in order):
- Plugin disabled (enabled: false)
- Non-HTML outputs (CSS, JS, etc.)
- Pages with dotmd_mode: disabled in front matter
- Redirect pages (redirect_to in front matter)
- Documents in the assets collection
- llms.txt and llms-full.txt files
- Paths matching exclude prefixes
- Pages with no source file on disk (unless html2dotmd is enabled)
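The ordered checks above can be expressed as a chain of early returns, as in this Ruby sketch. The method skip_reason, the page hash keys, and the returned strings are hypothetical; the plugin's own decision logic lives in its generators.

```ruby
# Hypothetical decision function: returns a reason string, or nil to generate the .md file.
def skip_reason(page, config)
  return "plugin disabled" if config["enabled"] == false
  return "non-HTML output" unless page[:output_ext] == ".html"
  return "dotmd_mode: disabled" if page[:dotmd_mode] == "disabled"
  return "redirect page" if page[:redirect_to]
  return "assets collection" if page[:collection] == "assets"
  return "llms file" if ["/llms.txt", "/llms-full.txt"].include?(page[:url])
  return "excluded prefix" if Array(config["exclude"]).any? { |prefix| page[:url].start_with?(prefix) }

  html2dotmd = config.dig("dotmd", "html2dotmd", "enabled")
  return "no source file" if page[:source_path].nil? && !html2dotmd

  nil
end
```

The first matching rule wins, which is why the order of the list matters.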
Development
bundle install
Run unit tests:
rake test
Run all tests (unit + integration, which builds the example site):
rake
Build the example site standalone:
rake site:build
Serve the example site locally:
rake site:serve
The integration tests build a full Jekyll site from demo/example.com/ and verify all generated outputs (llms.txt, robots.txt, domain-profile.json, markdown copies, JSON-LD schemas, link tags).
License
MIT. Copyright (c) 2026 ZAAI.