tree_haver

🌴 TreeHaver is a cross-Ruby adapter for the tree-sitter parsing library that works seamlessly across MRI Ruby, JRuby, and TruffleRuby. It provides a unified API for parsing source code using tree-sitter grammars, regardless of your Ruby implementation. Like Faraday for HTTP or multi_json for JSON, TreeHaver lets you write once and run anywhere with automatic backend selection (MRI C extensions, Rust extensions, FFI, or Java).
 Dependencies

Development

~> 1.0, >= 1.0.3
~> 1.0, >= 1.0.6
~> 13.0
~> 1.0, >= 1.0.4
~> 1.0, >= 1.0.3

Runtime

~> 1.1, >= 1.1.9
 Project Readme
📍 NOTE
RubyGems (the GitHub org, not the website) suffered a hostile takeover in September 2025.
Ultimately 4 maintainers were hard removed and a reason has been given for only 1 of those, while 2 others resigned in protest.
It is a complicated story which is difficult to parse quickly.
Simply put - there was active policy for adding or removing maintainers/owners of rubygems and bundler, and those policies were not followed.
I'm adding notes like this to gems because I don't condone theft of repositories or gems from their rightful owners.
If a similar theft happened with my repos/gems, I'd hope some would stand up for me.
Disenfranchised former-maintainers have started gem.coop.
Once available I will publish there exclusively; unless RubyCentral makes amends with the community.
The "Technology for Humans: Joel Draper" podcast episode by reinteractive is the most cogent summary I'm aware of.
See here, here and here for more info on what comes next.
What I'm doing: A (WIP) proposal for bundler/gem scopes, and a (WIP) proposal for a federated gem server.

Galtzo FLOSS Logo by Aboling0, CC BY-SA 4.0 ruby-lang Logo, Yukihiro Matsumoto, Ruby Visual Identity Team, CC BY-SA 2.5 kettle-rb Logo by Aboling0, CC BY-SA 4.0

🌴 TreeHaver

Version GitHub tag (latest SemVer) License: MIT Downloads Rank Open Source Helpers CodeCov Test Coverage Coveralls Test Coverage QLTY Test Coverage QLTY Maintainability CI Heads CI Runtime Dependencies @ HEAD CI Current CI Truffle Ruby Deps Locked Deps Unlocked CI Supported CI Test Coverage CI Style CodeQL Apache SkyWalking Eyes License Compatibility Check

if ci_badges.map(&:color).detect { it != "green"} ☝️ let me know, as I may have missed the discord notification.


if ci_badges.map(&:color).all? { it == "green"} 👇️ send money so I can do more of this. FLOSS maintenance is now my full-time job.

OpenCollective Backers OpenCollective Sponsors Sponsor Me on Github Liberapay Goal Progress Donate on PayPal Buy me a coffee Donate on Polar Donate at ko-fi.com

🌻 Synopsis

TreeHaver is a cross-Ruby adapter for the tree-sitter parsing library that works seamlessly across MRI Ruby, JRuby, and TruffleRuby. It provides a unified API for parsing source code using tree-sitter grammars, regardless of your Ruby implementation.

The Adapter Pattern: Like Faraday, but for Parsing

If you've used Faraday, multi_json, or multi_xml, you'll feel right at home with TreeHaver. These gems share a common philosophy:

Gem | Unified API for | Backend Examples
Faraday | HTTP requests | Net::HTTP, Typhoeus, Patron, Excon
multi_json | JSON parsing | Oj, Yajl, JSON gem
multi_xml | XML parsing | Nokogiri, LibXML, Ox
TreeHaver | tree-sitter parsing | ruby_tree_sitter, tree_stump, FFI, Java JARs, Citrus

Write once, run anywhere.

Learn once, write anywhere.

Just as Faraday lets you swap HTTP adapters without changing your code, TreeHaver lets you swap tree-sitter backends. Your parsing code remains the same whether you're running on MRI with native C extensions, JRuby with FFI, or TruffleRuby.

# Your code stays the same regardless of backend
parser = TreeHaver::Parser.new
parser.language = TreeHaver::Language.from_library("/path/to/grammar.so")
tree = parser.parse(source_code)

# TreeHaver automatically picks the best backend:
# - MRI → ruby_tree_sitter (C extensions)
# - JRuby → FFI (system's libtree-sitter)
# - TruffleRuby → FFI or MRI backend

Key Features

  • Universal Ruby Support: Works on MRI Ruby, JRuby, and TruffleRuby
  • Multiple Backends:
    • MRI Backend: Leverages the excellent ruby_tree_sitter gem (C extension)
    • Rust Backend: Uses tree_stump gem (Rust extension with precompiled binaries)
    • FFI Backend: Pure Ruby FFI bindings to libtree-sitter (ideal for JRuby)
    • Java Backend: Support for JRuby's native Java integration, and native java-tree-sitter grammar JARs
    • Citrus Backend: Pure Ruby parser using citrus gem (no native dependencies, portable)
  • Automatic Backend Selection: Intelligently selects the best backend for your Ruby implementation
  • Language Agnostic: Load any tree-sitter grammar dynamically (TOML, JSON, Ruby, JavaScript, etc.)
  • Grammar Discovery: Built-in GrammarFinder utility for platform-aware grammar library discovery
  • Thread-Safe: Built-in language registry with thread-safe caching
  • Minimal API Surface: Simple, focused API that covers the most common tree-sitter use cases

Backend Requirements

TreeHaver has minimal dependencies and automatically selects the best backend for your Ruby implementation. Each backend has specific version requirements:

MRI Backend (ruby_tree_sitter, C extensions)

Requires ruby_tree_sitter v2.0+

In ruby_tree_sitter v2.0, all TreeSitter exceptions were changed to inherit from Exception (not StandardError). This was an intentional breaking change made for thread-safety and signal handling reasons.

Exception Mapping: TreeHaver catches TreeSitter::TreeSitterError and its subclasses, converting them to TreeHaver::NotAvailable while preserving the original error message. This provides a consistent exception API across all backends:

ruby_tree_sitter Exception | TreeHaver Exception | When It Occurs
TreeSitter::ParserNotFoundError | TreeHaver::NotAvailable | Parser library file cannot be loaded
TreeSitter::LanguageLoadError | TreeHaver::NotAvailable | Language symbol loads but returns nothing
TreeSitter::SymbolNotFoundError | TreeHaver::NotAvailable | Symbol not found in library
TreeSitter::ParserVersionError | TreeHaver::NotAvailable | Parser version incompatible with tree-sitter
TreeSitter::QueryCreationError | TreeHaver::NotAvailable | Query creation fails
# Add to your Gemfile for MRI backend
gem "ruby_tree_sitter", "~> 2.0"
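
The exception mapping described above can be pictured with a small, self-contained sketch. It does not use TreeHaver or ruby_tree_sitter: FakeBackend, UnifiedNotAvailable, and with_unified_errors are hypothetical stand-ins invented for illustration, and the real implementation may differ.

```ruby
# Stand-ins for backend errors. Like ruby_tree_sitter v2.0+ errors,
# they inherit from Exception rather than StandardError.
module FakeBackend
  class Error < Exception; end
  class ParserNotFoundError < Error; end
end

# Hypothetical unified error that inherits from StandardError.
class UnifiedNotAvailable < StandardError; end

# Runs the block and converts any backend error into the unified error,
# keeping the original message.
def with_unified_errors
  yield
rescue FakeBackend::Error => e
  raise UnifiedNotAvailable, e.message
end

begin
  with_unified_errors do
    raise FakeBackend::ParserNotFoundError, "cannot load /nonexistent.so"
  end
rescue => e # a bare rescue is enough once the error is converted
  puts "#{e.class}: #{e.message}"
end
```

Because the converted error is raised from inside the rescue clause, Ruby also records the original backend error as its `cause`, which can help when debugging.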

Rust Backend (tree_stump)

Currently requires pboling's fork until upstream PRs are merged.

# Add to your Gemfile for Rust backend
gem "tree_stump", github: "pboling/tree_stump", branch: "tree_haver"

FFI Backend

Requires the ffi gem and a system installation of libtree-sitter:

# Add to your Gemfile for FFI backend
gem "ffi", ">= 1.15", "< 2.0"
# Install libtree-sitter on your system:
# macOS
brew install tree-sitter

# Ubuntu/Debian
apt-get install libtree-sitter0 libtree-sitter-dev

# Fedora
dnf install tree-sitter tree-sitter-devel

Citrus Backend

Pure Ruby parser with no native dependencies:

# Add to your Gemfile for Citrus backend
gem "citrus", "~> 3.0"

Java Backend (JRuby only)

No additional dependencies required beyond grammar JARs built for java-tree-sitter.

Why TreeHaver?

tree-sitter is a powerful parser generator that creates incremental parsers for many programming languages. However, integrating it into Ruby applications can be challenging:

  • MRI-based C extensions don't work on JRuby
  • FFI-based solutions may not be optimal for MRI
  • Managing different backends for different Ruby implementations is cumbersome

TreeHaver solves these problems by providing a unified API that automatically selects the appropriate backend for your Ruby implementation, allowing you to write code once and run it anywhere.

Comparison with Other Ruby AST / Parser Bindings

Feature | tree_haver (this gem) | ruby_tree_sitter | tree_stump | citrus
MRI Ruby | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes
JRuby | ✅ Yes (FFI, Java, or Citrus backend) | ❌ No | ❌ No | ✅ Yes
TruffleRuby | ✅ Yes (FFI or Citrus) | ❌ No | ❓ Unknown | ✅ Yes
Backend | Multi (MRI C, Rust, FFI, Java, Citrus) | C extension only | Rust extension | Pure Ruby
Incremental Parsing | ✅ Via MRI C/Rust/Java backend | ✅ Yes | ✅ Yes | ❌ No
Query API | ⚡ Via MRI/Rust/Java backend | ✅ Yes | ✅ Yes | ❌ No
Grammar Discovery | ✅ Built-in GrammarFinder | ❌ Manual | ❌ Manual | ❌ Manual
Security Validations | PathValidator | ❌ No | ❌ No | ❌ No
Language Registration | ✅ Thread-safe registry | ❌ No | ❌ No | ❌ No
Native Performance | ⚡ Backend-dependent | ✅ Native C | ✅ Native Rust | ❌ Pure Ruby
Precompiled Binaries | ⚡ Via Rust backend | ✅ Yes | ✅ Yes | ✅ Pure Ruby
Zero Native Deps | ⚡ Via Citrus backend | ❌ No | ❌ No | ✅ Yes
Minimum Ruby | 3.2+ | 3.0+ | 3.1+ | 0+

Note: Java backend works with grammar JARs built specifically for java-tree-sitter, or grammar .so files that statically link tree-sitter. This is why FFI is recommended for JRuby & TruffleRuby.

Note: TreeHaver can use ruby_tree_sitter or tree_stump as backends, giving you TreeHaver's unified API, grammar discovery, and security features, plus full access to incremental parsing when using those backends.

Note: tree_stump currently requires pboling's fork (tree_haver branch) until upstream PRs #5, #7, #11, and #13 are merged.

When to Use Each

Choose TreeHaver when:

  • You need JRuby or TruffleRuby support
  • You're building a library that should work across Ruby implementations
  • You want automatic grammar discovery and security validations
  • You want flexibility to switch backends without code changes
  • You need incremental parsing with a unified API

Choose ruby_tree_sitter directly when:

  • You only target MRI Ruby
  • You need the full Query API without abstraction
  • You want the most battle-tested C bindings
  • You don't need TreeHaver's grammar discovery

Choose tree_stump directly when:

  • You only target MRI Ruby
  • You prefer Rust-based native extensions
  • You want precompiled binaries without system dependencies
  • You don't need TreeHaver's grammar discovery
  • Note: Use pboling's fork (tree_haver branch) until PRs #5, #7, #11, #13 are merged

Choose citrus directly when:

  • You need zero native dependencies (pure Ruby)
  • You're using a Citrus grammar (not tree-sitter grammars)
  • Performance is less critical than portability
  • You don't need TreeHaver's unified API

💡 Info you can shake a stick at

Tokens to Remember Gem name Gem namespace
Works with JRuby JRuby 10.0 Compat JRuby HEAD Compat
Works with Truffle Ruby Truffle Ruby 23.1 Compat Truffle Ruby 24.1 Compat
Works with MRI Ruby 3 Ruby 3.2 Compat Ruby 3.3 Compat Ruby 3.4 Compat Ruby HEAD Compat
Support & Community Join Me on Daily.dev's RubyFriends Live Chat on Discord Get help from me on Upwork Get help from me on Codementor
Source Source on GitLab.com Source on CodeBerg.org Source on Github.com The best SHA: dQw4w9WgXcQ!
Documentation Current release on RubyDoc.info YARD on Galtzo.com Maintainer Blog GitLab Wiki GitHub Wiki
Compliance License: MIT Compatible with Apache Software Projects: Verified by SkyWalking Eyes 📄ilo-declaration-img Security Policy Contributor Covenant 2.1 SemVer 2.0.0
Style Enforced Code Style Linter Keep-A-Changelog 1.0.0 Gitmoji Commits Compatibility appraised by: appraisal2
Maintainer 🎖️ Follow Me on LinkedIn Follow Me on Ruby.Social Follow Me on Bluesky Contact Maintainer My technical writing
... 💖 Find Me on WellFound: Find Me on CrunchBase My LinkTree More About Me 🧊 🐙 🛖 🧪

Compatibility

Compatible with MRI Ruby 3.2.0+ and concordant releases of JRuby and TruffleRuby.

🚚 Amazing test matrix was brought to you by 🔎 appraisal2 🔎 and the color 💚 green 💚
👟 Check it out! github.com/appraisal-rb/appraisal2

Federated DVCS

Find this repo on federated forges (Coming soon!)
Federated DVCS Repository Status Issues PRs Wiki CI Discussions
🧪 kettle-rb/tree_haver on GitLab The Truth 💚 💚 💚 🐭 Tiny Matrix
🧊 kettle-rb/tree_haver on CodeBerg An Ethical Mirror (Donate) 💚 💚 ⭕️ No Matrix
🐙 kettle-rb/tree_haver on GitHub Another Mirror 💚 💚 💚 💯 Full Matrix 💚
🎮️ Discord Server Live Chat on Discord Let's talk about this library!

Enterprise Support Tidelift

Available as part of the Tidelift Subscription.

Need enterprise-level guarantees?

The maintainers of this and thousands of other packages are working with Tidelift to deliver commercial support and maintenance for the open source packages you use to build your applications. Save time, reduce risk, and improve code health, while paying the maintainers of the exact packages you use.

Get help from me on Tidelift

  • 💡Subscribe for support guarantees covering all your FLOSS dependencies
  • 💡Tidelift is part of Sonar
  • 💡Tidelift pays maintainers to maintain the software you depend on!
    📊@Pointy Haired Boss: An enterprise support subscription is "never gonna let you down", and supports open source maintainers

Alternatively:

  • Live Chat on Discord
  • Get help from me on Upwork
  • Get help from me on Codementor

✨ Installation

Install the gem and add to the application's Gemfile by executing:

bundle add tree_haver

If bundler is not being used to manage dependencies, install the gem by executing:

gem install tree_haver

🔒 Secure Installation

For Medium or High Security Installations

This gem is cryptographically signed, and has verifiable SHA-256 and SHA-512 checksums by stone_checksums. Be sure the gem you install hasn’t been tampered with by following the instructions below.

Add my public key (if you haven’t already, expires 2045-04-29) as a trusted certificate:

gem cert --add <(curl -Ls https://raw.github.com/galtzo-floss/certs/main/pboling.pem)

You only need to do that once. Then proceed to install with:

gem install tree_haver -P HighSecurity

The HighSecurity trust profile will verify signed gems, and not allow the installation of unsigned dependencies.

If you want to up your security game full-time:

bundle config set --global trust-policy MediumSecurity

MediumSecurity instead of HighSecurity is necessary if not all the gems you use are signed.

NOTE: Be prepared to track down certs for signed gems and add them the same way you added mine.

⚙️ Configuration

Available Backends

TreeHaver supports multiple parsing backends, each with different trade-offs. The auto backend automatically selects the best available option.

Backend | Description | Performance | Portability | Examples
Auto | Auto-selects best backend | Varies | ✅ Universal | JSON · JSONC · Bash
MRI | C extension via ruby_tree_sitter | ⚡ Fastest | MRI only | JSON · JSONC · Bash*
Rust | Precompiled via tree_stump | ⚡ Very Fast | ✅ Good | JSON · JSONC · Bash*
FFI | Dynamic linking via FFI | 🔵 Fast | ✅ Universal | JSON · JSONC · Bash
Java | JNI bindings | ⚡ Very Fast | JRuby only | JSON · JSONC · Bash
Citrus | Pure Ruby parsing | 🟡 Slower | ✅ Universal | TOML · Finitio · Dhall

Selection Priority (Auto mode): MRI → Rust → FFI → Java → Citrus

Known Issues:

  • *MRI + Bash: ABI incompatibility (use FFI instead)
  • *Rust + Bash: Version mismatch (use FFI instead)

Backend Requirements:

# MRI Backend
gem 'ruby_tree_sitter'

# Rust Backend  
gem 'tree_stump'

# FFI Backend
gem 'ffi'

# Citrus Backend
gem 'citrus'
# Plus grammar gems: toml-rb, dhall, finitio, etc.

Force Specific Backend:

TreeHaver.backend = :ffi    # Force FFI backend
TreeHaver.backend = :mri    # Force MRI backend
TreeHaver.backend = :rust   # Force Rust backend
TreeHaver.backend = :java   # Force Java backend (JRuby)
TreeHaver.backend = :citrus # Force Citrus backend
TreeHaver.backend = :auto   # Auto-select (default)

Block-based Backend Switching:

Use with_backend to temporarily switch backends for a specific block of code. This is thread-safe and supports nesting—the previous backend is automatically restored when the block exits (even if an exception is raised).

# Temporarily use a specific backend
TreeHaver.with_backend(:mri) do
  parser = TreeHaver::Parser.new
  tree = parser.parse(source)
  # All operations in this block use the MRI backend
end
# Backend is restored to its previous value here

# Nested blocks work correctly
TreeHaver.with_backend(:rust) do
  # Uses :rust
  TreeHaver.with_backend(:citrus) do
    # Uses :citrus
    parser = TreeHaver::Parser.new
  end
  # Back to :rust
end
# Back to original backend
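
To show how block-scoped switching with automatic restore can work, here is a minimal sketch that does not depend on TreeHaver. BackendSwitch and the :sketch_backend key are hypothetical names, and the gem's actual mechanism may be different.

```ruby
module BackendSwitch
  DEFAULT = :auto

  # The backend active for the current thread. Note that Thread.current[]
  # storage is fiber-local in Ruby.
  def self.current
    Thread.current[:sketch_backend] || DEFAULT
  end

  # Switches the backend for the duration of the block, then restores the
  # previous value, even if the block raises.
  def self.with_backend(name)
    previous = Thread.current[:sketch_backend]
    Thread.current[:sketch_backend] = name
    yield
  ensure
    Thread.current[:sketch_backend] = previous
  end
end

BackendSwitch.with_backend(:rust) do
  BackendSwitch.with_backend(:citrus) { puts BackendSwitch.current } # citrus
  puts BackendSwitch.current # rust
end
puts BackendSwitch.current # auto
```

Saving the previous value in a local variable is what makes nesting work: each block restores exactly what it replaced, and the `ensure` clause covers the exception path.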

This is particularly useful for:

  • Testing: Test the same code with different backends
  • Performance comparison: Benchmark different backends
  • Fallback scenarios: Try one backend, fall back to another
  • Thread isolation: Each thread can use a different backend safely
# Example: Testing with multiple backends
[:mri, :rust, :citrus].each do |backend_name|
  TreeHaver.with_backend(backend_name) do
    parser = TreeHaver::Parser.new
    result = parser.parse(source)
    puts "#{backend_name}: #{result.root_node.type}"
  end
end

Check Backend Capabilities:

TreeHaver.backend              # => :ffi
TreeHaver.backend_module       # => TreeHaver::Backends::FFI
TreeHaver.capabilities         # => { backend: :ffi, parse: true, query: false, ... }

See examples/ directory for 18 complete working examples demonstrating all backends and languages.

Security Considerations

⚠️ Loading shared libraries (.so/.dylib/.dll) executes arbitrary native code.

TreeHaver provides defense-in-depth validations, but you should understand the risks:

Attack Vectors Mitigated

TreeHaver's PathValidator module protects against:

  • Path traversal: Paths containing /../ or /./ are rejected
  • Null byte injection: Paths containing null bytes are rejected
  • Non-absolute paths: Relative paths are rejected to prevent CWD-based attacks
  • Invalid extensions: Only .so, .dylib, and .dll files are accepted
  • Malicious filenames: Filenames must match a safe pattern (alphanumeric, hyphens, underscores)
  • Invalid language names: Language names must be lowercase alphanumeric with underscores
  • Invalid symbol names: Symbol names must be valid C identifiers
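
The path checks listed above can be sketched in plain Ruby. SketchPathValidator is a hypothetical stand-in, not TreeHaver's PathValidator. Apart from the two error messages that appear in this README, the message strings are invented, and the sketch assumes POSIX-style absolute paths.

```ruby
module SketchPathValidator
  ALLOWED_EXTENSIONS = %w[.so .dylib .dll].freeze
  # Filename without its extension: alphanumerics, hyphens, underscores.
  SAFE_STEM = /\A[A-Za-z0-9_-]+\z/

  # Returns a list of human-readable problems; an empty list means the
  # path passed every check in this sketch.
  def self.validation_errors(path)
    # File.extname and File.basename raise on null bytes, so stop early.
    return ["Path contains null byte"] if path.include?("\0")

    errors = []
    errors << "Path is not absolute" unless path.start_with?("/")
    if path.include?("/../") || path.include?("/./")
      errors << "Path contains traversal sequence"
    end
    ext = File.extname(path)
    errors << "Extension is not allowed" unless ALLOWED_EXTENSIONS.include?(ext)
    unless File.basename(path, ext).match?(SAFE_STEM)
      errors << "Filename has unsafe characters"
    end
    errors
  end

  def self.safe_library_path?(path)
    validation_errors(path).empty?
  end
end

p SketchPathValidator.safe_library_path?("/usr/lib/libtree-sitter-toml.so") # true
p SketchPathValidator.validation_errors("lib/../toml.so")
```

Checks like these only reduce the attack surface. A path that passes them can still point at a malicious library, which is why the trusted-directory restriction exists as a separate layer.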

Secure Usage

# Standard usage - paths from ENV are validated
finder = TreeHaver::GrammarFinder.new(:toml)
path = finder.find_library_path  # Validates ENV path before returning

# Maximum security - only trusted system directories
path = finder.find_library_path_safe  # Ignores ENV, only /usr/lib etc.

# Manual validation
if TreeHaver::PathValidator.safe_library_path?(user_provided_path)
  language = TreeHaver::Language.from_library(user_provided_path)
end

# Get validation errors for debugging
errors = TreeHaver::PathValidator.validation_errors(path)
# => ["Path is not absolute", "Path contains traversal sequence"]

Trusted Directories

The find_library_path_safe method only returns paths in trusted directories.

Default trusted directories:

  • /usr/lib, /usr/lib64
  • /usr/lib/x86_64-linux-gnu, /usr/lib/aarch64-linux-gnu
  • /usr/local/lib
  • /opt/homebrew/lib, /opt/local/lib

Adding custom trusted directories:

For non-standard installations (Homebrew on Linux, luarocks, mise, asdf, etc.), register additional trusted directories:

# Programmatically at application startup
TreeHaver::PathValidator.add_trusted_directory("/home/linuxbrew/.linuxbrew/Cellar")
TreeHaver::PathValidator.add_trusted_directory("~/.local/share/mise/installs/lua")

# Or via environment variable (comma-separated, in your shell profile)
export TREE_HAVER_TRUSTED_DIRS="/home/linuxbrew/.linuxbrew/Cellar,~/.local/share/mise/installs/lua"

Example: Fedora Silverblue with Homebrew and luarocks

# In ~/.bashrc or ~/.zshrc
export TREE_HAVER_TRUSTED_DIRS="/home/linuxbrew/.linuxbrew/Cellar,~/.local/share/mise/installs/lua"

# tree-sitter runtime library
export TREE_SITTER_RUNTIME_LIB=/home/linuxbrew/.linuxbrew/Cellar/tree-sitter/0.26.3/lib/libtree-sitter.so

# Language grammar (luarocks-installed)
export TREE_SITTER_TOML_PATH=~/.local/share/mise/installs/lua/5.4.8/luarocks/lib/luarocks/rocks-5.4/tree-sitter-toml/0.0.31-1/parser/toml.so

Recommendations

  1. Production: Consider using find_library_path_safe to ignore ENV overrides
  2. Development: Standard find_library_path is convenient for testing
  3. User Input: Always validate paths before passing to Language.from_library
  4. CI/CD: Be cautious of ENV vars that could be set by untrusted sources
  5. Custom installs: Register trusted directories via TREE_HAVER_TRUSTED_DIRS or add_trusted_directory

Backend Selection

TreeHaver automatically selects the best backend for your Ruby implementation, but you can override this behavior:

# Automatic backend selection (default)
TreeHaver.backend = :auto

# Force a specific backend
TreeHaver.backend = :mri     # Use ruby_tree_sitter (MRI only, C extension)
TreeHaver.backend = :rust    # Use tree_stump (MRI, Rust extension with precompiled binaries)
                             # Note: Requires pboling's fork until PRs #5, #7, #11, #13 are merged
                             # See: https://github.com/pboling/tree_stump/tree/tree_haver
TreeHaver.backend = :ffi     # Use FFI bindings (works on MRI and JRuby)
TreeHaver.backend = :java    # Use Java bindings (JRuby only, coming soon)
TreeHaver.backend = :citrus  # Use Citrus pure Ruby parser
                             # NOTE: Portable, all Ruby implementations
                             # CAVEAT: few major language grammars, but many esoteric grammars

Auto-selection priority on MRI: MRI → Rust → FFI → Citrus

You can also set the backend via environment variable:

export TREE_HAVER_BACKEND=rust
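
As a rough illustration of how an environment override and a priority list can combine, here is a self-contained sketch. resolve_backend and its `available` argument are hypothetical. The priority order is the Auto-mode order from this README. Raising when a forced backend is unavailable is an assumption, not documented TreeHaver behavior.

```ruby
PRIORITY = %i[mri rust ffi java citrus].freeze

# available: the backends that can actually be loaded in this process.
# env: anything that responds to [] with string keys (ENV or a Hash).
def resolve_backend(available, env: ENV)
  requested = env["TREE_HAVER_BACKEND"]
  if requested && !requested.empty? && requested != "auto"
    name = requested.to_sym
    return name if available.include?(name)

    # Assumption: fail loudly rather than silently fall back.
    raise ArgumentError, "backend #{requested} is not available"
  end
  # Auto mode: first available backend in priority order, or nil.
  PRIORITY.find { |name| available.include?(name) }
end

p resolve_backend(%i[ffi citrus], env: {})                                 # :ffi
p resolve_backend(%i[ffi citrus], env: {"TREE_HAVER_BACKEND" => "citrus"}) # :citrus
```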

Environment Variables

TreeHaver recognizes several environment variables for configuration:

Note: All path-based environment variables are validated before use. Invalid paths are ignored.

Security Configuration

  • TREE_HAVER_TRUSTED_DIRS: Comma-separated list of additional trusted directories for grammar libraries

    # For Homebrew on Linux and luarocks
    export TREE_HAVER_TRUSTED_DIRS="/home/linuxbrew/.linuxbrew/Cellar,~/.local/share/mise/installs/lua"

    Tilde (~) is expanded to the user's home directory. Directories listed here are considered safe for find_library_path_safe.
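
A comma-separated list with tilde expansion can be handled along these lines. This is only a sketch: trusted_dirs_from_env and inside_trusted_dir? are hypothetical helpers, and TreeHaver's own parsing may differ in details such as whitespace handling.

```ruby
# Splits the comma-separated variable, drops blanks, and expands "~".
# env can be ENV or a Hash with string keys.
def trusted_dirs_from_env(env = ENV)
  env.fetch("TREE_HAVER_TRUSTED_DIRS", "")
     .split(",")
     .map(&:strip)
     .reject(&:empty?)
     .map { |dir| File.expand_path(dir) }
end

# True when path lies below one of the given directories. The trailing
# slash prevents "/opt/lib-evil" from matching "/opt/lib".
def inside_trusted_dir?(path, dirs)
  dirs.any? { |dir| path.start_with?("#{dir}/") }
end

dirs = trusted_dirs_from_env("TREE_HAVER_TRUSTED_DIRS" => "/opt/custom/lib, /usr/local/lib")
p dirs                                                         # ["/opt/custom/lib", "/usr/local/lib"]
p inside_trusted_dir?("/opt/custom/lib/libtree-sitter-toml.so", dirs) # true
```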

Core Runtime Library

  • TREE_SITTER_RUNTIME_LIB: Absolute path to the core libtree-sitter shared library
    export TREE_SITTER_RUNTIME_LIB=/usr/local/lib/libtree-sitter.so

If not set, TreeHaver tries these names in order:

  • tree-sitter
  • libtree-sitter.so.0
  • libtree-sitter.so
  • libtree-sitter.dylib
  • libtree-sitter.dll

Language Symbol Resolution

When loading a language grammar, if you don't specify the symbol: parameter, TreeHaver resolves it in this precedence:

  1. TREE_SITTER_LANG_SYMBOL: Explicit symbol override
  2. Guessed from filename (e.g., libtree-sitter-toml.so → tree_sitter_toml)
  3. Default fallback (tree_sitter_toml)
export TREE_SITTER_LANG_SYMBOL=tree_sitter_toml
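
The precedence above can be sketched as a small function. resolve_symbol is a hypothetical helper, the filename pattern is a guess at the common libtree-sitter-<lang> naming, and converting hyphens to underscores is an assumption rather than documented behavior.

```ruby
# Resolves the exported symbol name for a grammar library.
# env can be ENV or a Hash with string keys.
def resolve_symbol(library_path, env: ENV, default: "tree_sitter_toml")
  # 1. Explicit override wins.
  override = env["TREE_SITTER_LANG_SYMBOL"]
  return override if override && !override.empty?

  # 2. Guess from the filename, e.g. "libtree-sitter-toml.so" -> "toml".
  match = File.basename(library_path)
              .match(/\A(?:lib)?tree-sitter-(.+)\.(?:so|dylib|dll)\z/)
  return "tree_sitter_#{match[1].tr("-", "_")}" if match

  # 3. Fall back to the default.
  default
end

p resolve_symbol("/usr/local/lib/libtree-sitter-json.so", env: {}) # "tree_sitter_json"
```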

Language Library Paths

For specific languages, you can set environment variables to point to grammar libraries:

export TREE_SITTER_TOML_PATH=/usr/local/lib/libtree-sitter-toml.so
export TREE_SITTER_JSON_PATH=/usr/local/lib/libtree-sitter-json.so

JRuby-Specific: Java Backend JARs

For the Java backend on JRuby:

export TREE_SITTER_JAVA_JARS_DIR=/path/to/java-tree-sitter/jars

Language Registration

Register languages once at application startup for convenient access:

# Register a TOML grammar
TreeHaver.register_language(
  :toml,
  path: "/usr/local/lib/libtree-sitter-toml.so",
  symbol: "tree_sitter_toml",  # optional, will be inferred if omitted
)

# Now you can use the convenient helper
language = TreeHaver::Language.toml

# Or still override path/symbol per-call
language = TreeHaver::Language.toml(
  path: "/custom/path/libtree-sitter-toml.so",
)
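
For readers curious how a thread-safe registry with per-call overrides might be structured, here is a minimal sketch. LanguageRegistrySketch is a hypothetical class. It stores only the path and symbol and does not load any library, so it is not a substitute for TreeHaver.register_language.

```ruby
class LanguageRegistrySketch
  def initialize
    @mutex = Mutex.new
    @entries = {}
  end

  # Stores (or replaces) the settings for a language name.
  def register(name, path:, symbol: nil)
    entry = {path: path, symbol: symbol}.freeze
    @mutex.synchronize { @entries[name.to_sym] = entry }
  end

  # Returns the registered settings, with any per-call overrides applied.
  def fetch(name, **overrides)
    entry = @mutex.synchronize { @entries[name.to_sym] }
    raise KeyError, "language #{name} is not registered" unless entry

    entry.merge(overrides)
  end
end

registry = LanguageRegistrySketch.new
registry.register(:toml, path: "/usr/local/lib/libtree-sitter-toml.so")
p registry.fetch(:toml)[:path]
p registry.fetch(:toml, path: "/custom/path/libtree-sitter-toml.so")[:path]
```

Entries are frozen and `merge` returns a new hash, so an override applies to one call only and never changes what was registered.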

Grammar Discovery with GrammarFinder

For libraries that need to automatically locate tree-sitter grammars (like the *-merge family of gems), TreeHaver provides the GrammarFinder utility class. It handles platform-aware grammar discovery without requiring language-specific code in TreeHaver itself.

# Create a finder for any language
finder = TreeHaver::GrammarFinder.new(:toml)

# Check if the grammar is available
if finder.available?
  puts "TOML grammar found at: #{finder.find_library_path}"
else
  puts finder.not_found_message
  # => "tree-sitter toml grammar not found. Searched: /usr/lib/libtree-sitter-toml.so, ..."
end

# Register the language if available
finder.register! if finder.available?

# Now use the registered language
language = TreeHaver::Language.toml

GrammarFinder Automatic Derivation

Given just the language name, GrammarFinder automatically derives:

Property Derived Value (for :toml)
ENV var TREE_SITTER_TOML_PATH
Library filename libtree-sitter-toml.so (Linux) or .dylib (macOS)
Symbol name tree_sitter_toml
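
The derivation in the table is mechanical enough to sketch in a few lines. derive_grammar_names is a hypothetical helper, and the platform check (treating only Darwin as using .dylib) is a simplification.

```ruby
require "rbconfig"

# Derives the conventional names for a language, given only its name.
# host_os defaults to the running platform and can be overridden.
def derive_grammar_names(language, host_os: RbConfig::CONFIG["host_os"])
  name = language.to_s
  ext = host_os.match?(/darwin/) ? ".dylib" : ".so"
  {
    env_var: "TREE_SITTER_#{name.upcase}_PATH",
    library_filename: "libtree-sitter-#{name}#{ext}",
    symbol: "tree_sitter_#{name}",
  }
end

p derive_grammar_names(:toml, host_os: "linux-gnu")
```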

Search Order

GrammarFinder searches for grammars in this order:

  1. Environment variable: TREE_SITTER_<LANG>_PATH (highest priority)
  2. Extra paths: Custom paths provided at initialization
  3. System paths: Common installation directories (/usr/lib, /usr/local/lib, /opt/homebrew/lib, etc.)
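
The search order can be sketched as an ordered list of candidates with a "first one that exists" lookup. candidate_paths and find_library are hypothetical helpers. The sketch hard-codes the .so extension and a short list of system directories for brevity, and it skips the validation that the real GrammarFinder applies to ENV paths.

```ruby
SYSTEM_DIRS = %w[/usr/lib /usr/local/lib /opt/homebrew/lib].freeze

# Builds the candidate list in priority order: ENV, extra paths, system paths.
def candidate_paths(language, env: ENV, extra_paths: [])
  filename = "libtree-sitter-#{language}.so"
  from_env = env["TREE_SITTER_#{language.to_s.upcase}_PATH"]

  candidates = []
  candidates << from_env if from_env && !from_env.empty?
  candidates.concat(extra_paths.map { |dir| File.join(dir, filename) })
  candidates.concat(SYSTEM_DIRS.map { |dir| File.join(dir, filename) })
  candidates
end

# Returns the first candidate that exists, or nil. The existence check is
# injectable so the lookup can be exercised without touching the disk.
def find_library(language, exists: File.method(:exist?), **options)
  candidate_paths(language, **options).find { |path| exists.call(path) }
end

puts candidate_paths(:toml, env: {}, extra_paths: ["/opt/custom/lib"])
```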

Usage in *-merge Gems

The GrammarFinder pattern enables clean integration in language-specific merge gems:

# In toml-merge
finder = TreeHaver::GrammarFinder.new(:toml)
finder.register! if finder.available?

# In json-merge
finder = TreeHaver::GrammarFinder.new(:json)
finder.register! if finder.available?

# In bash-merge
finder = TreeHaver::GrammarFinder.new(:bash)
finder.register! if finder.available?

Each gem uses the same API—only the language name changes.

Adding Custom Search Paths

For non-standard installations, provide extra search paths:

finder = TreeHaver::GrammarFinder.new(:toml, extra_paths: [
  "/opt/custom/lib",
  "/home/user/.local/lib",
])

Debug Information

Get detailed information about the grammar search:

finder = TreeHaver::GrammarFinder.new(:toml)
puts finder.search_info
# => {
#      language: :toml,
#      env_var: "TREE_SITTER_TOML_PATH",
#      env_value: nil,
#      symbol: "tree_sitter_toml",
#      library_filename: "libtree-sitter-toml.so",
#      search_paths: ["/usr/lib/libtree-sitter-toml.so", ...],
#      found_path: "/usr/lib/libtree-sitter-toml.so",
#      available: true
#    }

Checking Capabilities

Different backends may support different features:

TreeHaver.capabilities
# => { backend: :mri, query: true, bytes_field: true }
# or
# => { backend: :ffi, parse: true, query: false, bytes_field: true }
# or
# => { backend: :citrus, parse: true, query: false, bytes_field: false }

Compatibility Mode

For codebases migrating from ruby_tree_sitter, TreeHaver provides a compatibility shim:

require "tree_haver/compat"

# Now TreeSitter constants map to TreeHaver
parser = TreeSitter::Parser.new  # Actually creates TreeHaver::Parser

This is safe and idempotent—if the real TreeSitter module is already loaded, the shim does nothing.

⚠️ Critical: Exception Hierarchy Incompatibility

ruby_tree_sitter v2+ exceptions inherit from Exception (not StandardError).
TreeHaver exceptions follow Ruby best practices and inherit from StandardError.

This means exception handling behaves differently between the two:

Scenario | ruby_tree_sitter v2+ | TreeHaver Compat Mode
rescue => e | Does NOT catch TreeSitter errors | DOES catch TreeHaver errors
Behavior | Errors propagate (inherit Exception) | Errors caught (inherit StandardError)

Example showing the difference:

# With real ruby_tree_sitter v2+
begin
  TreeSitter::Language.load("missing", "/nonexistent.so")
rescue => e
  puts "Caught!"  # Never reached - TreeSitter errors inherit Exception
end

# With TreeHaver compat mode
require "tree_haver/compat"
begin
  TreeSitter::Language.load("missing", "/nonexistent.so")  # Actually TreeHaver
rescue => e
  puts "Caught!"  # WILL be reached - TreeHaver errors inherit StandardError
end

To write compatible exception handling:

# Option 1: Catch specific exception (works with both)
begin
  TreeSitter::Language.load(...)
rescue TreeSitter::TreeSitterError => e  # Explicit rescue
  # Works with both ruby_tree_sitter and TreeHaver compat mode
end

# Option 2: Use TreeHaver API directly (recommended)
begin
  TreeHaver::Language.from_library(...)
rescue TreeHaver::NotAvailable => e  # TreeHaver's unified exception
  # Clear and consistent when using TreeHaver
end

Why TreeHaver uses StandardError:

  1. Ruby Best Practice: The Ruby style guide recommends inheriting from StandardError
  2. Safety: Errors that inherit from Exception force callers to write rescue Exception, which also traps signals (SIGTERM, SIGINT) and SystemExit, and that is dangerous
  3. Consistency: Most Ruby libraries follow this convention
  4. Testability: StandardError exceptions are easier to test and mock
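
The difference can be verified in plain Ruby without either library installed. BackendStyleError and LibraryStyleError are hypothetical classes that only mirror the two inheritance choices.

```ruby
class BackendStyleError < Exception; end     # inherits like ruby_tree_sitter v2+ errors
class LibraryStyleError < StandardError; end # inherits like TreeHaver errors

# Reports whether a bare `rescue` clause catches the given error class.
def caught_by_bare_rescue?(error_class)
  begin
    raise error_class, "boom"
  rescue # a bare rescue only matches StandardError and its subclasses
    true
  end
rescue Exception # anything that escaped the bare rescue lands here
  false
end

puts caught_by_bare_rescue?(LibraryStyleError) # true
puts caught_by_bare_rescue?(BackendStyleError) # false

# Signals are delivered as Exception subclasses outside StandardError,
# which is why a blanket `rescue Exception` also swallows Ctrl-C.
puts Interrupt.ancestors.include?(StandardError) # false
```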

See lib/tree_haver/compat.rb for detailed documentation.

🔧 Basic Usage

Quick Start

Here's a complete example of parsing TOML with TreeHaver:

require "tree_haver"

# Load a language grammar
language = TreeHaver::Language.from_library(
  "/usr/local/lib/libtree-sitter-toml.so",
  symbol: "tree_sitter_toml",
)

# Create a parser
parser = TreeHaver::Parser.new
parser.language = language

# Parse some source code
source = <<~TOML
  [package]
  name = "my-app"
  version = "1.0.0"
TOML

tree = parser.parse(source)

# Access the root node
root = tree.root_node
puts "Root node type: #{root.type}"  # => "document"

# Traverse the tree
root.each do |child|
  puts "Child type: #{child.type}"
  child.each do |grandchild|
    puts "  Grandchild type: #{grandchild.type}"
  end
end

Using Language Registration

For cleaner code, register languages at startup:

# At application initialization
TreeHaver.register_language(
  :toml,
  path: "/usr/local/lib/libtree-sitter-toml.so",
)

TreeHaver.register_language(
  :json,
  path: "/usr/local/lib/libtree-sitter-json.so",
)

# Later in your code
toml_language = TreeHaver::Language.toml
json_language = TreeHaver::Language.json

parser = TreeHaver::Parser.new
parser.language = toml_language
tree = parser.parse(toml_source)

Flexible Language Names

The name parameter in register_language is an arbitrary identifier you choose—it doesn't need to match the actual language name. The actual grammar identity comes from the path and symbol parameters (for tree-sitter) or grammar_module (for Citrus).

This flexibility is useful for:

  • Aliasing: Register the same grammar under multiple names
  • Versioning: Register different grammar versions (e.g., :ruby_2, :ruby_3)
  • Testing: Use unique names to avoid collisions between tests
  • Context-specific naming: Use names that make sense for your application
# Register the same TOML grammar under different names for different purposes
TreeHaver.register_language(
  :config_parser,  # Custom name for your app
  path: "/usr/local/lib/libtree-sitter-toml.so",
  symbol: "tree_sitter_toml",
)

TreeHaver.register_language(
  :toml_v1,  # Version-specific name
  path: "/usr/local/lib/libtree-sitter-toml.so",
  symbol: "tree_sitter_toml",
)

# Use your custom names
config_lang = TreeHaver::Language.config_parser
versioned_lang = TreeHaver::Language.toml_v1

Parsing Different Languages

TreeHaver works with any tree-sitter grammar:

# Parse Ruby code
ruby_lang = TreeHaver::Language.from_library(
  "/path/to/libtree-sitter-ruby.so",
)
parser = TreeHaver::Parser.new
parser.language = ruby_lang
tree = parser.parse("class Foo; end")

# Parse JavaScript
js_lang = TreeHaver::Language.from_library(
  "/path/to/libtree-sitter-javascript.so",
)
parser.language = js_lang  # Reuse the same parser
tree = parser.parse("const x = 42;")

Walking the AST

TreeHaver provides simple node traversal:

tree = parser.parse(source)
root = tree.root_node

# Recursive tree walk
def walk_tree(node, depth = 0)
  puts "#{"  " * depth}#{node.type}"
  node.each { |child| walk_tree(child, depth + 1) }
end

walk_tree(root)

Incremental Parsing

TreeHaver supports incremental parsing when using the MRI, Rust, or Java backends. This is a major performance optimization for editors and IDEs that need to re-parse on every keystroke.

# Check if current backend supports incremental parsing
if TreeHaver.capabilities[:incremental]
  puts "Incremental parsing is available!"
end

# Initial parse
parser = TreeHaver::Parser.new
parser.language = language
tree = parser.parse_string(nil, "x = 1")

# User edits the source: "x = 1" -> "x = 42"
# Mark the tree as edited (tell tree-sitter what changed)
tree.edit(
  start_byte: 4,           # edit starts at byte 4
  old_end_byte: 5,         # old text "1" ended at byte 5
  new_end_byte: 6,         # new text "42" ends at byte 6
  start_point: {row: 0, column: 4},
  old_end_point: {row: 0, column: 5},
  new_end_point: {row: 0, column: 6},
)

# Re-parse incrementally - tree-sitter reuses unchanged nodes
new_tree = parser.parse_string(tree, "x = 42")

Note: tree_stump requires pboling's fork (tree_haver branch) until PRs #5, #7, #11, #13 are merged.

Note: Incremental parsing requires the MRI (ruby_tree_sitter), Rust (tree_stump), or Java (java-tree-sitter) backend. The FFI and Citrus backends do not currently support incremental parsing. You can check support with:

tree.supports_editing?  # => true if edit() is available
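
The byte offsets and points passed to `tree.edit` in the example above can be derived from the old and new source text. The following is a rough sketch, not part of TreeHaver: `edit_params` and `point_at` are hypothetical helpers that find the changed region by comparing common prefix and suffix bytes. It assumes columns are counted in bytes, as in tree-sitter's C API, and it does not try to align the edit to character boundaries in multi-byte text.

```ruby
# Hypothetical helper (not part of TreeHaver): derive the arguments for
# tree.edit from the old and new source text.
def edit_params(old_src, new_src)
  old_b = old_src.b
  new_b = new_src.b

  # Length of the common prefix, in bytes.
  max_prefix = [old_b.bytesize, new_b.bytesize].min
  start = 0
  start += 1 while start < max_prefix && old_b.getbyte(start) == new_b.getbyte(start)

  # Length of the common suffix, not overlapping the prefix.
  max_suffix = max_prefix - start
  suffix = 0
  suffix += 1 while suffix < max_suffix &&
    old_b.getbyte(old_b.bytesize - 1 - suffix) == new_b.getbyte(new_b.bytesize - 1 - suffix)

  old_end = old_b.bytesize - suffix
  new_end = new_b.bytesize - suffix

  {
    start_byte: start,
    old_end_byte: old_end,
    new_end_byte: new_end,
    start_point: point_at(old_b, start),
    old_end_point: point_at(old_b, old_end),
    new_end_point: point_at(new_b, new_end),
  }
end

# Convert a byte offset into a {row:, column:} point (column counted in bytes).
def point_at(bytes, offset)
  before = bytes.byteslice(0, offset)
  row = before.count("\n")
  last_newline = before.rindex("\n")
  column = last_newline ? offset - last_newline - 1 : offset
  {row: row, column: column}
end

p edit_params("x = 1", "x = 42")
```

For the "x = 1" to "x = 42" edit this yields the same values as the hand-written call above. If `tree.edit` accepts keywords as shown, the hash could be passed with `tree.edit(**edit_params(old_source, new_source))`.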

Error Handling

begin
  language = TreeHaver::Language.from_library("/path/to/grammar.so")
rescue TreeHaver::NotAvailable => e
  puts "Failed to load grammar: #{e.message}"
end

# Check if a backend is available
if TreeHaver.backend_module.nil?
  puts "No TreeHaver backend is available!"
  puts "Install ruby_tree_sitter (MRI), ffi gem with libtree-sitter, or citrus gem"
end
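
A load failure is often just a wrong path, so it can help to locate the grammar file before calling `from_library`. This is a minimal sketch under stated assumptions: `find_grammar_library` is a hypothetical helper, not part of TreeHaver, and the `libtree-sitter-<name>` file naming and the `.so`/`.dylib` extensions are assumed from the paths used in this README.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper (not part of TreeHaver): return the first existing
# shared library for a grammar, or nil when none is found.
def find_grammar_library(name, dirs)
  extensions = %w[so dylib]
  dirs.each do |dir|
    extensions.each do |ext|
      candidate = File.join(dir, "libtree-sitter-#{name}.#{ext}")
      return candidate if File.file?(candidate)
    end
  end
  nil
end

# Demonstration against a temporary directory so the sketch runs anywhere.
Dir.mktmpdir do |dir|
  FileUtils.touch(File.join(dir, "libtree-sitter-toml.so"))
  puts find_grammar_library("toml", ["/nonexistent", dir])
  puts find_grammar_library("ruby", [dir]).inspect
end
```

The returned path could then be passed to `TreeHaver::Language.from_library`, with the `rescue` shown above still covering backend-level failures.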

Platform-Specific Examples

MRI Ruby

On MRI, TreeHaver uses ruby_tree_sitter by default:

# Gemfile
gem "tree_haver"
gem "ruby_tree_sitter"  # MRI backend

# Code - no changes needed, TreeHaver auto-selects MRI backend
parser = TreeHaver::Parser.new

JRuby

On JRuby, TreeHaver can use the FFI backend, Java backend, or Citrus backend:

Option 1: FFI Backend (recommended for tree-sitter grammars)

# Gemfile
gem "tree_haver"
gem "ffi"  # Required for FFI backend

# Ensure libtree-sitter is installed on your system
# On macOS with Homebrew:
#   brew install tree-sitter

# On Ubuntu/Debian:
#   sudo apt-get install libtree-sitter0 libtree-sitter-dev

# Code - TreeHaver auto-selects FFI backend on JRuby
parser = TreeHaver::Parser.new

Option 2: Java Backend (native JVM performance)

# 1. Download java-tree-sitter JAR from Maven Central
mkdir -p vendor/jars
curl -fSL -o vendor/jars/jtreesitter-0.23.2.jar \
  "https://repo1.maven.org/maven2/io/github/tree-sitter/jtreesitter/0.23.2/jtreesitter-0.23.2.jar"

# 2. Set environment variables
export CLASSPATH="$(pwd)/vendor/jars:$CLASSPATH"
export LD_LIBRARY_PATH="/path/to/libtree-sitter/lib:$LD_LIBRARY_PATH"

# 3. Run with JRuby (requires Java 22+ for Foreign Function API)
JAVA_OPTS="--enable-native-access=ALL-UNNAMED" jruby your_script.rb

# Force Java backend
TreeHaver.backend = :java

# Check if Java backend is available
if TreeHaver::Backends::Java.available?
  puts "Java backend is ready!"
  puts TreeHaver.capabilities
  # => { backend: :java, parse: true, query: true, bytes_field: true, incremental: true }
end

⚠️ Java Backend Limitation: Symbol Resolution

The Java backend uses Java's Foreign Function & Memory (FFM) API which loads libraries in isolation. Unlike the system's dynamic linker (dlopen), FFM's SymbolLookup.or() chains symbol lookups but doesn't resolve dynamic library dependencies.

This means grammar .so files with unresolved references to libtree-sitter.so symbols won't load correctly. Most grammars from luarocks, npm, or other sources have these dependencies.

Recommended approach for JRuby: Use the FFI backend:

# On JRuby, use FFI backend (recommended)
TreeHaver.backend = :ffi

The FFI backend uses Ruby's FFI gem which relies on the system's dynamic linker, correctly resolving symbol dependencies between libtree-sitter.so and grammar libraries.

The Java backend will work with:

  • Grammar JARs built specifically for java-tree-sitter (self-contained)
  • Grammar .so files that statically link tree-sitter

Option 3: Citrus Backend (pure Ruby, portable)

# Gemfile
gem "tree_haver"
gem "citrus"  # Pure Ruby parser, zero native dependencies

# Code - Force Citrus backend for maximum portability
TreeHaver.backend = :citrus

# Check if Citrus backend is available
if TreeHaver::Backends::Citrus.available?
  puts "Citrus backend is ready!"
  puts TreeHaver.capabilities
  # => { backend: :citrus, parse: true, query: false, bytes_field: false }
end

⚠️ Citrus Backend Limitations:

  • Uses Citrus grammars (not tree-sitter grammars)
  • No incremental parsing support
  • No query API
  • Pure Ruby performance (slower than native backends)
  • Best for: prototyping, environments without native extension support, teaching
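
Because backends differ in what they support, code that needs queries or incremental parsing can check the capabilities hash first. This is a minimal sketch: `missing_capabilities` is a hypothetical helper, not part of TreeHaver, and the two hashes are copied from the example output above rather than read from a live backend.

```ruby
# Hypothetical helper (not part of TreeHaver): list the features from
# `needed` that a capabilities hash does not report as true.
def missing_capabilities(capabilities, *needed)
  needed.reject { |feature| capabilities[feature] == true }
end

# Capability hashes copied from the examples above.
java   = {backend: :java, parse: true, query: true, bytes_field: true, incremental: true}
citrus = {backend: :citrus, parse: true, query: false, bytes_field: false}

p missing_capabilities(java, :query, :incremental)    # => []
p missing_capabilities(citrus, :query, :incremental)  # => [:query, :incremental]
```

In application code the hash would come from `TreeHaver.capabilities`. A feature that is absent from the hash is treated the same as one reported as `false`.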

TruffleRuby

TruffleRuby can use the MRI, FFI, or Citrus backend:

# Use FFI backend (recommended for tree-sitter grammars)
TreeHaver.backend = :ffi

# Or try MRI backend if ruby_tree_sitter compiles on your TruffleRuby version
TreeHaver.backend = :mri

# Or use Citrus backend for zero native dependencies
TreeHaver.backend = :citrus
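
TreeHaver selects a backend automatically, so the following is only useful if you prefer to pin the choice explicitly. It is a sketch of the recommendations in this section as a lookup on `RUBY_ENGINE`. `recommended_backend` is a hypothetical helper, not part of TreeHaver, and falling back to `:citrus` for unknown engines is an assumption made here.

```ruby
# Hypothetical helper (not part of TreeHaver): map a Ruby engine name to
# the backend this README recommends for it.
def recommended_backend(engine = RUBY_ENGINE)
  case engine
  when "ruby" then :mri                  # MRI: ruby_tree_sitter is the default
  when "jruby", "truffleruby" then :ffi  # FFI is recommended on both
  else :citrus                           # assumption: pure Ruby as a last resort
  end
end

puts recommended_backend
# With tree_haver loaded, the result could be applied with:
#   TreeHaver.backend = recommended_backend
```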

Advanced: Thread-Safe Backend Switching

TreeHaver provides with_backend for thread-safe, temporary backend switching. This is essential for testing, benchmarking, and applications that need different backends in different contexts.

Testing with Multiple Backends

Test the same code path with different backends using with_backend:

# In your test setup
RSpec.describe("MyParser") do
  # Test with each available backend
  [:mri, :rust, :citrus].each do |backend_name|
    context "with #{backend_name} backend" do
      it "parses correctly" do
        TreeHaver.with_backend(backend_name) do
          parser = TreeHaver::Parser.new
          result = parser.parse("x = 42")
          expect(result.root_node.type).to eq("document")
        end
        # Backend automatically restored after block
      end
    end
  end
end

Thread Isolation

Each thread can use a different backend safely—with_backend uses thread-local storage:

threads = []

threads << Thread.new do
  TreeHaver.with_backend(:mri) do
    # This thread uses MRI backend
    parser = TreeHaver::Parser.new
    100.times { parser.parse("x = 1") }
  end
end

threads << Thread.new do
  TreeHaver.with_backend(:citrus) do
    # This thread uses Citrus backend simultaneously
    parser = TreeHaver::Parser.new
    100.times { parser.parse("x = 1") }
  end
end

threads.each(&:join)

Nested Blocks

with_backend supports nesting—inner blocks override outer blocks:

TreeHaver.with_backend(:rust) do
  puts TreeHaver.effective_backend  # => :rust

  TreeHaver.with_backend(:citrus) do
    puts TreeHaver.effective_backend  # => :citrus
  end

  puts TreeHaver.effective_backend  # => :rust (restored)
end
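
To illustrate how thread-local storage gives both the isolation and the nesting behavior described above, here is a small stand-in. `BackendDemo` is not TreeHaver's implementation, and the `:auto` default is an assumption for this sketch. It only shows the general save, set, and restore technique.

```ruby
# Stand-in module, NOT TreeHaver's implementation: a with_backend-style
# helper built on a thread-local variable.
module BackendDemo
  KEY = :backend_demo_current

  def self.effective_backend
    Thread.current.thread_variable_get(KEY) || :auto
  end

  def self.with_backend(name)
    previous = Thread.current.thread_variable_get(KEY)
    Thread.current.thread_variable_set(KEY, name)
    yield
  ensure
    # Restore the outer value even if the block raises.
    Thread.current.thread_variable_set(KEY, previous)
  end
end

seen = []
BackendDemo.with_backend(:rust) do
  seen << BackendDemo.effective_backend
  BackendDemo.with_backend(:citrus) { seen << BackendDemo.effective_backend }
  seen << BackendDemo.effective_backend
  # A new thread starts with no override of its own.
  seen << Thread.new { BackendDemo.effective_backend }.value
end
seen << BackendDemo.effective_backend

p seen  # => [:rust, :citrus, :rust, :auto, :auto]
```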

Fallback Pattern

Try one backend, fall back to another on failure:

def parse_with_fallback(source)
  TreeHaver.with_backend(:mri) do
    TreeHaver::Parser.new.tap { |p| p.language = load_language }.parse(source)
  end
rescue TreeHaver::NotAvailable
  # Fall back to Citrus if MRI backend unavailable
  TreeHaver.with_backend(:citrus) do
    TreeHaver::Parser.new.tap { |p| p.language = load_language }.parse(source)
  end
end
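
The two-step fallback above can be generalized to an ordered list of candidates. This is a minimal sketch: `first_working` is a hypothetical helper, not part of TreeHaver, and `DemoNotAvailable` stands in for `TreeHaver::NotAvailable` so the example runs without the gem installed.

```ruby
# Hypothetical helper (not part of TreeHaver): yield each candidate in order
# and return the first block result that does not raise `error_class`.
def first_working(candidates, error_class)
  last_error = nil
  candidates.each do |candidate|
    begin
      return yield(candidate)
    rescue error_class => e
      last_error = e
    end
  end
  raise(last_error || ArgumentError.new("no candidates given"))
end

# Stand-in error and block so the sketch runs without tree_haver installed.
class DemoNotAvailable < StandardError; end

result = first_working([:mri, :ffi, :citrus], DemoNotAvailable) do |backend|
  raise DemoNotAvailable, "#{backend} unavailable" unless backend == :citrus
  "parsed with #{backend}"
end
puts result  # prints "parsed with citrus"
```

With TreeHaver loaded, the block body would wrap the parse in `TreeHaver.with_backend(backend) { ... }` and the error class would be `TreeHaver::NotAvailable`. If every candidate fails, the last error is re-raised.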

Complete Real-World Example

Here's a practical example that extracts package names from a TOML file:

require "tree_haver"

# Setup
TreeHaver.register_language(
  :toml,
  path: "/usr/local/lib/libtree-sitter-toml.so",
)

def extract_package_name(toml_content)
  # Create parser
  parser = TreeHaver::Parser.new
  parser.language = TreeHaver::Language.toml

  # Parse
  tree = parser.parse(toml_content)
  root = tree.root_node

  # Find [package] table
  root.each do |child|
    next unless child.type == "table"

    child.each do |table_elem|
      if table_elem.type == "pair"
        # Look for name = "..." pair
        key = table_elem.each.first&.type
        # In a real implementation, you'd extract the text value
        # This is simplified for demonstration
      end
    end
  end
end

# Usage
toml = <<~TOML
  [package]
  name = "awesome-app"
  version = "2.0.0"
TOML

package_name = extract_package_name(toml)
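
The example above stops short of extracting the text value. One common approach with tree-sitter is to slice the source by a node's byte range. The sketch below shows only that slicing step and is not TreeHaver's API: it assumes the node for the string value exposes start and end byte offsets (as tree-sitter's C API does), and `text_for_range` is a hypothetical helper. The offsets are computed by hand here so the code runs without a grammar.

```ruby
# Hypothetical helper: extract the text covered by a byte range of the source.
# start_byte is inclusive, end_byte is exclusive.
def text_for_range(source, start_byte, end_byte)
  source.byteslice(start_byte, end_byte - start_byte)
end

toml_source = <<~TOML
  [package]
  name = "awesome-app"
  version = "2.0.0"
TOML

# In real code the offsets would come from the node for the string value.
# Here they are found by searching the text (the source is plain ASCII,
# so character and byte offsets agree).
start_byte = toml_source.index('"awesome-app"')
end_byte = start_byte + '"awesome-app"'.bytesize

quoted = text_for_range(toml_source, start_byte, end_byte)
puts quoted[1..-2]  # strips the surrounding quotes; prints awesome-app
```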

🦷 FLOSS Funding

While kettle-rb tools are free software and will always be, the project would benefit immensely from some funding. Raising a monthly budget of... "dollars" would make the project more sustainable.

We welcome both individual and corporate sponsors! We also offer a wide array of funding channels to account for your preferences (although currently Open Collective is our preferred funding platform).

If you're working in a company that's making significant use of kettle-rb tools we'd appreciate it if you suggest to your company to become a kettle-rb sponsor.

You can support the development of kettle-rb tools via GitHub Sponsors, Liberapay, PayPal, Open Collective and Tidelift.

📍 NOTE
If doing a sponsorship in the form of donation is problematic for your company
from an accounting standpoint, we'd recommend the use of Tidelift,
where you can get a support-like subscription instead.

Open Collective for Individuals

Support us with a monthly donation and help us continue our activities. [Become a backer]

NOTE: kettle-readme-backers updates this list every day, automatically.

No backers yet. Be the first!

Open Collective for Organizations

Become a sponsor and get your logo on our README on GitHub with a link to your site. [Become a sponsor]

NOTE: kettle-readme-backers updates this list every day, automatically.

No sponsors yet. Be the first!

Another way to support open-source

I’m driven by a passion to foster a thriving open-source community – a space where people can tackle complex problems, no matter how small. Revitalizing libraries that have fallen into disrepair, and building new libraries focused on solving real-world challenges, are my passions. I was recently affected by layoffs, and the tech jobs market is unwelcoming. I’m reaching out here because your support would significantly aid my efforts to provide for my family, and my farm (11 🐔 chickens, 2 🐶 dogs, 3 🐰 rabbits, 8 🐈‍ cats).

If you work at a company that uses my work, please encourage them to support me as a corporate sponsor. My work on gems you use might show up in bundle fund.

I’m developing a new library, floss_funding, designed to empower open-source developers like myself to get paid for the work we do, in a sustainable way. Please give it a look.

Floss-Funding.dev: 👉️ No network calls. 👉️ No tracking. 👉️ No oversight. 👉️ Minimal crypto hashing. 💡 Easily disabled nags

OpenCollective Backers OpenCollective Sponsors Sponsor Me on Github Liberapay Goal Progress Donate on PayPal Buy me a coffee Donate on Polar Donate to my FLOSS efforts at ko-fi.com Donate to my FLOSS efforts using Patreon

🔐 Security

See SECURITY.md.

🤝 Contributing

If you need some ideas of where to help, you could work on adding more code coverage, or if it is already 💯 (see below) check reek, issues, or PRs, or use the gem and think about how it could be better.

We Keep A Changelog so if you make changes, remember to update it.

See CONTRIBUTING.md for more detailed instructions.

🚀 Release Instructions

See CONTRIBUTING.md.

Code Coverage

Coverage Graph

Coveralls Test Coverage

QLTY Test Coverage

🪇 Code of Conduct

Everyone interacting with this project's codebases, issue trackers, chat rooms and mailing lists agrees to follow the Contributor Covenant 2.1.

🌈 Contributors

Contributors

Made with contributors-img.

Also see GitLab Contributors: https://gitlab.com/kettle-rb/tree_haver/-/graphs/main

⭐️ Star History

Star History Chart

📌 Versioning

This Library adheres to Semantic Versioning 2.0.0. Violations of this scheme should be reported as bugs. Specifically, if a minor or patch version is released that breaks backward compatibility, a new version should be immediately released that restores compatibility. Breaking changes to the public API will only be introduced with new major versions.

dropping support for a platform is both obviously and objectively a breaking change
—Jordan Harband (@ljharb, maintainer of SemVer) in SemVer issue 716

I understand that policy doesn't work universally ("exceptions to every rule!"), but it is the policy here. As such, in many cases it is good to specify a dependency on this library using the Pessimistic Version Constraint with two digits of precision.

For example:

spec.add_dependency("tree_haver", "~> 1.0")
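
To see which versions a two-digit pessimistic constraint admits, you can evaluate it with RubyGems itself. This small check uses `Gem::Requirement` and `Gem::Version` from RubyGems. The version numbers are arbitrary examples, not actual tree_haver releases.

```ruby
require "rubygems"

requirement = Gem::Requirement.new("~> 1.0")

# "~> 1.0" means ">= 1.0" and "< 2.0".
%w[1.0.0 1.4.2 2.0.0 0.9.9].each do |version|
  allowed = requirement.satisfied_by?(Gem::Version.new(version))
  puts "#{version}: #{allowed}"
end
```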

📌 Is "Platform Support" part of the public API?

SemVer should, IMO, but doesn't explicitly, say that dropping support for specific Platforms is a breaking change to an API, and for that reason the bike shedding is endless.

To get a better understanding of how SemVer is intended to work over a project's lifetime, read this article from the creator of SemVer:

See CHANGELOG.md for a list of releases.

📄 License

The gem is available as open source under the terms of the MIT License. See LICENSE.txt for the official Copyright Notice.

© Copyright

  • Copyright (c) 2025 Peter H. Boling, of Galtzo.com, and tree_haver contributors.

🤑 A request for help

Maintainers have teeth and need to pay their dentists. After getting laid off in an RIF in March, and encountering difficulty finding a new job, I began spending most of my time building open source tools. I'm hoping to be able to pay for my kids' health insurance this month, so if you value the work I am doing, I need your support. Please consider sponsoring me or the project.

To join the community or get help 👇️ Join the Discord.

Live Chat on Discord

To say "thanks!" ☝️ Join the Discord or 👇️ send money.

Sponsor kettle-rb/tree_haver on Open Source Collective 💌 Sponsor me on GitHub Sponsors 💌 Sponsor me on Liberapay 💌 Donate on PayPal

Please give the project a star ⭐ ♥.

Thanks for RTFM. ☺️