| 📍 NOTE |
|---|
| RubyGems (the GitHub org, not the website) suffered a hostile takeover in September 2025. |
| Ultimately 4 maintainers were hard removed and a reason has been given for only 1 of those, while 2 others resigned in protest. |
| It is a complicated story which is difficult to parse quickly. |
| Simply put - there was active policy for adding or removing maintainers/owners of rubygems and bundler, and those policies were not followed. |
| I'm adding notes like this to gems because I don't condone theft of repositories or gems from their rightful owners. |
| If a similar theft happened with my repos/gems, I'd hope some would stand up for me. |
| Disenfranchised former-maintainers have started gem.coop. |
| Once available I will publish there exclusively; unless RubyCentral makes amends with the community. |
| The "Technology for Humans: Joel Draper" podcast episode by reinteractive is the most cogent summary I'm aware of. |
| See here, here and here for more info on what comes next. |
| What I'm doing: A (WIP) proposal for bundler/gem scopes, and a (WIP) proposal for a federated gem server. |
🌴 TreeHaver
if ci_badges.map(&:color).detect { it != "green"} ☝️ let me know, as I may have missed the discord notification.
if ci_badges.map(&:color).all? { it == "green"} 👇️ send money so I can do more of this. FLOSS maintenance is now my full-time job.
🌻 Synopsis
TreeHaver is a cross-Ruby adapter for the tree-sitter parsing library that works seamlessly across MRI Ruby, JRuby, and TruffleRuby. It provides a unified API for parsing source code using tree-sitter grammars, regardless of your Ruby implementation.
The Adapter Pattern: Like Faraday, but for Parsing
If you've used Faraday, multi_json, or multi_xml, you'll feel right at home with TreeHaver. These gems share a common philosophy:
| Gem | Unified API for | Backend Examples |
|---|---|---|
| Faraday | HTTP requests | Net::HTTP, Typhoeus, Patron, Excon |
| multi_json | JSON parsing | Oj, Yajl, JSON gem |
| multi_xml | XML parsing | Nokogiri, LibXML, Ox |
| TreeHaver | tree-sitter parsing | ruby_tree_sitter, tree_stump, FFI, Java JARs, Citrus |
Write once, run anywhere.
Learn once, write anywhere.
Just as Faraday lets you swap HTTP adapters without changing your code, TreeHaver lets you swap tree-sitter backends. Your parsing code remains the same whether you're running on MRI with native C extensions, JRuby with FFI, or TruffleRuby.
# Your code stays the same regardless of backend
parser = TreeHaver::Parser.new
parser.language = TreeHaver::Language.from_library("/path/to/grammar.so")
tree = parser.parse(source_code)
# TreeHaver automatically picks the best backend:
# - MRI → ruby_tree_sitter (C extensions)
# - JRuby → FFI (system's libtree-sitter)
# - TruffleRuby → FFI or MRI backend
Key Features
- Universal Ruby Support: Works on MRI Ruby, JRuby, and TruffleRuby
- Multiple Backends:
  - MRI Backend: Leverages the excellent ruby_tree_sitter gem (C extension)
  - Rust Backend: Uses tree_stump gem (Rust extension with precompiled binaries)
    - Note: Currently requires pboling's fork until PRs #5, #7, #11, and #13 (inclusive of the others) are merged
  - FFI Backend: Pure Ruby FFI bindings to libtree-sitter (ideal for JRuby)
  - Java Backend: Support for JRuby's native Java integration, and native java-tree-sitter grammar JARs
  - Citrus Backend: Pure Ruby parser using citrus gem (no native dependencies, portable)
- Automatic Backend Selection: Intelligently selects the best backend for your Ruby implementation
- Language Agnostic: Load any tree-sitter grammar dynamically (TOML, JSON, Ruby, JavaScript, etc.)
- Grammar Discovery: Built-in GrammarFinder utility for platform-aware grammar library discovery
- Thread-Safe: Built-in language registry with thread-safe caching
- Minimal API Surface: Simple, focused API that covers the most common tree-sitter use cases
Backend Requirements
TreeHaver has minimal dependencies and automatically selects the best backend for your Ruby implementation. Each backend has specific version requirements:
MRI Backend (ruby_tree_sitter, C extensions)
Requires ruby_tree_sitter v2.0+
In ruby_tree_sitter v2.0, all TreeSitter exceptions were changed to inherit from Exception (not StandardError). This was an intentional breaking change made for thread-safety and signal handling reasons.
Exception Mapping: TreeHaver catches TreeSitter::TreeSitterError and its subclasses, converting them to TreeHaver::NotAvailable while preserving the original error message. This provides a consistent exception API across all backends:
| ruby_tree_sitter Exception | TreeHaver Exception | When It Occurs |
|---|---|---|
| TreeSitter::ParserNotFoundError | TreeHaver::NotAvailable | Parser library file cannot be loaded |
| TreeSitter::LanguageLoadError | TreeHaver::NotAvailable | Language symbol loads but returns nothing |
| TreeSitter::SymbolNotFoundError | TreeHaver::NotAvailable | Symbol not found in library |
| TreeSitter::ParserVersionError | TreeHaver::NotAvailable | Parser version incompatible with tree-sitter |
| TreeSitter::QueryCreationError | TreeHaver::NotAvailable | Query creation fails |
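The mapping in the table can be pictured as a small wrapper that rescues the upstream error family and re-raises one unified error with the same message. This is a minimal sketch, not TreeHaver's actual implementation: `UpstreamError`, `UpstreamSymbolNotFound`, `NotAvailableSketch`, and `with_error_mapping` are hypothetical stand-ins, chosen so the example runs without either gem installed.

```ruby
# Stand-ins for TreeSitter::TreeSitterError and one of its subclasses.
# They inherit from Exception, as ruby_tree_sitter v2.0+ errors are described to.
class UpstreamError < Exception; end
class UpstreamSymbolNotFound < UpstreamError; end

# Stand-in for TreeHaver::NotAvailable (inherits from StandardError).
class NotAvailableSketch < StandardError; end

# Runs the block and converts any upstream error into the unified error,
# preserving the original message.
def with_error_mapping
  yield
rescue UpstreamError => e
  raise NotAvailableSketch, e.message
end

begin
  with_error_mapping { raise UpstreamSymbolNotFound, "symbol not found in library" }
rescue NotAvailableSketch => e
  puts "#{e.class}: #{e.message}"
end
```

Callers then need a single rescue clause regardless of which backend raised the original error.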
# Add to your Gemfile for MRI backend
gem "ruby_tree_sitter", "~> 2.0"
Rust Backend (tree_stump)
Currently requires pboling's fork until upstream PRs are merged.
# Add to your Gemfile for Rust backend
gem "tree_stump", github: "pboling/tree_stump", branch: "tree_haver"
FFI Backend
Requires the ffi gem and a system installation of libtree-sitter:
# Add to your Gemfile for FFI backend
gem "ffi", ">= 1.15", "< 2.0"
# Install libtree-sitter on your system:
# macOS
brew install tree-sitter
# Ubuntu/Debian
apt-get install libtree-sitter0 libtree-sitter-dev
# Fedora
dnf install tree-sitter tree-sitter-devel
Citrus Backend
Pure Ruby parser with no native dependencies:
# Add to your Gemfile for Citrus backend
gem "citrus", "~> 3.0"
Java Backend (JRuby only)
No additional dependencies required beyond grammar JARs built for java-tree-sitter.
Why TreeHaver?
tree-sitter is a powerful parser generator that creates incremental parsers for many programming languages. However, integrating it into Ruby applications can be challenging:
- MRI-based C extensions don't work on JRuby
- FFI-based solutions may not be optimal for MRI
- Managing different backends for different Ruby implementations is cumbersome
TreeHaver solves these problems by providing a unified API that automatically selects the appropriate backend for your Ruby implementation, allowing you to write code once and run it anywhere.
Comparison with Other Ruby AST / Parser Bindings
| Feature | tree_haver (this gem) | ruby_tree_sitter | tree_stump | citrus |
|---|---|---|---|---|
| MRI Ruby | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
| JRuby | ✅ Yes (FFI, Java, or Citrus backend) | ❌ No | ❌ No | ✅ Yes |
| TruffleRuby | ✅ Yes (FFI or Citrus) | ❌ No | ❓ Unknown | ✅ Yes |
| Backend | Multi (MRI C, Rust, FFI, Java, Citrus) | C extension only | Rust extension | Pure Ruby |
| Incremental Parsing | ✅ Via MRI C/Rust/Java backend | ✅ Yes | ✅ Yes | ❌ No |
| Query API | ⚡ Via MRI/Rust/Java backend | ✅ Yes | ✅ Yes | ❌ No |
| Grammar Discovery | ✅ Built-in GrammarFinder | ❌ Manual | ❌ Manual | ❌ Manual |
| Security Validations | ✅ PathValidator | ❌ No | ❌ No | ❌ No |
| Language Registration | ✅ Thread-safe registry | ❌ No | ❌ No | ❌ No |
| Native Performance | ⚡ Backend-dependent | ✅ Native C | ✅ Native Rust | ❌ Pure Ruby |
| Precompiled Binaries | ⚡ Via Rust backend | ✅ Yes | ✅ Yes | ✅ Pure Ruby |
| Zero Native Deps | ⚡ Via Citrus backend | ❌ No | ❌ No | ✅ Yes |
| Minimum Ruby | 3.2+ | 3.0+ | 3.1+ | 0+ |
Note: Java backend works with grammar JARs built specifically for java-tree-sitter, or grammar .so files that statically link tree-sitter. This is why FFI is recommended for JRuby & TruffleRuby.
Note: TreeHaver can use ruby_tree_sitter or tree_stump as backends, giving you TreeHaver's unified API, grammar discovery, and security features, plus full access to incremental parsing when using those backends.
Note: tree_stump currently requires pboling's fork (tree_haver branch) until upstream PRs #5, #7, #11, and #13 are merged.
When to Use Each
Choose TreeHaver when:
- You need JRuby or TruffleRuby support
- You're building a library that should work across Ruby implementations
- You want automatic grammar discovery and security validations
- You want flexibility to switch backends without code changes
- You need incremental parsing with a unified API
Choose ruby_tree_sitter directly when:
- You only target MRI Ruby
- You need the full Query API without abstraction
- You want the most battle-tested C bindings
- You don't need TreeHaver's grammar discovery
Choose tree_stump directly when:
- You only target MRI Ruby
- You prefer Rust-based native extensions
- You want precompiled binaries without system dependencies
- You don't need TreeHaver's grammar discovery
- Note: Use pboling's fork (tree_haver branch) until PRs #5, #7, #11, #13 are merged
Choose citrus directly when:
- You need zero native dependencies (pure Ruby)
- You're using a Citrus grammar (not tree-sitter grammars)
- Performance is less critical than portability
- You don't need TreeHaver's unified API
Compatibility
Compatible with MRI Ruby 3.2.0+ and concordant releases of JRuby and TruffleRuby.
| 🚚 Amazing test matrix was brought to you by | 🔎 appraisal2 🔎 and the color 💚 green 💚 |
|---|---|
| 👟 Check it out! | ✨ github.com/appraisal-rb/appraisal2 ✨ |
Federated DVCS
| Federated DVCS Repository | Status | Issues | PRs | Wiki | CI | Discussions |
|---|---|---|---|---|---|---|
| 🧪 kettle-rb/tree_haver on GitLab | The Truth | 💚 | 💚 | 💚 | 🐭 Tiny Matrix | ➖ |
| 🧊 kettle-rb/tree_haver on CodeBerg | An Ethical Mirror (Donate) | 💚 | 💚 | ➖ | ⭕️ No Matrix | ➖ |
| 🐙 kettle-rb/tree_haver on GitHub | Another Mirror | 💚 | 💚 | 💚 | 💯 Full Matrix | 💚 |
| 🎮️ Discord Server | Let's | talk | about | this | library! |
Available as part of the Tidelift Subscription.
The maintainers of this and thousands of other packages are working with Tidelift to deliver commercial support and maintenance for the open source packages you use to build your applications. Save time, reduce risk, and improve code health, while paying the maintainers of the exact packages you use.
- 💡Subscribe for support guarantees covering all your FLOSS dependencies
- 💡Tidelift is part of Sonar
- 💡Tidelift pays maintainers to maintain the software you depend on!
📊@Pointy Haired Boss: An enterprise support subscription is "never gonna let you down", and supports open source maintainers
✨ Installation
Install the gem and add to the application's Gemfile by executing:
bundle add tree_haver
If bundler is not being used to manage dependencies, install the gem by executing:
gem install tree_haver
🔒 Secure Installation
This gem is cryptographically signed, and has verifiable SHA-256 and SHA-512 checksums by stone_checksums. Be sure the gem you install hasn’t been tampered with by following the instructions below.
Add my public key (if you haven’t already, expires 2045-04-29) as a trusted certificate:
gem cert --add <(curl -Ls https://raw.github.com/galtzo-floss/certs/main/pboling.pem)
You only need to do that once. Then proceed to install with:
gem install tree_haver -P HighSecurity
The HighSecurity trust profile will verify signed gems, and not allow the installation of unsigned dependencies.
If you want to up your security game full-time:
bundle config set --global trust-policy MediumSecurity
MediumSecurity instead of HighSecurity is necessary if not all the gems you use are signed.
NOTE: Be prepared to track down certs for signed gems and add them the same way you added mine.
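To illustrate what verifying a checksum amounts to, the sketch below compares a file's SHA-256 digest with an expected hex string using Ruby's standard `digest` library. The helper name `sha256_matches?` is hypothetical, and where the published checksum is obtained is left open here.

```ruby
require "digest"

# Hypothetical helper: true when the file's SHA-256 digest equals the
# expected hex string (surrounding whitespace and letter case are ignored).
def sha256_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.strip.downcase
end

# Usage (placeholders, not real values):
#   sha256_matches?("path/to/downloaded.gem", published_sha256)
```

`Digest::SHA512` can be swapped in the same way for the SHA-512 checksum.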
⚙️ Configuration
Available Backends
TreeHaver supports multiple parsing backends, each with different trade-offs. The auto backend automatically selects the best available option.
| Backend | Description | Performance | Portability | Examples |
|---|---|---|---|---|
| Auto | Auto-selects best backend | Varies | ✅ Universal | JSON · JSONC · Bash |
| MRI | C extension via ruby_tree_sitter | ⚡ Fastest | MRI only | JSON · JSONC · Bash* |
| Rust | Precompiled via tree_stump | ⚡ Very Fast | ✅ Good | JSON · JSONC · Bash* |
| FFI | Dynamic linking via FFI | 🔵 Fast | ✅ Universal | JSON · JSONC · Bash |
| Java | JNI bindings | ⚡ Very Fast | JRuby only | JSON · JSONC · Bash |
| Citrus | Pure Ruby parsing | 🟡 Slower | ✅ Universal | TOML · Finitio · Dhall |
Selection Priority (Auto mode): MRI → Rust → FFI → Java → Citrus
Known Issues:
- *MRI + Bash: ABI incompatibility (use FFI instead)
- *Rust + Bash: Version mismatch (use FFI instead)
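The auto-selection order can be read as "the first available backend wins". A minimal sketch under that assumption follows; the constant `AUTO_PRIORITY` and the helper `pick_backend` are hypothetical, and how TreeHaver actually detects availability is not shown.

```ruby
# Priority order taken from the "Selection Priority (Auto mode)" line above.
AUTO_PRIORITY = %i[mri rust ffi java citrus].freeze

# Hypothetical helper: returns the highest-priority backend present in
# `available`, or nil when none of them is available.
def pick_backend(available, priority: AUTO_PRIORITY)
  priority.find { |name| available.include?(name) }
end

puts pick_backend(%i[citrus ffi]).inspect
```

Passing a different `priority:` list models a different order, such as one that puts FFI first to work around the Bash issues noted above.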
Backend Requirements:
# MRI Backend
gem 'ruby_tree_sitter'
# Rust Backend
gem 'tree_stump'
# FFI Backend
gem 'ffi'
# Citrus Backend
gem 'citrus'
# Plus grammar gems: toml-rb, dhall, finitio, etc.
Force Specific Backend:
TreeHaver.backend = :ffi # Force FFI backend
TreeHaver.backend = :mri # Force MRI backend
TreeHaver.backend = :rust # Force Rust backend
TreeHaver.backend = :java # Force Java backend (JRuby)
TreeHaver.backend = :citrus # Force Citrus backend
TreeHaver.backend = :auto # Auto-select (default)
Block-based Backend Switching:
Use with_backend to temporarily switch backends for a specific block of code.
This is thread-safe and supports nesting—the previous backend is automatically
restored when the block exits (even if an exception is raised).
# Temporarily use a specific backend
TreeHaver.with_backend(:mri) do
  parser = TreeHaver::Parser.new
  tree = parser.parse(source)
  # All operations in this block use the MRI backend
end
# Backend is restored to its previous value here
# Nested blocks work correctly
TreeHaver.with_backend(:rust) do
  # Uses :rust
  TreeHaver.with_backend(:citrus) do
    # Uses :citrus
    parser = TreeHaver::Parser.new
  end
  # Back to :rust
end
# Back to original backend
This is particularly useful for:
- Testing: Test the same code with different backends
- Performance comparison: Benchmark different backends
- Fallback scenarios: Try one backend, fall back to another
- Thread isolation: Each thread can use a different backend safely
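One way to get the nesting, restore-on-exception, and per-thread behavior described for with_backend is a stack kept in thread-local storage. The sketch below shows that technique in isolation; `BackendSwitchSketch` and its methods are hypothetical and are not claimed to match TreeHaver's internals.

```ruby
module BackendSwitchSketch
  KEY = :backend_switch_sketch_stack

  module_function

  # Innermost backend for the current thread, or :auto outside any block.
  def current
    stack.last || :auto
  end

  # Pushes `name` for the duration of the block; `ensure` pops it again
  # even when the block raises, which restores the previous backend.
  def with_backend(name)
    stack.push(name)
    yield
  ensure
    stack.pop
  end

  # Thread.current[] storage is local to the current thread (strictly,
  # to the current fiber), so threads never see each other's stack.
  def stack
    Thread.current[KEY] ||= []
  end
end

BackendSwitchSketch.with_backend(:rust) do
  BackendSwitchSketch.with_backend(:citrus) { puts BackendSwitchSketch.current }
  puts BackendSwitchSketch.current
end
puts BackendSwitchSketch.current
```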
# Example: Testing with multiple backends
[:mri, :rust, :citrus].each do |backend_name|
  TreeHaver.with_backend(backend_name) do
    parser = TreeHaver::Parser.new
    result = parser.parse(source)
    puts "#{backend_name}: #{result.root_node.type}"
  end
end
Check Backend Capabilities:
TreeHaver.backend # => :ffi
TreeHaver.backend_module # => TreeHaver::Backends::FFI
TreeHaver.capabilities # => { backend: :ffi, parse: true, query: false, ... }
See examples/ directory for 18 complete working examples demonstrating all backends and languages.
Security Considerations
⚠️ Loading shared libraries (.so/.dylib/.dll) executes arbitrary native code.
TreeHaver provides defense-in-depth validations, but you should understand the risks:
Attack Vectors Mitigated
TreeHaver's PathValidator module protects against:
- Path traversal: Paths containing /../ or /./ are rejected
- Null byte injection: Paths containing null bytes are rejected
- Non-absolute paths: Relative paths are rejected to prevent CWD-based attacks
- Invalid extensions: Only .so, .dylib, and .dll files are accepted
- Malicious filenames: Filenames must match a safe pattern (alphanumeric, hyphens, underscores)
- Invalid language names: Language names must be lowercase alphanumeric with underscores
- Invalid symbol names: Symbol names must be valid C identifiers
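The path-related checks in this list can be approximated in a few lines of plain Ruby. This is a rough sketch for illustration only: `PathCheckSketch`, its error strings, and the exact filename pattern are assumptions, it handles POSIX-style paths only, and the real PathValidator may differ in names and details.

```ruby
module PathCheckSketch
  ALLOWED_EXTENSIONS = %w[.so .dylib .dll].freeze
  # Assumed "safe" pattern: letters, digits, underscores, hyphens, and dots.
  SAFE_FILENAME = /\A[A-Za-z0-9_\-.]+\z/

  module_function

  # Returns a list of human-readable problems; empty means the path passed.
  def validation_errors(path)
    # File.basename and File.extname raise on null bytes, so stop early.
    return ["Path contains null byte"] if path.include?("\0")

    errors = []
    errors << "Path is not absolute" unless path.start_with?("/")
    errors << "Path contains traversal sequence" if path.include?("/../") || path.include?("/./")
    errors << "Invalid extension" unless ALLOWED_EXTENSIONS.include?(File.extname(path))
    errors << "Unsafe filename" unless File.basename(path).match?(SAFE_FILENAME)
    errors
  end

  def safe_library_path?(path)
    validation_errors(path).empty?
  end
end

p PathCheckSketch.validation_errors("lib/../evil.so")
```

Checks like these reduce the attack surface but do not make an untrusted library safe to load; loading still executes native code.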
Secure Usage
# Standard usage - paths from ENV are validated
finder = TreeHaver::GrammarFinder.new(:toml)
path = finder.find_library_path # Validates ENV path before returning
# Maximum security - only trusted system directories
path = finder.find_library_path_safe # Ignores ENV, only /usr/lib etc.
# Manual validation
if TreeHaver::PathValidator.safe_library_path?(user_provided_path)
  language = TreeHaver::Language.from_library(user_provided_path)
end
# Get validation errors for debugging
errors = TreeHaver::PathValidator.validation_errors(path)
# => ["Path is not absolute", "Path contains traversal sequence"]
Trusted Directories
The find_library_path_safe method only returns paths in trusted directories.
Default trusted directories:
- /usr/lib, /usr/lib64
- /usr/lib/x86_64-linux-gnu, /usr/lib/aarch64-linux-gnu
- /usr/local/lib
- /opt/homebrew/lib, /opt/local/lib
Adding custom trusted directories:
For non-standard installations (Homebrew on Linux, luarocks, mise, asdf, etc.), register additional trusted directories:
# Programmatically at application startup
TreeHaver::PathValidator.add_trusted_directory("/home/linuxbrew/.linuxbrew/Cellar")
TreeHaver::PathValidator.add_trusted_directory("~/.local/share/mise/installs/lua")
# Or via environment variable (comma-separated, in your shell profile)
export TREE_HAVER_TRUSTED_DIRS="/home/linuxbrew/.linuxbrew/Cellar,~/.local/share/mise/installs/lua"
Example: Fedora Silverblue with Homebrew and luarocks
# In ~/.bashrc or ~/.zshrc
export TREE_HAVER_TRUSTED_DIRS="/home/linuxbrew/.linuxbrew/Cellar,~/.local/share/mise/installs/lua"
# tree-sitter runtime library
export TREE_SITTER_RUNTIME_LIB=/home/linuxbrew/.linuxbrew/Cellar/tree-sitter/0.26.3/lib/libtree-sitter.so
# Language grammar (luarocks-installed)
export TREE_SITTER_TOML_PATH=~/.local/share/mise/installs/lua/5.4.8/luarocks/lib/luarocks/rocks-5.4/tree-sitter-toml/0.0.31-1/parser/toml.so
Recommendations
- Production: Consider using find_library_path_safe to ignore ENV overrides
- Development: Standard find_library_path is convenient for testing
- User Input: Always validate paths before passing to Language.from_library
- CI/CD: Be cautious of ENV vars that could be set by untrusted sources
- Custom installs: Register trusted directories via TREE_HAVER_TRUSTED_DIRS or add_trusted_directory
Backend Selection
TreeHaver automatically selects the best backend for your Ruby implementation, but you can override this behavior:
# Automatic backend selection (default)
TreeHaver.backend = :auto
# Force a specific backend
TreeHaver.backend = :mri # Use ruby_tree_sitter (MRI only, C extension)
TreeHaver.backend = :rust # Use tree_stump (MRI, Rust extension with precompiled binaries)
# Note: Requires pboling's fork until PRs #5, #7, #11, #13 are merged
# See: https://github.com/pboling/tree_stump/tree/tree_haver
TreeHaver.backend = :ffi # Use FFI bindings (works on MRI and JRuby)
TreeHaver.backend = :java # Use Java bindings (JRuby only, coming soon)
TreeHaver.backend = :citrus # Use Citrus pure Ruby parser
# NOTE: Portable, all Ruby implementations
# CAVEAT: few major language grammars, but many esoteric grammars
Auto-selection priority on MRI: MRI → Rust → FFI → Citrus
You can also set the backend via environment variable:
export TREE_HAVER_BACKEND=rust
Environment Variables
TreeHaver recognizes several environment variables for configuration:
Note: All path-based environment variables are validated before use. Invalid paths are ignored.
Security Configuration
- TREE_HAVER_TRUSTED_DIRS: Comma-separated list of additional trusted directories for grammar libraries
  # For Homebrew on Linux and luarocks
  export TREE_HAVER_TRUSTED_DIRS="/home/linuxbrew/.linuxbrew/Cellar,~/.local/share/mise/installs/lua"
  Tilde (~) is expanded to the user's home directory. Directories listed here are considered safe for find_library_path_safe.
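Parsing a value like the one above comes down to splitting on commas and expanding a leading tilde. A minimal sketch, assuming a hypothetical helper `parse_trusted_dirs` (the gem's own parsing may treat edge cases differently):

```ruby
# Hypothetical helper: turns the raw ENV value into a list of directories.
# Blank entries are dropped, and a leading "~" (alone or followed by "/")
# is replaced with the given home directory.
def parse_trusted_dirs(raw, home: Dir.home)
  raw.to_s.split(",").map(&:strip).reject(&:empty?).map do |dir|
    dir.sub(%r{\A~(?=/|\z)}) { home }
  end
end

p parse_trusted_dirs("/home/linuxbrew/.linuxbrew/Cellar,~/.local/share/mise/installs/lua", home: "/home/user")
```

In real use the `raw` argument would come from `ENV["TREE_HAVER_TRUSTED_DIRS"]`.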
Core Runtime Library
- TREE_SITTER_RUNTIME_LIB: Absolute path to the core libtree-sitter shared library
  export TREE_SITTER_RUNTIME_LIB=/usr/local/lib/libtree-sitter.so
  If not set, TreeHaver tries these names in order:
  1. tree-sitter
  2. libtree-sitter.so.0
  3. libtree-sitter.so
  4. libtree-sitter.dylib
  5. libtree-sitter.dll
Language Symbol Resolution
When loading a language grammar, if you don't specify the symbol: parameter, TreeHaver resolves it in this precedence:
1. TREE_SITTER_LANG_SYMBOL: Explicit symbol override
2. Guessed from filename (e.g., libtree-sitter-toml.so → tree_sitter_toml)
3. Default fallback (tree_sitter_toml)
export TREE_SITTER_LANG_SYMBOL=tree_sitter_toml
Language Library Paths
For specific languages, you can set environment variables to point to grammar libraries:
export TREE_SITTER_TOML_PATH=/usr/local/lib/libtree-sitter-toml.so
export TREE_SITTER_JSON_PATH=/usr/local/lib/libtree-sitter-json.so
JRuby-Specific: Java Backend JARs
For the Java backend on JRuby:
export TREE_SITTER_JAVA_JARS_DIR=/path/to/java-tree-sitter/jars
Language Registration
Register languages once at application startup for convenient access:
# Register a TOML grammar
TreeHaver.register_language(
  :toml,
  path: "/usr/local/lib/libtree-sitter-toml.so",
  symbol: "tree_sitter_toml", # optional, will be inferred if omitted
)
# Now you can use the convenient helper
language = TreeHaver::Language.toml
# Or still override path/symbol per-call
language = TreeHaver::Language.toml(
  path: "/custom/path/libtree-sitter-toml.so",
)
Grammar Discovery with GrammarFinder
For libraries that need to automatically locate tree-sitter grammars (like the *-merge family of gems), TreeHaver provides the GrammarFinder utility class. It handles platform-aware grammar discovery without requiring language-specific code in TreeHaver itself.
# Create a finder for any language
finder = TreeHaver::GrammarFinder.new(:toml)
# Check if the grammar is available
if finder.available?
  puts "TOML grammar found at: #{finder.find_library_path}"
else
  puts finder.not_found_message
  # => "tree-sitter toml grammar not found. Searched: /usr/lib/libtree-sitter-toml.so, ..."
end
# Register the language if available
finder.register! if finder.available?
# Now use the registered language
language = TreeHaver::Language.toml
GrammarFinder Automatic Derivation
Given just the language name, GrammarFinder automatically derives:
| Property | Derived Value (for :toml) |
|---|---|
| ENV var | TREE_SITTER_TOML_PATH |
| Library filename | libtree-sitter-toml.so (Linux) or .dylib (macOS) |
| Symbol name | tree_sitter_toml |
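The derivation in the table is mechanical enough to sketch directly. The helpers below, `derive_grammar_names` and `guess_symbol`, are hypothetical illustrations of the naming rules described here and in the symbol-resolution notes; they are not GrammarFinder's real methods, and Windows is left out for brevity.

```ruby
require "rbconfig"

# Hypothetical helper: derives the ENV var, library filename, and symbol
# name from a language name, choosing .dylib on macOS and .so otherwise.
def derive_grammar_names(language, host_os: RbConfig::CONFIG["host_os"])
  name = language.to_s
  ext = host_os.match?(/darwin/) ? ".dylib" : ".so"
  {
    env_var: "TREE_SITTER_#{name.upcase}_PATH",
    library_filename: "libtree-sitter-#{name}#{ext}",
    symbol: "tree_sitter_#{name}",
  }
end

# Hypothetical helper: guesses the exported symbol from a library filename,
# e.g. libtree-sitter-toml.so becomes tree_sitter_toml.
def guess_symbol(filename)
  base = File.basename(filename).sub(/\.(so|dylib|dll)\z/, "")
  base.sub(/\Alib/, "").tr("-", "_")
end

p derive_grammar_names(:toml, host_os: "linux-gnu")
p guess_symbol("/usr/lib/libtree-sitter-toml.so")
```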
Search Order
GrammarFinder searches for grammars in this order:
1. Environment variable: TREE_SITTER_<LANG>_PATH (highest priority)
2. Extra paths: Custom paths provided at initialization
3. System paths: Common installation directories (/usr/lib, /usr/local/lib, /opt/homebrew/lib, etc.)
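The search order above amounts to building an ordered candidate list and returning the first entry that exists. A small sketch of that idea follows; `SYSTEM_DIRS_SKETCH`, `candidate_paths`, and `first_existing` are hypothetical names, and the directory list is abbreviated from the one in the text.

```ruby
# Abbreviated, assumed list of system directories.
SYSTEM_DIRS_SKETCH = %w[/usr/lib /usr/local/lib /opt/homebrew/lib].freeze

# Hypothetical helper: ENV value first, then extra paths, then system paths.
def candidate_paths(filename, env_value: nil, extra_paths: [])
  candidates = []
  candidates << env_value if env_value && !env_value.empty?
  (extra_paths + SYSTEM_DIRS_SKETCH).each { |dir| candidates << File.join(dir, filename) }
  candidates
end

# Hypothetical helper: first candidate for which `exists` returns true.
# The existence check is injectable so the logic can be tested without files.
def first_existing(candidates, exists: File.method(:file?))
  candidates.find { |path| exists.call(path) }
end

puts candidate_paths("libtree-sitter-toml.so", extra_paths: ["/opt/custom/lib"])
```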
Usage in *-merge Gems
The GrammarFinder pattern enables clean integration in language-specific merge gems:
# In toml-merge
finder = TreeHaver::GrammarFinder.new(:toml)
finder.register! if finder.available?
# In json-merge
finder = TreeHaver::GrammarFinder.new(:json)
finder.register! if finder.available?
# In bash-merge
finder = TreeHaver::GrammarFinder.new(:bash)
finder.register! if finder.available?
Each gem uses the same API; only the language name changes.
Adding Custom Search Paths
For non-standard installations, provide extra search paths:
finder = TreeHaver::GrammarFinder.new(:toml, extra_paths: [
  "/opt/custom/lib",
  "/home/user/.local/lib",
])
Debug Information
Get detailed information about the grammar search:
finder = TreeHaver::GrammarFinder.new(:toml)
puts finder.search_info
# => {
#   language: :toml,
#   env_var: "TREE_SITTER_TOML_PATH",
#   env_value: nil,
#   symbol: "tree_sitter_toml",
#   library_filename: "libtree-sitter-toml.so",
#   search_paths: ["/usr/lib/libtree-sitter-toml.so", ...],
#   found_path: "/usr/lib/libtree-sitter-toml.so",
#   available: true
# }
Checking Capabilities
Different backends may support different features:
TreeHaver.capabilities
# => { backend: :mri, query: true, bytes_field: true }
# or
# => { backend: :ffi, parse: true, query: false, bytes_field: true }
# or
# => { backend: :citrus, parse: true, query: false, bytes_field: false }
Compatibility Mode
For codebases migrating from ruby_tree_sitter, TreeHaver provides a compatibility shim:
require "tree_haver/compat"
# Now TreeSitter constants map to TreeHaver
parser = TreeSitter::Parser.new # Actually creates TreeHaver::Parser
This is safe and idempotent: if the real TreeSitter module is already loaded, the shim does nothing.
⚠️ Critical: Exception Hierarchy Incompatibility
ruby_tree_sitter v2+ exceptions inherit from Exception (not StandardError).
TreeHaver exceptions follow Ruby best practices and inherit from StandardError.
This means exception handling behaves differently between the two:
| Scenario | ruby_tree_sitter v2+ | TreeHaver Compat Mode |
|---|---|---|
| rescue => e | Does NOT catch TreeSitter errors | DOES catch TreeHaver errors |
| Behavior | Errors propagate (inherit Exception) | Errors caught (inherit StandardError) |
Example showing the difference:
# With real ruby_tree_sitter v2+
begin
  TreeSitter::Language.load("missing", "/nonexistent.so")
rescue => e
  puts "Caught!" # Never reached - TreeSitter errors inherit Exception
end
# With TreeHaver compat mode
require "tree_haver/compat"
begin
  TreeSitter::Language.load("missing", "/nonexistent.so") # Actually TreeHaver
rescue => e
  puts "Caught!" # WILL be reached - TreeHaver errors inherit StandardError
end
To write compatible exception handling:
# Option 1: Catch specific exception (works with both)
begin
  TreeSitter::Language.load(...)
rescue TreeSitter::TreeSitterError => e # Explicit rescue
  # Works with both ruby_tree_sitter and TreeHaver compat mode
end
# Option 2: Use TreeHaver API directly (recommended)
begin
  TreeHaver::Language.from_library(...)
rescue TreeHaver::NotAvailable => e # TreeHaver's unified exception
  # Clear and consistent when using TreeHaver
end
Why TreeHaver uses StandardError:
- Ruby Best Practice: The Ruby style guide recommends inheriting from StandardError
- Safety: Rescuing Exception broadly (the tempting way to handle errors that inherit from it) also catches system signals (SIGTERM, SIGINT) and exit, which is dangerous
- Consistency: Most Ruby libraries follow this convention
- Testability: StandardError exceptions are easier to test and mock
See lib/tree_haver/compat.rb for detailed documentation.
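The difference between the two hierarchies can be reproduced in plain Ruby without either library. In this sketch `SketchFatalError` and `SketchRecoverableError` are hypothetical stand-ins for an Exception-based error and a StandardError-based error respectively.

```ruby
class SketchFatalError < Exception; end            # like ruby_tree_sitter v2+ errors
class SketchRecoverableError < StandardError; end  # like TreeHaver errors

# Returns true when a bare `rescue` (which means `rescue StandardError`)
# catches the given error class, false when the error escapes it.
def caught_by_bare_rescue?(error_class)
  begin
    raise error_class, "boom"
  rescue
    true
  end
rescue Exception
  # Reached only for errors that the bare rescue above did not catch.
  false
end

puts caught_by_bare_rescue?(SketchRecoverableError)
puts caught_by_bare_rescue?(SketchFatalError)
```

The outer `rescue Exception` is used here only to observe the escape; in application code, rescuing the specific error class is the safer choice.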
🔧 Basic Usage
Quick Start
Here's a complete example of parsing TOML with TreeHaver:
require "tree_haver"
# Load a language grammar
language = TreeHaver::Language.from_library(
  "/usr/local/lib/libtree-sitter-toml.so",
  symbol: "tree_sitter_toml",
)
# Create a parser
parser = TreeHaver::Parser.new
parser.language = language
# Parse some source code
source = <<~TOML
  [package]
  name = "my-app"
  version = "1.0.0"
TOML
tree = parser.parse(source)
# Access the root node
root = tree.root_node
puts "Root node type: #{root.type}" # => "document"
# Traverse the tree
root.each do |child|
  puts "Child type: #{child.type}"
  child.each do |grandchild|
    puts " Grandchild type: #{grandchild.type}"
  end
end
Using Language Registration
For cleaner code, register languages at startup:
# At application initialization
TreeHaver.register_language(
  :toml,
  path: "/usr/local/lib/libtree-sitter-toml.so",
)
TreeHaver.register_language(
  :json,
  path: "/usr/local/lib/libtree-sitter-json.so",
)
# Later in your code
toml_language = TreeHaver::Language.toml
json_language = TreeHaver::Language.json
parser = TreeHaver::Parser.new
parser.language = toml_language
tree = parser.parse(toml_source)
Flexible Language Names
The name parameter in register_language is an arbitrary identifier you choose—it doesn't
need to match the actual language name. The actual grammar identity comes from the path
and symbol parameters (for tree-sitter) or grammar_module (for Citrus).
This flexibility is useful for:
- Aliasing: Register the same grammar under multiple names
- Versioning: Register different grammar versions (e.g., :ruby_2, :ruby_3)
- Testing: Use unique names to avoid collisions between tests
- Context-specific naming: Use names that make sense for your application
# Register the same TOML grammar under different names for different purposes
TreeHaver.register_language(
  :config_parser, # Custom name for your app
  path: "/usr/local/lib/libtree-sitter-toml.so",
  symbol: "tree_sitter_toml",
)
TreeHaver.register_language(
  :toml_v1, # Version-specific name
  path: "/usr/local/lib/libtree-sitter-toml.so",
  symbol: "tree_sitter_toml",
)
# Use your custom names
config_lang = TreeHaver::Language.config_parser
versioned_lang = TreeHaver::Language.toml_v1
Parsing Different Languages
TreeHaver works with any tree-sitter grammar:
# Parse Ruby code
ruby_lang = TreeHaver::Language.from_library(
  "/path/to/libtree-sitter-ruby.so",
)
parser = TreeHaver::Parser.new
parser.language = ruby_lang
tree = parser.parse("class Foo; end")
# Parse JavaScript
js_lang = TreeHaver::Language.from_library(
  "/path/to/libtree-sitter-javascript.so",
)
parser.language = js_lang # Reuse the same parser
tree = parser.parse("const x = 42;")
Walking the AST
TreeHaver provides simple node traversal:
tree = parser.parse(source)
root = tree.root_node
# Recursive tree walk
def walk_tree(node, depth = 0)
  puts "#{" " * depth}#{node.type}"
  node.each { |child| walk_tree(child, depth + 1) }
end
walk_tree(root)
Incremental Parsing
TreeHaver supports incremental parsing when using the MRI, Rust, or Java backends. This is a major performance optimization for editors and IDEs that need to re-parse on every keystroke.
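Incremental re-parsing depends on an accurate description of the edit: byte offsets plus row/column points before and after the change. As a rough sketch, the hypothetical helpers `point_at` and `edit_descriptor` below (not part of TreeHaver) compute those values for a single replacement. The hash keys mirror the keyword arguments of the `tree.edit` call used in this section, and columns are counted in bytes, which is assumed here to match tree-sitter's convention.

```ruby
# Hypothetical helper: converts a byte offset into a zero-based
# {row:, column:} point, with the column measured in bytes.
def point_at(source, byte_offset)
  prefix = source.byteslice(0, byte_offset).b
  last_newline = prefix.rindex("\n")
  line_start = last_newline ? last_newline + 1 : 0
  {row: prefix.count("\n"), column: byte_offset - line_start}
end

# Hypothetical helper: describes replacing `old_len` bytes at `start_byte`
# of `old_source` with `new_text`.
def edit_descriptor(old_source, start_byte, old_len, new_text)
  old_end_byte = start_byte + old_len
  new_end_byte = start_byte + new_text.bytesize
  new_source = old_source.byteslice(0, start_byte) + new_text +
    old_source.byteslice(old_end_byte, old_source.bytesize)
  {
    start_byte: start_byte,
    old_end_byte: old_end_byte,
    new_end_byte: new_end_byte,
    start_point: point_at(old_source, start_byte),
    old_end_point: point_at(old_source, old_end_byte),
    new_end_point: point_at(new_source, new_end_byte),
  }
end

# Replacing "1" with "42" in "x = 1":
p edit_descriptor("x = 1", 4, 1, "42")
```

The resulting hash could then be splatted into an edit call, for example `tree.edit(**descriptor)`, assuming the backend accepts those keyword arguments as shown in this section.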
# Check if current backend supports incremental parsing
if TreeHaver.capabilities[:incremental]
  puts "Incremental parsing is available!"
end
# Initial parse
parser = TreeHaver::Parser.new
parser.language = language
tree = parser.parse_string(nil, "x = 1")
# User edits the source: "x = 1" -> "x = 42"
# Mark the tree as edited (tell tree-sitter what changed)
tree.edit(
  start_byte: 4, # edit starts at byte 4
  old_end_byte: 5, # old text "1" ended at byte 5
  new_end_byte: 6, # new text "42" ends at byte 6
  start_point: {row: 0, column: 4},
  old_end_point: {row: 0, column: 5},
  new_end_point: {row: 0, column: 6},
)
# Re-parse incrementally - tree-sitter reuses unchanged nodes
new_tree = parser.parse_string(tree, "x = 42")
Note: Incremental parsing requires the MRI (ruby_tree_sitter), Rust (tree_stump), or Java (java-tree-sitter) backend. The FFI and Citrus backends do not currently support incremental parsing. You can check support with:
tree.supports_editing? # => true if edit() is available
Note: tree_stump requires pboling's fork (tree_haver branch) until PRs #5, #7, #11, #13 are merged.
Error Handling
begin
  language = TreeHaver::Language.from_library("/path/to/grammar.so")
rescue TreeHaver::NotAvailable => e
  puts "Failed to load grammar: #{e.message}"
end
# Check if a backend is available
if TreeHaver.backend_module.nil?
  puts "No TreeHaver backend is available!"
  puts "Install ruby_tree_sitter (MRI), ffi gem with libtree-sitter, or citrus gem"
end
Platform-Specific Examples
MRI Ruby
On MRI, TreeHaver uses ruby_tree_sitter by default:
# Gemfile
gem "tree_haver"
gem "ruby_tree_sitter" # MRI backend
# Code - no changes needed, TreeHaver auto-selects MRI backend
parser = TreeHaver::Parser.new
JRuby
On JRuby, TreeHaver can use the FFI backend, Java backend, or Citrus backend:
Option 1: FFI Backend (recommended for tree-sitter grammars)
# Gemfile
gem "tree_haver"
gem "ffi" # Required for FFI backend
# Ensure libtree-sitter is installed on your system
# On macOS with Homebrew:
# brew install tree-sitter
# On Ubuntu/Debian:
# sudo apt-get install libtree-sitter0 libtree-sitter-dev
# Code - TreeHaver auto-selects FFI backend on JRuby
parser = TreeHaver::Parser.new
Option 2: Java Backend (native JVM performance)
# 1. Download java-tree-sitter JAR from Maven Central
mkdir -p vendor/jars
curl -fSL -o vendor/jars/jtreesitter-0.23.2.jar \
"https://repo1.maven.org/maven2/io/github/tree-sitter/jtreesitter/0.23.2/jtreesitter-0.23.2.jar"
# 2. Set environment variables
export CLASSPATH="$(pwd)/vendor/jars:$CLASSPATH"
export LD_LIBRARY_PATH="/path/to/libtree-sitter/lib:$LD_LIBRARY_PATH"
# 3. Run with JRuby (requires Java 22+ for Foreign Function API)
JAVA_OPTS="--enable-native-access=ALL-UNNAMED" jruby your_script.rb
# Force Java backend
TreeHaver.backend = :java
# Check if Java backend is available
if TreeHaver::Backends::Java.available?
  puts "Java backend is ready!"
  puts TreeHaver.capabilities
  # => { backend: :java, parse: true, query: true, bytes_field: true, incremental: true }
end
⚠️ Java Backend Limitation: Symbol Resolution
The Java backend uses Java's Foreign Function & Memory (FFM) API which loads libraries in isolation. Unlike the system's dynamic linker (dlopen), FFM's SymbolLookup.or() chains symbol lookups but doesn't resolve dynamic library dependencies.
This means grammar .so files with unresolved references to libtree-sitter.so symbols won't load correctly. Most grammars from luarocks, npm, or other sources have these dependencies.
Recommended approach for JRuby: Use the FFI backend:
# On JRuby, use FFI backend (recommended)
TreeHaver.backend = :ffi
The FFI backend uses Ruby's FFI gem which relies on the system's dynamic linker, correctly resolving symbol dependencies between libtree-sitter.so and grammar libraries.
The Java backend will work with:
- Grammar JARs built specifically for java-tree-sitter (self-contained)
- Grammar .so files that statically link tree-sitter
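The platform recommendations in this section can be condensed into a small selection helper. This is a minimal sketch, not part of TreeHaver: `recommended_backend` is a hypothetical name, and the mapping only restates this README's advice (MRI backend on MRI, FFI on JRuby and TruffleRuby, Citrus as the portable fallback).

```ruby
# Hypothetical helper (not part of TreeHaver): maps a Ruby engine name,
# as reported by RUBY_ENGINE, to the backend this README recommends.
def recommended_backend(engine = RUBY_ENGINE)
  case engine
  when "ruby"
    :mri    # MRI: ruby_tree_sitter is the default backend
  when "jruby", "truffleruby"
    :ffi    # FFI relies on the system's dynamic linker for symbol resolution
  else
    :citrus # pure Ruby, zero native dependencies
  end
end

puts recommended_backend("jruby")

# In an application you might then write:
#   TreeHaver.backend = recommended_backend
```

Passing the engine as an argument keeps the helper easy to test on a single Ruby implementation.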
Option 3: Citrus Backend (pure Ruby, portable)
# Gemfile
gem "tree_haver"
gem "citrus" # Pure Ruby parser, zero native dependencies
# Code - Force Citrus backend for maximum portability
TreeHaver.backend = :citrus
# Check if Citrus backend is available
if TreeHaver::Backends::Citrus.available?
  puts "Citrus backend is ready!"
  puts TreeHaver.capabilities
  # => { backend: :citrus, parse: true, query: false, bytes_field: false }
end
⚠️ Citrus Backend Limitations:
- Uses Citrus grammars (not tree-sitter grammars)
- No incremental parsing support
- No query API
- Pure Ruby performance (slower than native backends)
- Best for: prototyping, environments without native extension support, teaching
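Because backends differ in what they support, it can help to check capabilities before relying on a feature. The sketch below works on plain hashes shaped like the `TreeHaver.capabilities` output shown in this README; `supports?` and `missing_features` are hypothetical helpers, not TreeHaver API, and treating a missing key as "unsupported" is an assumption of this example.

```ruby
# Capability hashes copied from the examples in this README.
JAVA_CAPS = { backend: :java, parse: true, query: true, bytes_field: true, incremental: true }.freeze
CITRUS_CAPS = { backend: :citrus, parse: true, query: false, bytes_field: false }.freeze

# Hypothetical helper: a feature counts as supported only when its key is
# present and true. A missing key (e.g. :incremental for Citrus) is treated
# as unsupported; that convention is an assumption of this sketch.
def supports?(capabilities, feature)
  capabilities.fetch(feature, false) == true
end

# Returns the subset of required features the backend does not offer.
def missing_features(capabilities, required)
  required.reject { |feature| supports?(capabilities, feature) }
end

puts missing_features(CITRUS_CAPS, [:parse, :query, :incremental]).inspect
# => [:query, :incremental]
puts missing_features(JAVA_CAPS, [:parse, :query, :incremental]).inspect
# => []
```

In an application, the hash would come from `TreeHaver.capabilities` after selecting a backend.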
TruffleRuby
TruffleRuby can use the MRI, FFI, or Citrus backend:
# Use FFI backend (recommended for tree-sitter grammars)
TreeHaver.backend = :ffi
# Or try MRI backend if ruby_tree_sitter compiles on your TruffleRuby version
TreeHaver.backend = :mri
# Or use Citrus backend for zero native dependencies
TreeHaver.backend = :citrus
Advanced: Thread-Safe Backend Switching
TreeHaver provides with_backend for thread-safe, temporary backend switching. This is
essential for testing, benchmarking, and applications that need different backends in
different contexts.
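To illustrate the general technique behind a block-scoped, thread-local override, here is a minimal self-contained sketch. It is not TreeHaver's source: the `BackendSwitch` module, its `:auto` default, and the storage key are hypothetical stand-ins chosen for illustration.

```ruby
# Hypothetical stand-in for illustration; this is not TreeHaver's implementation.
module BackendSwitch
  KEY = :backend_switch_override
  DEFAULT = :auto

  module_function

  # The backend in effect for the current thread.
  def effective_backend
    Thread.current.thread_variable_get(KEY) || DEFAULT
  end

  # Sets the override for the duration of the block, then restores the
  # previous value, even if the block raises.
  def with_backend(name)
    previous = Thread.current.thread_variable_get(KEY)
    Thread.current.thread_variable_set(KEY, name)
    yield
  ensure
    Thread.current.thread_variable_set(KEY, previous)
  end
end

BackendSwitch.with_backend(:rust) do
  BackendSwitch.with_backend(:citrus) do
    puts BackendSwitch.effective_backend # => citrus
  end
  puts BackendSwitch.effective_backend   # => rust (restored)
  # Another thread never sees this thread's override.
  puts Thread.new { BackendSwitch.effective_backend }.value # => auto
end
```

Saving the previous value and restoring it in `ensure` is what makes nesting and error cases behave predictably.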
Testing with Multiple Backends
Test the same code path with different backends using with_backend:
# In your test setup
RSpec.describe("MyParser") do
  # Test with each available backend
  [:mri, :rust, :citrus].each do |backend_name|
    context "with #{backend_name} backend" do
      it "parses correctly" do
        TreeHaver.with_backend(backend_name) do
          parser = TreeHaver::Parser.new
          result = parser.parse("x = 42")
          expect(result.root_node.type).to eq("document")
        end
        # Backend automatically restored after block
      end
    end
  end
end
Thread Isolation
Each thread can use a different backend safely—with_backend uses thread-local storage:
threads = []
threads << Thread.new do
  TreeHaver.with_backend(:mri) do
    # This thread uses MRI backend
    parser = TreeHaver::Parser.new
    100.times { parser.parse("x = 1") }
  end
end
threads << Thread.new do
  TreeHaver.with_backend(:citrus) do
    # This thread uses Citrus backend simultaneously
    parser = TreeHaver::Parser.new
    100.times { parser.parse("x = 1") }
  end
end
threads.each(&:join)
Nested Blocks
with_backend supports nesting—inner blocks override outer blocks:
TreeHaver.with_backend(:rust) do
  puts TreeHaver.effective_backend # => :rust
  TreeHaver.with_backend(:citrus) do
    puts TreeHaver.effective_backend # => :citrus
  end
  puts TreeHaver.effective_backend # => :rust (restored)
end
Fallback Pattern
Try one backend, fall back to another on failure:
def parse_with_fallback(source)
  TreeHaver.with_backend(:mri) do
    TreeHaver::Parser.new.tap { |p| p.language = load_language }.parse(source)
  end
rescue TreeHaver::NotAvailable
  # Fall back to Citrus if MRI backend unavailable
  TreeHaver.with_backend(:citrus) do
    TreeHaver::Parser.new.tap { |p| p.language = load_language }.parse(source)
  end
end
Complete Real-World Example
Here's a practical example that extracts package names from a TOML file:
require "tree_haver"
# Setup
TreeHaver.register_language(
  :toml,
  path: "/usr/local/lib/libtree-sitter-toml.so",
)
def extract_package_name(toml_content)
  # Create parser
  parser = TreeHaver::Parser.new
  parser.language = TreeHaver::Language.toml
  # Parse
  tree = parser.parse(toml_content)
  root = tree.root_node
  # Find [package] table
  root.each do |child|
    next unless child.type == "table"
    child.each do |table_elem|
      if table_elem.type == "pair"
        # Look for name = "..." pair
        key = table_elem.each.first&.type
        # In a real implementation, you'd extract the text value
        # This is simplified for demonstration
      end
    end
  end
end
# Usage
toml = <<~TOML
  [package]
  name = "awesome-app"
  version = "2.0.0"
TOML
package_name = extract_package_name(toml)
🦷 FLOSS Funding
While kettle-rb tools are free software and will always be, the project would benefit immensely from some funding. Raising a monthly budget of... "dollars" would make the project more sustainable.
We welcome both individual and corporate sponsors! We also offer a wide array of funding channels to account for your preferences (although currently Open Collective is our preferred funding platform).
If you're working in a company that's making significant use of kettle-rb tools we'd appreciate it if you suggest to your company to become a kettle-rb sponsor.
You can support the development of kettle-rb tools via GitHub Sponsors, Liberapay, PayPal, Open Collective and Tidelift.
| 📍 NOTE |
|---|
| If doing a sponsorship in the form of donation is problematic for your company from an accounting standpoint, we'd recommend the use of Tidelift, where you can get a support-like subscription instead. |
Open Collective for Individuals
Support us with a monthly donation and help us continue our activities. [Become a backer]
NOTE: kettle-readme-backers updates this list every day, automatically.
No backers yet. Be the first!
Open Collective for Organizations
Become a sponsor and get your logo on our README on GitHub with a link to your site. [Become a sponsor]
NOTE: kettle-readme-backers updates this list every day, automatically.
No sponsors yet. Be the first!
Another way to support open-source
I’m driven by a passion to foster a thriving open-source community – a space where people can tackle complex problems, no matter how small. Revitalizing libraries that have fallen into disrepair, and building new libraries focused on solving real-world challenges, are my passions. I was recently affected by layoffs, and the tech jobs market is unwelcoming. I’m reaching out here because your support would significantly aid my efforts to provide for my family, and my farm (11 🐔 chickens, 2 🐶 dogs, 3 🐰 rabbits, 8 🐈 cats).
If you work at a company that uses my work, please encourage them to support me as a corporate sponsor. My work on gems you use might show up in bundle fund.
I’m developing a new library, floss_funding, designed to empower open-source developers like myself to get paid for the work we do, in a sustainable way. Please give it a look.
Floss-Funding.dev: 👉️ No network calls. 👉️ No tracking. 👉️ No oversight. 👉️ Minimal crypto hashing. 💡 Easily disabled nags
🔐 Security
See SECURITY.md.
🤝 Contributing
If you need some ideas of where to help, you could work on adding more code coverage, or if it is already 💯 (see below) check reek, issues, or PRs, or use the gem and think about how it could be better.
We keep a changelog, so if you make changes, remember to update it.
See CONTRIBUTING.md for more detailed instructions.
🚀 Release Instructions
See CONTRIBUTING.md.
Code Coverage
🪇 Code of Conduct
Everyone interacting with this project's codebases, issue trackers,
chat rooms and mailing lists agrees to follow the Code of Conduct.
🌈 Contributors
Made with contributors-img.
Also see GitLab Contributors: https://gitlab.com/kettle-rb/tree_haver/-/graphs/main
📌 Versioning
This library adheres to Semantic Versioning.
Violations of this scheme should be reported as bugs.
Specifically, if a minor or patch version is released that breaks backward compatibility,
a new version should be immediately released that restores compatibility.
Breaking changes to the public API will only be introduced with new major versions.
dropping support for a platform is both obviously and objectively a breaking change
—Jordan Harband (@ljharb, maintainer of SemVer) in SemVer issue 716
I understand that policy doesn't work universally ("exceptions to every rule!"), but it is the policy here. As such, in many cases it is good to specify a dependency on this library using the Pessimistic Version Constraint with two digits of precision.
For example:
spec.add_dependency("tree_haver", "~> 1.0")
SemVer should, IMO, but doesn't explicitly, say that dropping support for specific Platforms is a breaking change to an API, and for that reason the bike shedding is endless.
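To make the pessimistic constraint concrete, the following sketch uses RubyGems' own `Gem::Requirement` and `Gem::Version` classes to show which versions a two-digit `~> 1.0` constraint accepts. The version numbers are illustrative only and do not refer to actual tree_haver releases.

```ruby
require "rubygems"

# "~> 1.0" means ">= 1.0" and "< 2.0": any minor or patch release within
# major version 1, but never the next major version.
requirement = Gem::Requirement.new("~> 1.0")

# Illustrative version numbers, not actual tree_haver releases.
%w[1.0.0 1.4.2 2.0.0].each do |version|
  accepted = requirement.satisfied_by?(Gem::Version.new(version))
  puts "#{version}: #{accepted}"
end
# 1.0.0: true
# 1.4.2: true
# 2.0.0: false
```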
To get a better understanding of how SemVer is intended to work over a project's lifetime, read this article from the creator of SemVer:
See CHANGELOG.md for a list of releases.
📄 License
The gem is available as open source under the terms of the MIT License.
See LICENSE.txt for the official Copyright Notice.
© Copyright
- Copyright (c) 2025 Peter H. Boling, of Galtzo.com, and tree_haver contributors.
🤑 A request for help
Maintainers have teeth and need to pay their dentists. After getting laid off in an RIF in March, and encountering difficulty finding a new job, I began spending most of my time building open source tools. I'm hoping to be able to pay for my kids' health insurance this month, so if you value the work I am doing, I need your support. Please consider sponsoring me or the project.
To join the community or get help 👇️ Join the Discord.
To say "thanks!" ☝️ Join the Discord or 👇️ send money.
Please give the project a star ⭐ ♥.
Thanks for RTFM. ☺️