Shell Parser
A compact Ruby parser for POSIX Shell Command Language syntax that produces a simple AST suitable for syntax highlighting and shell execution.
Goal
Remain compact and minimalist in design while still parsing a reasonably complete subset of shell syntax.
Features
- Tokenization - Breaks shell commands into tokens with position tracking
- Simple AST - Clean, easy-to-traverse abstract syntax tree
- POSIX Compliance - Based on POSIX Shell Command Language specification
- Quoting Support - Handles single quotes, double quotes, and backslash escaping
- Expansions - Parses variable expansions ($VAR, ${VAR}) and command substitution ($(...), `...`)
- Command Structures - Pipelines, lists (&&, ||, ;, &), and redirections
- Compact - ~320 lines of clean, readable Ruby code
Installation
Add this line to your application's Gemfile:
gem 'shell_parser'
And then execute:
bundle install
Or install it yourself as:
gem install shell_parser
AST Node Types
Word
A word is composed of one or more parts:
Word = Struct.new(:parts, :pos, :len)
# parts: array of Literal, Variable, or CommandSub
# pos: character position in input
# len: total length
Word Parts
Literal - Plain text (possibly quoted):
Literal = Struct.new(:value, :pos, :len, :quote_style)
# value: the text content
# quote_style: :none, :single, or :double
Variable - Variable expansion:
Variable = Struct.new(:name, :pos, :len, :braced, :quote_style)
# name: variable name (e.g., "HOME" for $HOME)
# braced: true if ${VAR} form, false if $VAR
# quote_style: :none or :double (variables don't expand in single quotes)
CommandSub - Command substitution:
CommandSub = Struct.new(:command, :pos, :len, :style, :quote_style)
# command: the command text to execute
# style: :dollar for $(cmd) or :backtick for `cmd`
# quote_style: :none or :double
Command
A simple command with arguments and redirections:
Command = Struct.new(:words, :redirects)
# words: array of Word nodes
# redirects: array of Redirect nodes
Pipeline
Commands connected by pipes (|):
Pipeline = Struct.new(:commands, :negated)
# commands: array of Command nodes
# negated: boolean (for future ! pipeline support)
List
Commands connected by control operators:
List = Struct.new(:left, :op, :right)
# left/right: Command, Pipeline, or List
# op: :and (&&), :or (||), :semi (;), :background (&)
Redirect
I/O redirection:
Redirect = Struct.new(:type, :fd, :target)
# type: :in (<), :out (>), :append (>>), :heredoc (<<), etc.
# fd: file descriptor number (optional)
# target: Word node
Usage
Basic Parsing
require_relative 'shell_parser'
# Parse a command
ast = ShellParser.parse("ls -la /tmp")
# => Command with 3 words
# Parse a pipeline
ast = ShellParser.parse("cat file.txt | grep error | wc -l")
# => Pipeline with 3 commands
# Parse command lists
ast = ShellParser.parse("make && make test || echo failed")
# => List with nested lists
Syntax Highlighting
The parser provides detailed structure perfect for syntax highlighting:
ast = ShellParser.parse("echo $HOME > output.txt")
ast.words.each do |word|
  word.parts.each do |part|
    case part
    when ShellParser::Literal
      case part.quote_style
      when :single then highlight_single_quoted(part.value, part.pos, part.len)
      when :double then highlight_double_quoted(part.value, part.pos, part.len)
      else highlight_literal(part.value, part.pos, part.len)
      end
    when ShellParser::Variable
      highlight_variable(part.name, part.pos, part.len, part.braced)
    when ShellParser::CommandSub
      highlight_command_sub(part.command, part.pos, part.len, part.style)
    end
  end
end
ast.redirects.each do |redir|
  highlight_redirection(redir.type, redir.target)
end
The structured representation makes it easy to apply context-aware highlighting:
# "Hello $USER" is represented as:
word.parts #=> [
  Literal("Hello ", quote_style: :double),
  Variable("USER", quote_style: :double)
]
Shell Execution
The AST makes it easy to traverse and execute commands:
ast = ShellParser.parse("echo $HOME > output.txt")
# Expand words by processing their parts
def expand_word(word)
  word.parts.map do |part|
    case part
    when ShellParser::Literal
      part.value                # Use as-is
    when ShellParser::Variable
      ENV[part.name] || ""      # Look up variable
    when ShellParser::CommandSub
      `#{part.command}`.chomp   # Execute command
    end
  end.join
end
# Execute based on AST structure and return the exit status.
# execute_command, execute_in_pipeline, and setup_pipe are placeholders you supply.
def execute(ast)
  case ast
  when ShellParser::Command
    args = ast.words.map { |w| expand_word(w) }
    execute_command(args, ast.redirects)
  when ShellParser::Pipeline
    setup_pipe do
      ast.commands.each do |cmd|
        args = cmd.words.map { |w| expand_word(w) }
        execute_in_pipeline(args, cmd.redirects)
      end
    end
  when ShellParser::List
    if ast.op == :background
      fork { execute(ast.left) }           # `a & b` runs a asynchronously...
      ast.right ? execute(ast.right) : 0   # ...then b (assumes right is nil after a trailing &)
    else
      result = execute(ast.left)
      case ast.op
      when :and  then result == 0 ? execute(ast.right) : result
      when :or   then result != 0 ? execute(ast.right) : result
      when :semi then execute(ast.right)
      end
    end
  end
end
execute(ast)
The quote_style field tells you how to handle word splitting and glob expansion:
part.quote_style == :none # Apply glob expansion and word splitting
part.quote_style == :single # Use literal value, no expansion
part.quote_style == :double # Expand variables/commands, but no glob/split
Supported Syntax
Simple Commands
ls -la /tmp
echo "hello world"
Pipelines
cat file.txt | grep pattern | wc -l
Command Lists
make && make test # AND - execute if previous succeeds
make || echo "failed" # OR - execute if previous fails
make ; make test # Sequential - always execute both
sleep 10 & # Background job
Redirections
command < input.txt # Input redirection
command > output.txt # Output redirection
command >> output.txt # Append
command 2>> error.log # Redirect stderr
Quoting
echo 'single quotes preserve everything literally'
echo "double quotes allow $VAR expansion"
echo escaped\ space
Expansions
echo $HOME # Variable expansion
echo ${USER} # Variable expansion (braced)
echo $(date) # Command substitution
echo `whoami` # Command substitution (backticks)
Examples
See examples.rb for complete working examples of:
- Syntax highlighting with token positions
- Execution plan generation from AST
- Pretty-printing AST structures
Run examples:
ruby examples.rb
Design Goals
- Simplicity - Clean, understandable code without excessive abstraction
- Compactness - Core parser in ~320 lines
- Practicality - Focus on two main use cases:
  - Syntax highlighting (needs tokens with positions)
  - Shell execution (needs command structure)
- POSIX Foundation - Based on POSIX spec but simplified where practical
Limitations
This is a simplified parser focused on the core syntax. Not currently supported:
- Compound commands (if, while, for, case, {...}, (...))
- Function definitions
- Arithmetic expansion $((...))
- Parameter expansion modifiers ${var:-default}
- Here-documents (parsed but not fully implemented)
- Pattern matching and globbing
- Reserved words as special tokens
These can be added incrementally as needed.
Architecture
Lexer (ShellParser::Lexer)
- Scans input character by character
- Handles quoting, escaping, and special characters
- Produces token stream with position information
- Preserves metadata for syntax highlighting
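To make the scanning step concrete, here is a minimal sketch of a position-tracking tokenizer. It is not the actual ShellParser::Lexer: the `SketchToken` struct, the `sketch_tokenize` method, and the token types are hypothetical names chosen for illustration, and the sketch handles only unquoted words and a handful of operators.

```ruby
require 'strscan'

# Hypothetical token shape for illustration; the real lexer's tokens may differ.
SketchToken = Struct.new(:type, :value, :pos, :len)

# Two-character operators are listed before their one-character prefixes.
SKETCH_OPERATORS = /&&|\|\||>>|[|;&<>]/

def sketch_tokenize(input)
  scanner = StringScanner.new(input)
  tokens = []
  until scanner.eos?
    next if scanner.skip(/[ \t]+/)   # blanks separate tokens
    pos = scanner.charpos            # character offset, kept for highlighting
    if (op = scanner.scan(SKETCH_OPERATORS))
      tokens << SketchToken.new(:operator, op, pos, op.length)
    elsif (word = scanner.scan(/[^ \t|;&<>]+/))
      tokens << SketchToken.new(:word, word, pos, word.length)
    end
  end
  tokens
end

tokens = sketch_tokenize("ls -la | wc -l >> out.txt")
tokens.each { |t| puts "#{t.type} #{t.value.inspect} at #{t.pos}" }
```

Recording `pos` and `len` on every token is what lets a highlighter map tokens back onto the original input without re-scanning it.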
Parser (ShellParser::Parser)
- Recursive descent parser
- Consumes tokens to build AST
- Handles operator precedence
- Simple error reporting
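The way a recursive descent parser encodes operator precedence can be sketched as one method per grammar level, with each level calling the next tighter-binding one. The `SketchParser` class below is an illustrative assumption, not the real ShellParser::Parser: it takes an array of string tokens, uses stand-in structs that mirror the node types documented above, stores words as plain strings, and ignores redirections and `&`.

```ruby
class SketchParser
  # Stand-in node types mirroring the README's struct definitions.
  Command  = Struct.new(:words, :redirects)
  Pipeline = Struct.new(:commands, :negated)
  List     = Struct.new(:left, :op, :right)

  AND_OR    = { '&&' => :and, '||' => :or }.freeze
  OPERATORS = ['|', '&&', '||', ';'].freeze

  def initialize(tokens)
    @tokens = tokens
    @index = 0
  end

  # list := and_or (';' and_or)*   (lowest precedence)
  def parse_list
    left = parse_and_or
    while peek == ';'
      advance
      break if peek.nil?            # tolerate a trailing ';'
      left = List.new(left, :semi, parse_and_or)
    end
    left
  end

  private

  # and_or := pipeline (('&&' | '||') pipeline)*   (left-associative)
  def parse_and_or
    left = parse_pipeline
    while (op = AND_OR[peek])
      advance
      left = List.new(left, op, parse_pipeline)
    end
    left
  end

  # pipeline := command ('|' command)*   (binds tighter than && and ||)
  def parse_pipeline
    commands = [parse_command]
    while peek == '|'
      advance
      commands << parse_command
    end
    commands.size == 1 ? commands.first : Pipeline.new(commands, false)
  end

  # command := word+
  def parse_command
    words = []
    words << advance while peek && !OPERATORS.include?(peek)
    raise ArgumentError, "expected a command at token #{@index}" if words.empty?
    Command.new(words, [])
  end

  def peek
    @tokens[@index]
  end

  def advance
    token = @tokens[@index]
    @index += 1
    token
  end
end

ast = SketchParser.new(%w[make && make test || echo failed]).parse_list
ast.op       # => :or
ast.left.op  # => :and
```

Because `&&` and `||` share one level and associate to the left, `make && make test || echo failed` nests as `(make && make test) || echo failed`, which matches the "List with nested lists" result described in Basic Parsing.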
AST (Struct-based nodes)
- Lightweight node types using Ruby Structs
- Easy to pattern match and traverse
- Minimal memory overhead
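As an illustration of the pattern-matching point, Ruby Structs implement `deconstruct_keys`, so `case/in` (Ruby 2.7 or later) can destructure nodes directly. The sketch below uses stand-in structs under a hypothetical `SketchAST` module so that it runs without the gem, and `summarize` is an illustrative helper rather than part of the library; words are plain strings here.

```ruby
# Stand-in definitions copied from the node types documented above.
module SketchAST
  Command  = Struct.new(:words, :redirects)
  Pipeline = Struct.new(:commands, :negated)
  List     = Struct.new(:left, :op, :right)
end

# Returns a one-line summary of a node by destructuring its fields.
def summarize(node)
  case node
  in SketchAST::Command(words:)
    "command(#{words.size} words)"
  in SketchAST::Pipeline(commands:)
    "pipeline(#{commands.map { |c| summarize(c) }.join(' | ')})"
  in SketchAST::List(left:, op:, right:)
    "#{summarize(left)} #{op} #{summarize(right)}"
  end
end

tree = SketchAST::List.new(
  SketchAST::Pipeline.new(
    [SketchAST::Command.new(%w[cat file.txt], []),
     SketchAST::Command.new(%w[wc -l], [])],
    false
  ),
  :and,
  SketchAST::Command.new(%w[echo ok], [])
)
puts summarize(tree)
```

With the real gem, the same patterns should apply to `ShellParser::Command`, `ShellParser::Pipeline`, and `ShellParser::List`, provided they are plain Structs as shown in the node type definitions.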
rsh Integration Roadmap
rsh is a Ruby shell that currently uses an 80-line tokenizer for command parsing. The integration path with shell_parser:
- Add as dependency - Add gem 'shell_parser' to rsh's Gemfile and require 'shell_parser' in the main entry point.
- Replace tokenizer - Replace tokenize_command/parse_shell_command in command_parser.rb with ShellParser.parse, gaining proper AST-driven parsing for pipelines, lists, redirects, and quoting.
- AST-driven execution - Use the structured AST (Command, Pipeline, List) for execution instead of passing raw command strings to exec, enabling proper variable expansion, pipeline setup, and redirection handling within the Ruby process.
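As a rough sketch of what the last step could look like for pipelines, Ruby's standard `Open3.pipeline_r` accepts each command as an argv array, so no raw command string has to go through a shell. The `run_pipeline` helper is hypothetical and not part of rsh or shell_parser; it assumes each Command's words have already been expanded into plain strings, and it relies on `echo` and `tr` being available on the system.

```ruby
require 'open3'

# Hypothetical helper: runs a pipeline given as argv arrays (one per Command,
# already expanded from Word nodes) and returns the last command's stdout.
def run_pipeline(argvs)
  Open3.pipeline_r(*argvs) do |last_stdout, _wait_threads|
    last_stdout.read
  end
end

# Equivalent of: echo hello | tr a-z A-Z
puts run_pipeline([%w[echo hello], %w[tr a-z A-Z]])
```

Passing argv arrays keeps quoting decisions in the parser, where the AST already records them, rather than re-encoding them into a string for a second shell to interpret.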