Project

syntax

0.26
No commit activity in last 3 years
No release in over 3 years
Syntax is a Ruby library for performing simple syntax highlighting.
 Dependencies

Development

< 11.0.0
>= 0
 Project Readme

Syntax

A syntax highlighting library for Ruby.

This fork is maintained, and version 1.1.0 has been published from it. However, there is currently little or no new development going on here, and the original author, @jamis, recommends using CodeRay over this library.

About

This is a simple syntax highlighting library for Ruby. It is a naive syntax analysis tool, meaning that it does not “understand” the syntaxes of the languages it processes, but merely does some semi-intelligent pattern matching.

Usage

There are primarily two uses for the Syntax library:

  • Convert text from a supported syntax to a supported highlight format (like HTML).

  • Tokenize text in a supported syntax and process the tokens directly.

Highlighting a supported syntax

require 'syntax/convertors/html'

convertor = Syntax::Convertors::HTML.for_syntax "ruby"
puts convertor.convert( File.read( "file.rb" ) )

The above snippet will emit HTML, using spans and CSS to indicate the different highlight “groups”. (Sample CSS files are included in the “data” directory.)
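
To make the "spans and CSS" idea concrete, here is a minimal, self-contained sketch of how highlight groups can map to span classes. It does not use the Syntax library: the escape_html and spans_for helpers, the sample pairs, and the group names are all assumptions made for illustration, and the markup the library actually emits may differ.

```ruby
# Characters that must be escaped before text is placed inside HTML.
HTML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

# Hypothetical helper (not part of Syntax): escape HTML special characters.
def escape_html(text)
  text.gsub(/[&<>"]/, HTML_ESCAPES)
end

# Hypothetical helper (not part of Syntax): wrap each [group, text] pair
# in a span whose CSS class is the group name.
def spans_for(pairs)
  pairs.map { |group, text|
    %(<span class="#{group}">#{escape_html(text)}</span>)
  }.join
end

# Hand-written sample pairs; the group names here are assumptions.
sample = [[:keyword, "if"], [:normal, " a "], [:punct, "<"], [:normal, " b"]]
puts "<pre>#{spans_for(sample)}</pre>"
```

A stylesheet can then give each class its own color, which is the role the sample CSS files play.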

Tokenize text

require 'syntax'

tokenizer = Syntax.load "ruby"
tokenizer.tokenize( File.read( "file.rb" ) ) do |token|
  puts "group(#{token.group}, #{token.instruction}) lexeme(#{token})"
end

Tokenizing is a straightforward process. Each time a new token is discovered by the tokenizer, it is yielded to the given block.

  • token.group is the lexical group to which the token belongs. Each supported syntax may have its own set of lexical groups.

  • token.instruction is an instruction used to determine how this token should be treated. It will be :none for normal tokens, :region_open if the token starts a nested region, and :region_close if it closes the last opened region.

  • token is itself a subclass of String, so you can use it just as you would a string. It represents the lexeme that was actually parsed.
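
The three token properties above can be combined to follow region nesting. The sketch below is self-contained and does not load the Syntax library: SketchToken is a hypothetical stand-in modelled only on the description above (a String subclass carrying a group and an instruction), and the outline helper and the sample tokens are likewise assumptions for illustration.

```ruby
# Hypothetical stand-in for the library's token: a String subclass that
# also carries a lexical group and an instruction.
class SketchToken < String
  attr_reader :group, :instruction

  def initialize(text, group, instruction = :none)
    super(text)
    @group = group
    @instruction = instruction
  end
end

# Returns one line per token, indented by the current region depth.
# :region_open increases the depth after its own line is built;
# :region_close decreases it before its own line is built.
def outline(tokens)
  depth = 0
  tokens.map do |token|
    depth -= 1 if token.instruction == :region_close && depth > 0
    line = "#{'  ' * depth}#{token.group}: #{token.inspect}"
    depth += 1 if token.instruction == :region_open
    line
  end
end

# Hand-written sample tokens; the group names are assumptions.
tokens = [
  SketchToken.new("x", :ident),
  SketchToken.new("(", :punct, :region_open),
  SketchToken.new("hi", :string),
  SketchToken.new(")", :punct, :region_close)
]
puts outline(tokens)
```

With the real library, the same depth bookkeeping would presumably sit inside the block passed to tokenize, using the yielded token in place of SketchToken.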