Project

unibits

Visualizes encodings in the terminal. Supports UTF-8, UTF-16LE, UTF-16BE, UTF-32LE, UTF-32BE, US-ASCII, ASCII-8BIT, and most of Ruby's single-byte encodings. Comes as a CLI command and as a Ruby Kernel method.
 Dependencies

Runtime

>= 0.9, < 3.0
~> 2.0, >= 2.0.1
~> 1.4
 Project Readme

unibits | Reveal the Unicode

Ruby library and CLI command that visualizes various Unicode and ASCII/single-byte encodings in the terminal:

  • Makes analyzing encodings easier
  • Helps you with debugging strings
  • Highlights invalid/special/blank bytes/characters/codepoints
  • Supports UTF-8, UTF-16LE/UTF-16BE, UTF-32LE/UTF-32BE, ISO-8859-X, Windows-125X, IBMX, CP85X, macX, TIS-620/Windows-874, KOI8-R/KOI8-U, 7-Bit ASCII/GB1988, and arbitrary BINARY data

Color Coding

Each byte of the given string is highlighted using the following mechanism (characters -> codepoints):

  • Red for invalid bytes
  • Light blue for blanks
  • Blue for control characters
  • Pink for non-control formatting characters
  • Green for marks (Unicode only)
  • Orange for unassigned codepoints
  • Lighter orange for unassigned codepoints which are also ignorable
  • Random color for all other codepoints

The same colors are used in the higher-level companion tool uniscribe.
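The categories above can be approximated with Ruby's built-in Unicode regexp properties. The following is only a rough sketch, not how unibits itself classifies characters: `rough_category` is a hypothetical helper, it assumes UTF-8 input, and it does not model the "ignorable" distinction or the color assignment.

```ruby
# Hypothetical helper (not part of unibits): returns a rough category
# for a single character of a UTF-8 string.
def rough_category(char)
  # Bytes that do not form a valid character come first.
  return :invalid unless char.valid_encoding?

  case char
  when /\A\p{Space}\z/ then :blank      # whitespace (checked before controls, so tab/newline count as blank)
  when /\A\p{Cc}\z/    then :control    # control characters
  when /\A\p{Cf}\z/    then :format     # non-control formatting characters
  when /\A\p{M}\z/     then :mark       # combining marks
  when /\A\p{Cn}\z/    then :unassigned # unassigned codepoints
  else                      :other
  end
end

# Classify each codepoint of a small sample string.
p "a \u0301\uFEFF".each_char.map { |c| rough_category(c) }
```

Note that the result for unassigned codepoints depends on the Unicode version bundled with the Ruby in use.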

Setup

Make sure you have Ruby installed and that installing gems works properly. Then run:

$ gem install unibits

Usage

Pass the string to debug to unibits:

From CLI

$ unibits "🌫 Idio﻿syncrätic ℜսᖯʏ"

From Ruby

require 'unibits/kernel_method'
unibits "🌫 Idio﻿syncrätic ℜսᖯʏ"

Advanced Options

unibits accepts the following options:

  • encoding (e): The encoding of the given string (uses the string's default encoding if none given)
  • convert (c): An encoding the string should be converted to before visualizing it
  • stats: Whether to show a short stats header (default: true); deactivate it on the CLI with --no-stats
  • wide-ambiguous: Treat characters of ambiguous width as 2 columns wide instead of 1
  • width (w): Set a custom column width; if not set, unibits retrieves it from the terminal or falls back to 80
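The difference between encoding and convert can be illustrated with core Ruby string methods. This sketch assumes, based on the option descriptions above, that encoding only changes how the existing bytes are interpreted while convert produces new bytes; `reinterpret` and `transcode` are hypothetical helper names, not unibits functions.

```ruby
# Hypothetical helpers illustrating the two options with core String methods.

# Like `encoding`: keep the bytes, change only the encoding label.
def reinterpret(string, encoding)
  string.dup.force_encoding(encoding)
end

# Like `convert`: transcode into new bytes that represent the same characters.
def transcode(string, encoding)
  string.encode(encoding)
end

text = "\u00E4" # "ä", stored as the two UTF-8 bytes C3 A4

p reinterpret(text, "ISO-8859-1").bytes # same two bytes, now read as two characters
p transcode(text, "UTF-16LE").bytes     # different bytes for the same single character
```

Reinterpreting never fails but can yield invalid or unintended characters, whereas transcoding raises an error if a character cannot be represented in the target encoding.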

Examples of Valid Encodings

UTF-8

CLI: $ unibits -e utf-8 -c utf-8 "🌫 Idio﻿syncrätic ℜսᖯʏ"

Ruby: unibits "🌫 Idio﻿syncrätic ℜսᖯʏ", encoding: 'utf-8', convert: 'utf-8'

Screenshot UTF-8

UTF-16LE

CLI: $ unibits -e utf-8 -c utf-16le "🌫 Idio﻿syncrätic ℜսᖯʏ"

Ruby: unibits "🌫 Idio﻿syncrätic ℜսᖯʏ", encoding: 'utf-8', convert: 'utf-16le'

Screenshot UTF-16LE

UTF-32BE

CLI: $ unibits -e utf-8 -c utf-32be "🌫 Idio﻿syncrätic ℜսᖯʏ"

Ruby: unibits "🌫 Idio﻿syncrätic ℜսᖯʏ", encoding: 'utf-8', convert: 'utf-32be'

Screenshot UTF-32BE

BINARY

CLI: $ unibits -e binary "🌫 Idio﻿syncrätic ℜսᖯʏ"

Ruby: unibits "🌫 Idio﻿syncrätic ℜսᖯʏ", encoding: 'binary'

Screenshot BINARY

ASCII

CLI: $ unibits -e utf-8 -c ascii "ascii"

Ruby: unibits "ascii", encoding: 'utf-8', convert: 'ascii'

Screenshot ASCII

Examples of Invalid Encodings

UTF-8

Example in Ruby: unibits "unexpected \x80 | not enough \xF0\x9F\x8C | overlong \xE0\x81\x81 | surrogate \xED\xA0\x80 | too large \xF5\x8F\xBF\xBF"

Screenshot invalid UTF-8
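Independently of unibits, plain Ruby can already tell whether a string contains such invalid sequences. A minimal sketch, assuming UTF-8 input; `invalid_bytes` is a hypothetical helper, and the way Ruby splits an invalid sequence into characters may differ from how unibits groups it.

```ruby
# Hypothetical helper: collects all bytes that are not part of a valid character.
def invalid_bytes(string)
  string.each_char.reject(&:valid_encoding?).flat_map(&:bytes)
end

sample = "unexpected \x80 | not enough \xF0\x9F\x8C | overlong \xE0\x81\x81 | " \
         "surrogate \xED\xA0\x80 | too large \xF5\x8F\xBF\xBF"

p sample.valid_encoding?                                # whether the whole string is valid
p invalid_bytes(sample).map { |b| format("%02X", b) }   # offending bytes as hex
p sample.scrub("?")                                     # invalid sequences replaced by "?"
```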

ASCII

Example in Ruby: unibits "🌫 Idio﻿syncrätic ℜսᖯʏ", encoding: 'ascii'

Screenshot invalid ASCII

Notes

More info

Related gems

Lots of thanks to @damienklinnert for the motivation and inspiration required to build this! 🎆

Copyright (C) 2017-2023 Jan Lelis https://janlelis.com. Released under the MIT license.