Ruby Scientist and Graphics is a practical data science toolkit for Ruby. It includes a lightweight built-in DataFrame for loading, cleaning, and transforming data; quick descriptive statistics and correlations; charting via Gruff (bar and line); and simple ML utilities (linear regression and k-means), all behind a small, unified, pandas-inspired API.

Key features:

  • Load data from CSV and JSON.
  • Clean and transform (remove/add columns, handle missing values, limit rows).
  • Describe datasets and compute correlations quickly.
  • Create bar and line charts with customization options.
  • Train/predict with linear regression; cluster with k-means.
  • Save/load project state (data + trained model) and run simple pipelines.
  • Optional backend adapters (e.g., Rover) while keeping the same API.

Ideal for analysts and developers who want to explore data in Ruby without relying on Python or R.

Note: plotting via Gruff uses rmagick, which requires ImageMagick installed on the system.
Dependencies

Development: ~> 5.0, ~> 13.0
Runtime: ~> 3.3, ~> 0.29

Project Readme

RubyScientistAndGraphics

Lightweight data science toolkit for Ruby: load/clean data, get quick stats, plot charts, and train simple ML models, all in one gem. The only heavy dependency is ImageMagick, which Gruff needs (via rmagick) for plotting.

It ships with a minimal in-house DataFrame (no Daru required), Gruff for plotting, and tiny implementations for statistics and ML (linear regression and k-means).
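
To give a feel for how small such an ML routine can be, here is a standalone k-means sketch. It is not the gem's implementation: the method names (`kmeans`, `nearest_index`, `mean_point`) are hypothetical, and the initial centroids are passed in explicitly so the run is deterministic.

```ruby
# Minimal k-means sketch (illustrative only; not taken from the gem).
# Points are arrays of floats.

def squared_distance(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# Index of the centroid closest to the given point.
def nearest_index(point, centroids)
  centroids.each_index.min_by { |i| squared_distance(point, centroids[i]) }
end

# Coordinate-wise mean of a non-empty list of points.
def mean_point(points)
  points.transpose.map { |coords| coords.sum / coords.size.to_f }
end

def kmeans(points, centroids, max_iterations: 100)
  assignments = nil
  max_iterations.times do
    new_assignments = points.map { |p| nearest_index(p, centroids) }
    break if new_assignments == assignments # nothing moved: converged

    assignments = new_assignments
    centroids = centroids.each_index.map do |i|
      members = points.each_index.select { |j| assignments[j] == i }.map { |j| points[j] }
      # Keep the old centroid if a cluster ends up empty.
      members.empty? ? centroids[i] : mean_point(members)
    end
  end
  { centroids: centroids, assignments: assignments }
end

points = [[1.0, 1.0], [1.5, 2.0], [1.0, 1.5], [8.0, 8.0], [9.0, 9.5], [8.5, 9.0]]
result = kmeans(points, [points.first, points.last])
puts result[:assignments].inspect
```

With two well-separated groups like these, the first three points end up in one cluster and the last three in the other.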

Features

  • Load and save CSV/JSON, plus save/load a simple “project” (columns + rows).
  • Data cleaning helpers: remove columns, fill missing values, limit rows.
  • Quick stats: per-column mean/min/max and Pearson correlation.
  • Plotting: bar and line charts via Gruff.
  • ML: linear regression (least squares) and k-means clustering.
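
The cleaning and stats features above can be sketched in plain Ruby. This is a standalone illustration under stated assumptions, not the gem's code: rows are plain hashes, and the helper names mirror the feature list but are hypothetical free functions here.

```ruby
# Standalone sketch of the cleaning and stats ideas (not the gem's code).
# Rows are plain hashes; the gem's own DataFrame may store data differently.

def remove_columns(rows, cols)
  rows.map { |row| row.reject { |key, _| cols.include?(key) } }
end

def fill_missing(rows, value)
  rows.map { |row| row.transform_values { |v| v.nil? ? value : v } }
end

def limit_rows(rows, count)
  rows.first(count)
end

# Per-column mean/min/max, assuming every remaining column is numeric.
def describe(rows)
  rows.first.keys.to_h do |col|
    values = rows.map { |row| row[col].to_f }
    [col, { mean: values.sum / values.size, min: values.min, max: values.max }]
  end
end

# Pearson correlation coefficient of two equally sized numeric arrays.
def pearson(xs, ys)
  n = xs.size.to_f
  mean_x = xs.sum / n
  mean_y = ys.sum / n
  cov = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  var_x = xs.sum { |x| (x - mean_x)**2 }
  var_y = ys.sum { |y| (y - mean_y)**2 }
  cov / Math.sqrt(var_x * var_y)
end

rows = [
  { mes: 1, ventas: 100.0, comentarios: "ok" },
  { mes: 2, ventas: nil,   comentarios: "n/a" },
  { mes: 3, ventas: 140.0, comentarios: "" },
  { mes: 4, ventas: 160.0, comentarios: "ok" }
]

cleaned = limit_rows(fill_missing(remove_columns(rows, [:comentarios]), 0), 3)
stats = describe(cleaned)
r = pearson([1, 2, 3], [2, 4, 6])
puts stats.inspect
```

Note that filling missing values with 0 pulls the mean and minimum down, which is worth keeping in mind when choosing a fill value.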

Installation

Clone and use directly, or add to your Gemfile from a git source until published to RubyGems:

gem 'ruby_scientist_and_graphics', git: 'https://github.com/your-user/ruby_scientist_and_graphics'

Then install:

bundle install

Ruby 3.2+ is recommended.

Quick start

Run the demo to see the workflow end-to-end:

ruby demo.rb

Or use the API:

require_relative 'lib/ruby_scientist_and_graphics'

interface = RubyScientistAndGraphics::Interface.new

# 1) Load and clean
interface.load('test/fixtures/sample.csv', remove_columns: [:comentarios], limit: 5)
interface.clean(missing: 0)

# 2) Stats
interface.analyze

# 3) Plot
interface.graph(type: :bar, x: :mes, y: :ventas, file: 'output.png')

# 4) Train a model
model = interface.train_model(type: :linear_regression, features: [:mes], target: :ventas)
preds = model.predict([[1.0], [2.0], [3.0]])

# 5) Save project
interface.save_project('project.json')

# 6) Load a previously saved project, retrain, and predict
interface.load_project('project.json')
interface.train_model(type: :linear_regression, features: [:mes], target: :ventas)
preds = interface.predict([[6.0], [7.0]])
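
As a rough picture of what a least-squares fit on a single feature computes, the sketch below implements the closed-form solution in plain Ruby. The class name `SimpleLinearRegression` is hypothetical and the code is independent of the gem; it only mirrors the nested-array shape passed to `predict` above.

```ruby
# One-feature ordinary least squares (illustrative; not the gem's class).
class SimpleLinearRegression
  attr_reader :slope, :intercept

  # xs and ys are equally sized arrays of numbers.
  def fit(xs, ys)
    n = xs.size.to_f
    mean_x = xs.sum / n
    mean_y = ys.sum / n
    sxy = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
    sxx = xs.sum { |x| (x - mean_x)**2 }
    raise ArgumentError, "x values must not all be equal" if sxx.zero?

    @slope = sxy / sxx
    @intercept = mean_y - @slope * mean_x
    self
  end

  # Accepts rows holding one feature each, e.g. [[6.0], [7.0]].
  def predict(rows)
    rows.map { |row| @intercept + @slope * row.first }
  end
end

model = SimpleLinearRegression.new.fit([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0])
preds = model.predict([[6.0], [7.0]])
puts preds.inspect
```

Because the sample data lies exactly on the line y = 10x, the fitted slope is 10 and the intercept is 0.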

API overview

  • DataFrame (internal): CSV load, indexing by column symbol, head, write_csv, map_vectors, filter_rows.
  • IO: load_csv, load_json, save_csv, save_json, save_project, load_project.
  • Dataset: remove_columns, add_column, limit_rows, fill_missing, head, stats, plot.
  • Stats: describe, correlation(col1, col2).
  • Plotter: bar(x:, y:, file:), line(x:, y:, file:).
  • Interface: load, clean, analyze, graph, pipeline, train_model, save_project, load_project, predict.

Adapters (optional backends)

This gem includes a minimal in-house DataFrame that powers all features. If you want more performance or richer operations (group-by, joins, rolling, etc.), you can plug a third-party backend behind the same API using a simple adapter pattern.

Potential backends:

  • Polars (Ruby bindings): very fast, columnar engine written in Rust.
  • Rover-Df: pure Ruby DataFrame with a friendly API.

Adapter idea (sketch):

module RubyScientistAndGraphics
  module Backends
    class PolarsAdapter
      def self.from_csv(path); end
      def vectors; end
      def [](col); end
      def to_a; end
      # implement methods used by Dataset/Stats/Plotter
    end
  end
end

# Then inject at app start:
# RubyScientistAndGraphics::DataFrame = RubyScientistAndGraphics::Backends::PolarsAdapter

This keeps your app code unchanged while letting you switch engines.
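
The adapter idea can be tried out without any third-party engine. The sketch below is a self-contained illustration with hypothetical class names (`RowBackend`, `ColumnBackend`). It assumes that `vectors` returns column names, `[]` returns one column, and `to_a` returns rows, which may not match what the gem's DataFrame actually does.

```ruby
# Adapter sketch: two interchangeable backends behind one small interface.
# All names are hypothetical and chosen only for illustration.

# Stores data row by row.
class RowBackend
  def initialize(columns, rows)
    @columns = columns
    @rows = rows
  end

  def vectors
    @columns
  end

  def [](col)
    index = @columns.index(col)
    @rows.map { |row| row[index] }
  end

  def to_a
    @rows
  end
end

# Stores one array per column.
class ColumnBackend
  def initialize(columns, rows)
    @data = columns.each_with_index.to_h { |col, i| [col, rows.map { |row| row[i] }] }
  end

  def vectors
    @data.keys
  end

  def [](col)
    @data.fetch(col)
  end

  def to_a
    @data.values.transpose
  end
end

# Caller code relies only on the shared interface, so either backend works.
def column_mean(frame, col)
  values = frame[col]
  values.sum / values.size.to_f
end

columns = [:mes, :ventas]
rows = [[1, 100.0], [2, 120.0], [3, 140.0]]
means = [RowBackend, ColumnBackend].map { |klass| column_mean(klass.new(columns, rows), :ventas) }
puts means.inspect
```

Passing the backend in explicitly, as done here, avoids reassigning a constant that is already defined, which Ruby reports with a warning.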

Development

Setup and tests:

bin/setup
bundle exec rake test

Run an interactive console:

bin/console

Build and install locally:

bundle exec rake install

Release flow: bump version in lib/ruby_scientist_and_graphics/version.rb, then:

bundle exec rake release

Contributing

Pull requests are welcome. Please open an issue to discuss large changes first. See CODE_OF_CONDUCT.md.

License

MIT License. See LICENSE.txt.