Project

scout-rig

0.0
No release in over 3 years
Use other coding languages in your scout applications

 Project Readme

scout-rig

scout-rig provides the language interop “rigging” for the Scout ecosystem. It currently focuses on Python: executing Python from Ruby, round‑tripping data (TSV ↔ pandas), and running Scout Workflows from Python code. It builds on the low-level/core packages:

  • scout-essentials — low level utilities (Annotation, CMD, ConcurrentStream, IndiferentHash, Log, Open, Path, Persist, TmpFile)
  • scout-gear — data and workflow primitives (TSV, Workflow, KnowledgeBase, Association, Entity, WorkQueue, Semaphore)
  • scout-rig — interop with other languages (currently Python)
  • scout-camp — remote servers, cloud deployments, web interfaces, cross-site operations
  • scout-ai — model training and agentic tools

All packages are available on GitHub under https://github.com/mikisvaz (for example, https://github.com/mikisvaz/scout-gear).

For broader background and many real workflow examples, see Rbbt (the bioinformatics framework from which Scout was refactored) and the Rbbt-Workflows organization.

This README focuses on the Python bridge in scout-rig (ScoutPython). See the docs in doc/ for reference material.

  • doc/Python.md — ScoutPython user guide

What you get

ScoutPython (Ruby) and a companion Python package (python/scout) provide:

  • Safe, ergonomic execution of Python code from Ruby (PyCall-based), with:
    • Simple import helpers and localized bindings
    • Synchronous, direct, or background-thread execution
    • Logging wrappers that capture Python stdout/stderr
  • Scripting to run ad‑hoc Python text with Ruby variables (including TSV) injected, and results returned
  • Data conversion helpers:
    • numpy arrays → Ruby Arrays
    • pandas DataFrame ↔ TSV (key_field, fields, type respected)
  • Python path management (expose package python/ dirs to sys.path)
  • Python‑side helpers to:
    • Read/write TSVs with headers (pandas)
    • Run Ruby Workflows from Python
    • Call remote Workflow services over HTTP

Installation and requirements

Ruby

  • Ruby 2.6+ (or any version compatible with PyCall)
  • Gems:
    • pycall (PyCall)
    • json (standard)
    • Optional for script result loading:
      • python/pickle (gem) for loading pickle from Python scripts

Python

  • Python 3
  • Packages:
    • pandas
    • numpy
    • requests (only for remote workflow client)
  • Ensure python3 is in PATH

Add scout-rig to your Ruby project (Gemfile or local checkout), then ensure Python dependencies are installed in your Python environment.


Quick start

Execute Python directly from Ruby:

require 'scout_python'

# Sum with numpy
arr_sum = ScoutPython.run 'numpy', as: :np do
  np.array([1,2,3]).sum
end
# => a PyObject; call to_i if you need a Ruby Integer

# Background thread execution
ScoutPython.run_threaded :sys do
  sys.path.append('/opt/my_py_pkg')
end
ScoutPython.stop_thread

Run an ad‑hoc Python script, returning a result value:

tsv = TSV.setup({}, "Key~ValueA,ValueB#:type=:list")
tsv["k1"] = %w[a1 b1]; tsv["k2"] = %w[a2 b2]

TmpFile.with_file do |target|
  result = ScoutPython.script <<~PY, df: tsv, target: target
    import scout
    # df is a pandas DataFrame (tsv injected)
    result = df.loc["k2", "ValueB"]
    scout.save_tsv(target, df)  # save as TSV with header
  PY

  # result is "b2"; target holds a TSV round-tripped from pandas
end

Convert between TSV and pandas:

df = ScoutPython.tsv2df(tsv)      # TSV -> pandas DataFrame
tsv2 = ScoutPython.df2tsv(df)     # pandas DataFrame -> TSV

Run a Workflow from Python:

import sys
sys.path.append('python')  # add this repo's python/ on dev checkouts

import scout.workflow as sw

wf = sw.Workflow('Baking')
print(wf.tasks())
step = wf.fork('bake_muffin_tray', add_blueberries=True, clean='recursive')
step.join()
print(step.load())         # load Ruby job result

Core concepts

Path management for Python imports

ScoutPython tracks Python directories to add to sys.path:

  • ScoutPython.add_path(path) / add_paths(paths)
  • ScoutPython.process_paths # idempotent; run before/inside sessions

These are applied in Python contexts by run/run_simple/run_direct.

Running Python from Ruby

Pick the execution model that fits:

  • run(mod = nil, imports = nil) { ... }
    • Initialize PyCall if needed, set up paths, run block; GC after run
  • run_simple(mod = nil, imports = nil) { ... }
    • Lightweight; process_paths, then run block
  • run_direct(mod = nil, imports = nil) { ... }
    • Minimal overhead: optional single pyimport/pyfrom, then evaluate
  • run_threaded(mod = nil, imports = nil) { ... }
    • Queue work into a dedicated Python thread; stop with stop_thread

Logging wrappers capture Python’s stdout/stderr via the Scout Log:

  • run_log(mod=nil, imports=nil, severity=Log::LOW, severity_err=nil) { ... }
  • run_log_stderr(mod=nil, imports=nil, severity=Log::LOW) { ... }
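
The idea behind these wrappers can be sketched in plain Python as follows. This is an illustration only, assuming a standard logging logger in place of the Scout Log; the function body, the logger name "scout.sketch", and the ListHandler class are hypothetical, and the sketch only captures writes made through Python's sys.stdout and sys.stderr.

```python
import contextlib
import io
import logging
import sys

def run_log(func, severity=logging.DEBUG, severity_err=None, logger=None):
    """Run func and forward everything it prints to a logger, line by line."""
    logger = logger or logging.getLogger("scout.sketch")
    out, err = io.StringIO(), io.StringIO()
    with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
        result = func()
    for line in out.getvalue().splitlines():
        logger.log(severity, line)
    # stderr falls back to the stdout severity when none is given.
    err_level = severity if severity_err is None else severity_err
    for line in err.getvalue().splitlines():
        logger.log(err_level, line)
    return result

class ListHandler(logging.Handler):
    """Collects (level, message) pairs so the captured lines can be inspected."""
    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append((record.levelno, record.getMessage()))

def noisy():
    print("fitting model")
    print("warning: slow path", file=sys.stderr)
    return 42

handler = ListHandler()
log = logging.getLogger("scout.sketch")
log.setLevel(logging.DEBUG)
log.propagate = False
log.addHandler(handler)

value = run_log(noisy, severity=logging.INFO, severity_err=logging.WARNING)
```

The wrapped function's return value passes through unchanged, while its output ends up in the log at the requested severities.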

Imports

  • Pass a module with an alias ('numpy', as: :np) or a module with the names to import from it ("module.submodule", import: [:Class, :func])

Binding scopes and imports

Keep imports local to a binding:

ScoutPython.binding_run do
  pyimport :torch
  pyfrom :torch, import: ['nn']
  # torch and nn available here only
end

Helpers

  • new_binding, binding_run
  • import_method, call_method
  • get_module, get_class, class_new_obj
  • exec(script) → PyCall.exec

Scripting

Run arbitrary Python text with Ruby variables injected:

  • ScoutPython.script(text, variables = {}) → result
    • Ruby primitives → Python literals
    • Arrays/Hashes → recursively converted
    • TSV variables → materialized to temp file and loaded into pandas via the python/scout helper
    • result is read back via pickle (default) or JSON (configurable)
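
The inject-run-read-back cycle described above can be sketched in Python. This is an assumed, simplified model and not the ScoutPython code: the run_script helper, the generated header and footer, and the temporary file names are all hypothetical, variables are rendered with repr, and JSON is used for the result.

```python
import json
import os
import subprocess
import sys
import tempfile
import textwrap

def run_script(text, **variables):
    """Run Python source in a child interpreter and return its `result`."""
    with tempfile.TemporaryDirectory() as tmp:
        result_file = os.path.join(tmp, "result.json")
        script_file = os.path.join(tmp, "script.py")
        # Each injected variable becomes a literal assignment at the top.
        header = "".join(f"{name} = {value!r}\n" for name, value in variables.items())
        # The footer serializes `result` so the parent can read it back.
        footer = (
            "\nimport json as _json\n"
            f"with open({result_file!r}, 'w') as _fh:\n"
            "    _json.dump(result, _fh)\n"
        )
        with open(script_file, "w") as fh:
            fh.write(header + textwrap.dedent(text) + footer)
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run([sys.executable, script_file], check=True)
        with open(result_file) as fh:
            return json.load(fh)

total = run_script(
    """
    result = sum(values) * factor
    """,
    values=[1, 2, 3],
    factor=2,
)
```

JSON keeps the sketch dependency-free but only carries plain data; pickle can carry richer Python objects, which is one reason to make the serializer swappable.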

Swap result serializer if desired:

class << ScoutPython
  alias save_script_result save_script_result_json
  alias load_result        load_json
end

Iteration utilities

Traverse Python iterables with optional progress bars:

  • iterate(iterator, bar: nil|true|String) { |elem| ... }
  • iterate_index(sequence, bar: ...) { |elem| ... }
  • collect(iterator, bar: ...) { |elem| ... } → Array

Data conversion and pandas helpers

  • numpy2ruby(numpy_array)
  • to_a/py2ruby_a(py_list)
  • obj2hash(py_mapping)
  • tsv2df(tsv) / df2tsv(df, options={type: :list, key_field: ...})

Python-side package (python/scout)

The included Python package is importable as scout and provides:

General utilities

  • scout.libdir(), scout.add_libdir()
  • scout.path(), scout.read()
  • scout.inspect(obj), scout.rich(obj)

TSV IO (pandas-aware)

  • scout.tsv(tsv_path_or_stream, ...) → pandas.DataFrame (Scout headers respected)
  • scout.save_tsv(filename, df, key=None)
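
As a rough picture of what "headers respected" involves, here is a standard-library sketch that reads and writes a TSV carrying an options line and a field-name line. The layout shown (a "#: :type=:list" options line followed by a "#Key..." header line) is an assumption for illustration, and read_tsv and save_tsv below are hypothetical stand-ins that use plain dicts rather than pandas.

```python
import io

def read_tsv(stream):
    """Parse a header-carrying TSV into (options, key_field, fields, rows)."""
    options, key_field, fields, rows = {}, None, [], {}
    for line in stream:
        line = line.rstrip("\n")
        if line.startswith("#: "):
            # Options line such as "#: :type=:list" (assumed layout).
            for item in line[3:].split("#"):
                name, _, value = item.partition("=")
                options[name.lstrip(":")] = value.lstrip(":")
        elif line.startswith("#") and key_field is None:
            # Header line: key field first, then the value fields.
            key_field, *fields = line[1:].split("\t")
        elif line:
            key, *values = line.split("\t")
            rows[key] = values
    return options, key_field, fields, rows

def save_tsv(stream, key_field, fields, rows, type_="list"):
    """Write rows back out with the same two header lines."""
    stream.write(f"#: :type=:{type_}\n")
    stream.write("#" + "\t".join([key_field, *fields]) + "\n")
    for key, values in rows.items():
        stream.write("\t".join([key, *values]) + "\n")

text = "#: :type=:list\n#Key\tValueA\tValueB\nk1\ta1\tb1\nk2\ta2\tb2\n"
options, key_field, fields, rows = read_tsv(io.StringIO(text))
cell = rows["k2"][fields.index("ValueB")]

out = io.StringIO()
save_tsv(out, key_field, fields, rows, options["type"])
```

In this sketch the key field plays the role a DataFrame index would, and the remaining fields play the role of its columns.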

Workflow wrappers

  • scout.run_job(workflow, task, name='Default', fork=False, clean=False, **inputs)
    • Shells out to the Ruby CLI to execute/fork jobs
  • scout.workflow.Workflow(name).run/fork/tasks/task_info
  • scout.workflow.Step(path).info/status/join/load

Remote workflows (HTTP)

  • scout.workflow.remote.RemoteWorkflow(url).job/task_info
  • scout.workflow.remote.RemoteStep(url).status/wait/raw/json

Error handling and threading

  • When a Python process started by script exits with a non‑zero status, the failure is raised as ConcurrentStreamProcessFailed; its stderr is logged via Log if a logging wrapper is used
  • Background thread execution must be stopped explicitly:
    • ScoutPython.stop_thread — sends a sentinel, tries to join/kill, GCs, and finalizes PyCall if available

Command line usage and discovery

Scout commands are discovered under scout_commands across installed packages using the Path subsystem. The dispatcher resolves nested commands by adding terms until a file is found to execute; if you stop on a directory, it lists available subcommands.

  • General pattern:
    • scout <command> [<subcommand> ...] [options] [args...]
  • Examples relevant to Python integration (executed from Ruby CLI but callable from Python via scout.run_job):
    • scout workflow task <workflow> <task> [task-input-options...]
    • scout workflow prov <step_path>
    • scout workflow info <step_path>
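
The resolution rule described at the start of this section (add terms until a file is found, list subcommands when the terms stop on a directory) can be sketched as below. This is a simplified model with a single command root; the real dispatcher searches scout_commands across packages through the Path subsystem, and the resolve function and its return shape are hypothetical.

```python
import os
import tempfile

def resolve(root, terms):
    """Consume terms one at a time until they name a file under root.

    Returns ("run", script_path, remaining_args) when a file is found, or
    ("list", subcommands, []) when the terms stop on a directory.
    """
    current = root
    for i, term in enumerate(terms):
        current = os.path.join(current, term)
        if os.path.isfile(current):
            return "run", current, terms[i + 1:]
        if not os.path.isdir(current):
            raise LookupError(f"unknown command: {' '.join(terms[:i + 1])}")
    return "list", sorted(os.listdir(current)), []

# Build a throwaway command tree to exercise the resolver.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "workflow"))
    for name in ("task", "info", "prov"):
        open(os.path.join(root, "workflow", name), "w").close()

    action, target, rest = resolve(root, ["workflow", "info", "some/step"])
    found = (action, os.path.relpath(target, root), rest)
    listing = resolve(root, ["workflow"])
```

Everything after the term that matched a file is passed on as that command's own arguments.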

Notes

  • The bin/scout launcher walks scout_commands/… across packages; Workflows and other packages can add their own commands and they will be discovered
  • See the Workflow, TSV, and KnowledgeBase docs for their CLI suites:
    • TSV: scout tsv …
    • Workflow: scout workflow …
    • KnowledgeBase: scout kb …

scout-rig itself does not register standalone CLI commands; instead, its Python wrapper invokes the existing Ruby CLI to run jobs from Python.


Reference

Read the full module guide in doc/Python.md. For core building blocks referenced above, see these docs in scout-essentials and scout-gear:

  • Annotation.md, CMD.md, ConcurrentStream.md, IndiferentHash.md, Log.md, Open.md, Path.md, Persist.md, TmpFile.md
  • TSV.md, Workflow.md, KnowledgeBase.md, Association.md, Entity.md, WorkQueue.md, Semaphore.md

Examples

Direct PyCall with imports:

ScoutPython.run 'numpy', as: :np do
  a = np.array([1,2,3])
  a.sum            # PyObject; convert with to_i if needed
end

Script with a returned value and TSV round‑trip:

tsv = TSV.setup({}, "Key~ValueA,ValueB#:type=:list")
tsv["k1"] = ["a1", "b1"]; tsv["k2"] = ["a2", "b2"]

TmpFile.with_file do |target|
  result = ScoutPython.script <<~PY, df: tsv, target: target
    import scout
    result = df.loc["k2", "ValueB"]
    scout.save_tsv(target, df)
  PY
  # result == "b2"; target contains the saved TSV
end

numpy conversion:

ra = ScoutPython.run :numpy, as: :np do
  na = np.array([[[1,2,3], [4,5,6]]])
  ScoutPython.numpy2ruby(na)
end
ra[0][1][2] # => 6

Run workflows from Python:

import scout.workflow as sw

wf = sw.Workflow('Baking')
step = wf.fork('bake_muffin_tray', add_blueberries=True, clean='recursive')
step.join()
print(step.load())

Project links

Contributions and issues are welcome in their respective GitHub repositories.