scout-rig
scout-rig provides the language interop “rigging” for the Scout ecosystem. It currently focuses on Python: executing Python from Ruby, round‑tripping data (TSV ↔ pandas), and running Scout Workflows from Python code. It builds on the core packages scout-essentials and scout-gear. The Scout ecosystem comprises:
- scout-essentials — low level utilities (Annotation, CMD, ConcurrentStream, IndiferentHash, Log, Open, Path, Persist, TmpFile)
- scout-gear — data and workflow primitives (TSV, Workflow, KnowledgeBase, Association, Entity, WorkQueue, Semaphore)
- scout-rig — interop with other languages (currently Python)
- scout-camp — remote servers, cloud deployments, web interfaces, cross-site operations
- scout-ai — model training and agentic tools
All packages are available on GitHub under https://github.com/mikisvaz (for example, https://github.com/mikisvaz/scout-gear).
For broader background and many real workflow examples, see Rbbt (the bioinformatics framework from which Scout was refactored) and the Rbbt-Workflows organization, both linked under Project links below.
This README focuses on the Python bridge in scout-rig (ScoutPython). See the docs in doc/ for reference material.
- doc/Python.md — ScoutPython user guide
What you get
ScoutPython (Ruby) and a companion Python package (python/scout) provide:
- Safe, ergonomic execution of Python code from Ruby (PyCall-based), with:
  - Simple import helpers and localized bindings
  - Synchronous, direct, or background-thread execution
  - Logging wrappers that capture Python stdout/stderr
- Scripting to run ad‑hoc Python text with Ruby variables (including TSV) injected, and results returned
- Data conversion helpers:
  - numpy arrays → Ruby Arrays
  - pandas DataFrame ↔ TSV (key_field, fields, type respected)
- Python path management (expose package python/ dirs to sys.path)
- Python‑side helpers to:
  - Read/write TSVs with headers (pandas)
  - Run Ruby Workflows from Python
  - Call remote Workflow services over HTTP
Installation and requirements
Ruby
- Ruby 2.6+ (or compatible with PyCall)
- Gems:
  - pycall (PyCall)
  - json (standard)
- Optional for script result loading:
  - python/pickle (gem) for loading pickle from Python scripts
Python
- Python 3
- Packages:
  - pandas
  - numpy
  - requests (only for remote workflow client)
- Ensure python3 is in PATH
Add scout-rig to your Ruby project (Gemfile or local checkout), then ensure Python dependencies are installed in your Python environment.
Quick start
Execute Python directly from Ruby:
require 'scout_python'
# Sum with numpy
arr_sum = ScoutPython.run 'numpy', as: :np do
  np.array([1,2,3]).sum
end
# => PyObject (to_i if needed)

# Background thread execution
ScoutPython.run_threaded :sys do
  sys.path.append('/opt/my_py_pkg')
end
ScoutPython.stop_thread
Run an ad‑hoc Python script, returning a result value:
tsv = TSV.setup({}, "Key~ValueA,ValueB#:type=:list")
tsv["k1"] = %w[a1 b1]; tsv["k2"] = %w[a2 b2]
TmpFile.with_file do |target|
  result = ScoutPython.script <<~PY, df: tsv, target: target
    import scout
    # df is a pandas DataFrame (tsv injected)
    result = df.loc["k2", "ValueB"]
    scout.save_tsv(target, df) # save as TSV with header
  PY
  # result is "b2"; target holds a TSV round-tripped from pandas
end
Convert between TSV and pandas:
df = ScoutPython.tsv2df(tsv) # TSV -> pandas DataFrame
tsv2 = ScoutPython.df2tsv(df) # pandas DataFrame -> TSV
Run a Workflow from Python:
import sys
sys.path.append('python') # add this repo's python/ on dev checkouts
import scout.workflow as sw
wf = sw.Workflow('Baking')
print(wf.tasks())
step = wf.fork('bake_muffin_tray', add_blueberries=True, clean='recursive')
step.join()
print(step.load()) # load Ruby job result
Core concepts
Path management for Python imports
ScoutPython tracks Python directories to add to sys.path:
- ScoutPython.add_path(path) / add_paths(paths)
- ScoutPython.process_paths # idempotent; run before/inside sessions
These are applied in Python contexts by run/run_simple/run_direct.
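The idempotent behavior described above can be illustrated with a minimal sketch in plain Python. This is not ScoutPython's implementation: `pending_paths`, `add_path`, and `process_paths` here are hypothetical stand-ins that only mirror the names of the Ruby API, and the demo uses a stand-in list instead of the real sys.path.

```python
import sys

# Hypothetical registry of directories waiting to be exposed to Python.
pending_paths = []

def add_path(path):
    """Remember a directory to expose to Python imports; ignore repeats."""
    if path not in pending_paths:
        pending_paths.append(path)

def process_paths(search_path=None):
    """Append each registered directory to the search path if it is missing.

    Calling this repeatedly leaves the search path unchanged, which is what
    makes it safe to run before or inside every session.
    """
    search_path = sys.path if search_path is None else search_path
    for path in pending_paths:
        if path not in search_path:
            search_path.append(path)
    return search_path

add_path("/opt/my_py_pkg")
add_path("/opt/my_py_pkg")      # duplicate registration is a no-op
demo = ["/usr/lib/python3"]     # stand-in list so sys.path is left untouched
process_paths(demo)
process_paths(demo)             # second call adds nothing
print(demo)
```

Because the second call adds nothing, callers do not need to track whether the paths were already applied.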
Running Python from Ruby
Pick the execution model that fits:
- run(mod = nil, imports = nil) { ... }
  - Initialize PyCall if needed, set up paths, run block; GC after run
- run_simple(mod = nil, imports = nil) { ... }
  - Lightweight; process_paths, then run block
- run_direct(mod = nil, imports = nil) { ... }
  - Minimal overhead: optional single pyimport/pyfrom, then evaluate
- run_threaded(mod = nil, imports = nil) { ... }
  - Queue work into a dedicated Python thread; stop with stop_thread
Logging wrappers capture Python’s stdout/stderr via the Scout Log:
- run_log(mod=nil, imports=nil, severity=Log::LOW, severity_err=nil) { ... }
- run_log_stderr(mod=nil, imports=nil, severity=Log::LOW) { ... }
Imports
- Pass a module with an alias ('numpy', as: :np) or a module with the names to import from it ("module.submodule", import: [:Class, :func])
Binding scopes and imports
Keep imports local to a binding:
ScoutPython.binding_run do
  pyimport :torch
  pyfrom :torch, import: ['nn']
  # torch and nn available here only
end
Helpers
- new_binding, binding_run
- import_method, call_method
- get_module, get_class, class_new_obj
- exec(script) → PyCall.exec
Scripting
Run arbitrary Python text with Ruby variables injected:
- ScoutPython.script(text, variables = {}) → result
  - Ruby primitives → Python literals
  - Arrays/Hashes → recursively converted
  - TSV variables → materialized to temp file and loaded into pandas via the python/scout helper
  - result is read back via pickle (default) or JSON (configurable)
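The inject-then-read-back cycle listed above can be sketched in plain Python. This is an illustration under assumptions, not the code ScoutPython generates: `run_script` is a hypothetical helper, variables are injected with `repr()`, the script runs in a child interpreter, and whatever it leaves in `result` is pickled to a temporary file. TSV variables are not covered by the sketch.

```python
import os
import pickle
import subprocess
import sys
import tempfile

def run_script(text, **variables):
    """Run Python source in a child process and return its `result` variable."""
    with tempfile.TemporaryDirectory() as tmp:
        result_file = os.path.join(tmp, "result.pkl")
        script_file = os.path.join(tmp, "script.py")
        # Inject each variable as a Python literal; repr() covers primitives
        # as well as nested lists and dicts.
        header = "".join(f"{name} = {value!r}\n" for name, value in variables.items())
        # The footer pickles whatever the script left in `result`.
        footer = (
            "\nimport pickle\n"
            f"with open({result_file!r}, 'wb') as fh:\n"
            "    pickle.dump(result, fh)\n"
        )
        with open(script_file, "w") as fh:
            fh.write(header + text + footer)
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run([sys.executable, script_file], check=True)
        with open(result_file, "rb") as fh:
            return pickle.load(fh)

value = run_script("result = sum(numbers) * scale", numbers=[1, 2, 3], scale=2)
print(value)
```

Writing the result to a file rather than parsing stdout keeps the script free to print anything it likes.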
Swap result serializer if desired:
class << ScoutPython
  alias save_script_result save_script_result_json
  alias load_result load_json
end
Iteration utilities
Traverse Python iterables with optional progress bars:
- iterate(iterator, bar: nil|true|String) { |elem| ... }
- iterate_index(sequence, bar: ...) { |elem| ... }
- collect(iterator, bar: ...) { |elem| ... } → Array
Data conversion and pandas helpers
- numpy2ruby(numpy_array)
- to_a/py2ruby_a(py_list)
- obj2hash(py_mapping)
- tsv2df(tsv) / df2tsv(df, options={type: :list, key_field: ...})
Python-side package (python/scout)
The included Python package is importable as scout and provides:
General utilities
- scout.libdir(), scout.add_libdir()
- scout.path(), scout.read()
- scout.inspect(obj), scout.rich(obj)
TSV IO (pandas-aware)
- scout.tsv(tsv_path_or_stream, ...) → pandas.DataFrame (Scout headers respected)
- scout.save_tsv(filename, df, key=None)
Workflow wrappers
- scout.run_job(workflow, task, name='Default', fork=False, clean=False, **inputs)
  - Shells out to the Ruby CLI to execute/fork jobs
- scout.workflow.Workflow(name).run/fork/tasks/task_info
- scout.workflow.Step(path).info/status/join/load
Remote workflows (HTTP)
- scout.workflow.remote.RemoteWorkflow(url).job/task_info
- scout.workflow.remote.RemoteStep(url).status/wait/raw/json
Error handling and threading
- When a Python process started by script exits with a non‑zero status, the failure is raised as ConcurrentStreamProcessFailed; stderr is logged via Log if a logging wrapper is used
- Background thread execution must be stopped explicitly:
  - ScoutPython.stop_thread: sends a sentinel, tries to join/kill, GCs, and finalizes PyCall if available
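The queue-plus-sentinel pattern behind a dedicated worker thread can be sketched in plain Python. This is a generic illustration, not ScoutPython's Ruby implementation: the `Worker` class and the `STOP` sentinel are hypothetical names, and the sketch omits the kill, GC, and PyCall finalization steps.

```python
import queue
import threading

STOP = object()  # sentinel telling the worker to exit

class Worker:
    """Run submitted callables one at a time on a single dedicated thread."""

    def __init__(self):
        self.jobs = queue.Queue()
        self.thread = threading.Thread(target=self._loop, daemon=True)
        self.thread.start()

    def _loop(self):
        while True:
            item = self.jobs.get()
            if item is STOP:
                break
            func, reply = item
            try:
                reply.put((True, func()))
            except Exception as exc:  # hand the error back to the caller
                reply.put((False, exc))

    def run(self, func):
        """Queue func, block until the worker has run it, return its value."""
        reply = queue.Queue(maxsize=1)
        self.jobs.put((func, reply))
        ok, value = reply.get()
        if not ok:
            raise value
        return value

    def stop(self, timeout=5):
        """Send the sentinel and wait for the thread to finish."""
        self.jobs.put(STOP)
        self.thread.join(timeout)

worker = Worker()
print(worker.run(lambda: 1 + 2))
worker.stop()
print(worker.thread.is_alive())
```

Until the sentinel is sent the thread stays blocked on its queue, which is why an explicit stop call is needed.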
Command line usage and discovery
Scout commands are discovered under scout_commands across installed packages using the Path subsystem. The dispatcher resolves nested commands by consuming command-line terms one at a time until the accumulated path names a file, which it then executes; if the terms end on a directory, it lists the available subcommands instead.
- General pattern:
  - scout [ ...] [options] [args...]
- Examples relevant to Python integration (executed from Ruby CLI but callable from Python via scout.run_job):
  - scout workflow task [task-input-options...]
  - scout workflow prov <step_path>
  - scout workflow info <step_path>
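The resolution rule described above (add terms until a file is found, list subcommands when the terms end on a directory) can be sketched in plain Python. This is an illustration only: `resolve` is a hypothetical function, the one-file-per-command layout under scout_commands is an assumption, and the real dispatcher is Ruby code built on the Path subsystem.

```python
import os
import tempfile

def resolve(roots, terms):
    """Consume terms until the accumulated path is a file under some root.

    Returns ("run", script_path, remaining_args) when a file is found, or
    ("list", sorted_subcommands, []) when the terms end on a directory.
    """
    for used in range(1, len(terms) + 1):
        relative = os.path.join(*terms[:used])
        for root in roots:
            candidate = os.path.join(root, "scout_commands", relative)
            if os.path.isfile(candidate):
                return "run", candidate, terms[used:]
    # No file matched: merge subcommands from every root that has the directory.
    found = set()
    for root in roots:
        directory = os.path.join(root, "scout_commands", *terms)
        if os.path.isdir(directory):
            found.update(os.listdir(directory))
    if not found:
        raise LookupError("no command matches: " + " ".join(terms))
    return "list", sorted(found), []

# Two throwaway "packages", each contributing one workflow subcommand.
with tempfile.TemporaryDirectory() as tmp:
    pkg_a = os.path.join(tmp, "pkg_a")
    pkg_b = os.path.join(tmp, "pkg_b")
    for pkg, name in [(pkg_a, "task"), (pkg_b, "info")]:
        directory = os.path.join(pkg, "scout_commands", "workflow")
        os.makedirs(directory)
        open(os.path.join(directory, name), "w").close()
    action, script, rest = resolve([pkg_a, pkg_b], ["workflow", "info", "step_path"])
    listing = resolve([pkg_a, pkg_b], ["workflow"])

print(action, os.path.basename(script), rest)
print(listing)
```

Merging directory listings across roots is what lets separate packages contribute subcommands to the same command group.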
Notes
- The bin/scout launcher walks scout_commands/… across packages; Workflows and other packages can add their own commands and they will be discovered
- See the Workflow, TSV, and KnowledgeBase docs for their CLI suites:
  - TSV: scout tsv …
  - Workflow: scout workflow …
  - KnowledgeBase: scout kb …
scout-rig itself does not register standalone CLI commands; instead, its Python wrapper invokes the existing Ruby CLI to run jobs from Python.
Reference
Read the full module guide in doc/Python.md. For core building blocks referenced above, see these docs in scout-essentials and scout-gear:
- Annotation.md, CMD.md, ConcurrentStream.md, IndiferentHash.md, Log.md, Open.md, Path.md, Persist.md, TmpFile.md
- TSV.md, Workflow.md, KnowledgeBase.md, Association.md, Entity.md, WorkQueue.md, Semaphore.md
Examples
Direct PyCall with imports:
ScoutPython.run 'numpy', as: :np do
  a = np.array([1,2,3])
  a.sum # PyObject; convert with to_i if needed
end
Script with a returned value and TSV round‑trip:
tsv = TSV.setup({}, "Key~ValueA,ValueB#:type=:list")
tsv["k1"] = ["a1", "b1"]; tsv["k2"] = ["a2", "b2"]
TmpFile.with_file do |target|
  result = ScoutPython.script <<~PY, df: tsv, target: target
    import scout
    result = df.loc["k2", "ValueB"]
    scout.save_tsv(target, df)
  PY
  # result == "b2"; target contains the saved TSV
end
numpy conversion:
ra = ScoutPython.run :numpy, as: :np do
  na = np.array([[[1,2,3], [4,5,6]]])
  ScoutPython.numpy2ruby(na)
end
ra[0][1][2] # => 6
Run workflows from Python:
import scout.workflow as sw
wf = sw.Workflow('Baking')
step = wf.fork('bake_muffin_tray', add_blueberries=True, clean='recursive')
step.join()
print(step.load())
Project links
- scout-essentials — https://github.com/mikisvaz/scout-essentials
- scout-gear — https://github.com/mikisvaz/scout-gear
- scout-rig — https://github.com/mikisvaz/scout-rig
- scout-camp — https://github.com/mikisvaz/scout-camp
- scout-ai — https://github.com/mikisvaz/scout-ai
- Rbbt — https://github.com/mikisvaz/rbbt
- Rbbt-Workflows — https://github.com/Rbbt-Workflows
Contributions and issues are welcome in their respective GitHub repositories.