Omnivore

Quantitative research and trading platform

Quick Start Guide

This guide walks you through setting up Omnivore and generating your first prediction.

Prerequisites

  • Python 3.13+
  • PostgreSQL 14+
  • Redis 7+

1. Installation

# Clone the repository
git clone git@github.com:matthiase/omnivore.git 
cd omnivore

# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create virtual environment and install dependencies
uv venv
source .venv/bin/activate
uv pip install -e ".[dev]"

2. Database Setup

# Create the PostgreSQL user and database
sudo -u postgres createuser omnivore --createdb
sudo -u postgres createdb omnivore --owner=omnivore

# Run migrations
psql postgresql://omnivore:@localhost:5432/omnivore -f migrations/001_initial_schema.sql

3. Configuration

Create a .env file with your configuration. For example:

OMNIVORE_ENV=development
DATABASE_URL=postgresql://omnivore:@localhost:5432/omnivore
REDIS_URL=redis://localhost:6379/0
MODEL_STORAGE_PATH=./models
FEATURES_CONFIG_PATH=./config/features.json

4. Initialize Data

# Seed initial instruments (SPY, QQQ, IWM, DIA)
python scripts/seed_instruments.py

# Backfill 5 years of historical data and compute features
python scripts/backfill_historical.py

This will take a few minutes as it downloads data from Yahoo Finance.
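
If you are curious what the backfill roughly does under the hood, the sketch below pulls five years of daily bars for one symbol from Yahoo Finance. It is illustrative only: the use of the yfinance client and the exact call parameters are assumptions, not the script's actual internals.

# Illustrative only: fetch five years of daily SPY bars from Yahoo Finance.
# The yfinance client is an assumption; backfill_historical.py may differ.
import yfinance as yf

bars = yf.download("SPY", period="5y", interval="1d")
print(bars.tail())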

5. Start Services

You'll need three terminal windows:

Terminal 1 - Redis:

redis-server

Terminal 2 - RQ Worker:

source .venv/bin/activate
rq worker default training

Terminal 3 - Flask API:

source .venv/bin/activate
flask --app src.omnivore.app run --debug

The API is now running at http://localhost:5000.

6. Verify Setup

# Check API health
curl http://localhost:5000/api/health

# List instruments
curl http://localhost:5000/api/instruments

You should see the four seeded ETFs.
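
If you prefer to script this check, the standard-library snippet below hits the same two endpoints and pretty-prints the results; it assumes only the default local address used throughout this guide.

# Quick verification: check API health and dump the seeded instruments.
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # default Flask dev server address

with urllib.request.urlopen(f"{BASE_URL}/api/health") as resp:
    print("health:", json.load(resp))

with urllib.request.urlopen(f"{BASE_URL}/api/instruments") as resp:
    print(json.dumps(json.load(resp), indent=2))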


Command-Line Interface (CLI)

Omnivore includes a CLI tool for daily operations. It lets you interactively perform key tasks for a selected instrument, with confirmation prompts before any changes are made.

Usage

Activate your virtual environment, then run:

python -m omnivore.cli <subcommand>

Available subcommands:

  • refresh-data — Refresh daily OHLCV data for an instrument
  • record-actuals — Record actual market values for an instrument
  • recompute-features — Recompute features for an instrument
  • generate-predictions — Generate daily predictions for an instrument

Example

python -m omnivore.cli refresh-data

You will be prompted to select an instrument from the list of active instruments. Before any changes are made to the database, the CLI will display a summary of the intended operation and ask for confirmation.

How it works

  • The CLI uses interactive prompts to guide you through selecting an instrument and confirming actions.
  • All operations are performed synchronously and use internal Python functions (no background jobs).
  • Database changes are made within a transaction and only committed after you confirm the summary prompt (see the sketch below).
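
A minimal sketch of that confirm-then-commit pattern is shown below. The daily_bars table, its column list, and the psycopg2 driver are illustrative assumptions, not the actual omnivore.cli internals.

# Sketch of the confirm-then-commit flow described above (illustrative only:
# the daily_bars table, its columns, and the psycopg2 driver are assumptions).
import psycopg2

def write_rows_with_confirmation(dsn: str, instrument_id: int, rows: list[tuple]) -> None:
    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            # Stage all changes inside the open transaction.
            for row in rows:
                cur.execute(
                    "INSERT INTO daily_bars "
                    "(instrument_id, date, open, high, low, close, volume) "
                    "VALUES (%s, %s, %s, %s, %s, %s, %s)",
                    (instrument_id, *row),
                )
        # Summarize and ask before anything is made permanent.
        print(f"About to write {len(rows)} rows for instrument {instrument_id}.")
        if input("Proceed? [y/N] ").strip().lower() == "y":
            conn.commit()
            print("Committed.")
        else:
            conn.rollback()
            print("Aborted; no changes were written.")
    finally:
        conn.close()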

Running Tests

Create a .env.test file with your configuration. For example:

OMNIVORE_ENV=test
DATABASE_URL=postgresql://omnivore:@localhost:5432/omnivore_test
REDIS_URL=redis://localhost:6379/1
MODEL_STORAGE_PATH=./models_test
FEATURES_CONFIG_PATH=./config/features.json

Then run the test suite:

OMNIVORE_ENV=test python -m pytest

Creating Your First Model

Step 1: Define a Model

Create an XGBoost model that predicts next-day returns using RSI and moving averages:

curl -X POST http://localhost:5000/api/models \
  -H "Content-Type: application/json" \
  -d '{
    "name": "spy_daily_xgb_v1",
    "description": "XGBoost model for SPY next-day return prediction",
    "target": "return_1d",
    "model_type": "xgboost",
    "feature_config": ["rsi_14", "ma_10", "ma_20", "ma_50", "atr_14"],
    "hyperparameters": {
      "n_estimators": 100,
      "max_depth": 4,
      "learning_rate": 0.1
    }
  }'

Response:

{
  "id": 1,
  "name": "spy_daily_xgb_v1",
  "target": "return_1d",
  "model_type": "xgboost",
  ...
}

Step 2: Train the Model

Train using the last 3 years of SPY data (instrument_id=1):

curl -X POST http://localhost:5000/api/models/1/train \
  -H "Content-Type: application/json" \
  -d '{
    "instrument_id": 1,
    "training_start": "2022-01-01",
    "training_end": "2024-12-31",
    "test_size": 0.2
  }'

Response:

{
  "job_id": "abc123...",
  "status": "queued"
}

Step 3: Check Training Status

Poll the job until it completes:

curl http://localhost:5000/api/jobs/abc123...

When finished:

{
  "job_id": "abc123...",
  "status": "finished",
  "result": {
    "model_id": 1,
    "version": 1,
    "metrics": {
      "test": {
        "rmse": 0.0142,
        "mae": 0.0098,
        "r2": 0.023,
        "directional_accuracy": 0.54
      }
    }
  }
}
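
If you want to script this polling step instead of re-running curl by hand, a small standard-library loop like the one below does the job; the terminal statuses checked here are an assumption based on the example responses in this guide.

# Poll a job until it reaches a terminal state (illustrative helper; the
# terminal status names are assumed from the example responses above).
import json
import time
import urllib.request

def wait_for_job(job_id: str, base_url: str = "http://localhost:5000",
                 timeout: float = 600.0, interval: float = 5.0) -> dict:
    deadline = time.time() + timeout
    while time.time() < deadline:
        with urllib.request.urlopen(f"{base_url}/api/jobs/{job_id}") as resp:
            job = json.load(resp)
        if job.get("status") in ("finished", "failed"):
            return job
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} still not finished after {timeout:.0f}s")

result = wait_for_job("abc123...")
print(json.dumps(result, indent=2))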

Step 4: Activate the Model Version

curl -X POST http://localhost:5000/api/models/1/versions/1/activate

Step 5: Generate a Prediction

Generate a prediction for today:

curl -X POST http://localhost:5000/api/predictions/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": 1,
    "instrument_ids": [1],
    "horizons": ["1d"]
  }'

Check the job result to see the prediction:

{
  "prediction": {
    "id": 1,
    "predicted_value": 0.0023,
    "target_date": "2025-01-02",
    "horizon": "1d"
  },
  "direction": "up",
  "predicted_return": 0.0023
}

Creating a 5-Day Model (Weekly Predictions)

For Saturday predictions of the upcoming week:

curl -X POST http://localhost:5000/api/models \
  -H "Content-Type: application/json" \
  -d '{
    "name": "spy_weekly_lgbm_v1",
    "description": "LightGBM model for SPY 5-day return prediction",
    "target": "return_5d",
    "model_type": "lightgbm",
    "feature_config": ["rsi_14", "ma_10", "ma_20", "ma_50", "atr_14", "ma_cross_10_50"],
    "hyperparameters": {
      "n_estimators": 200,
      "max_depth": 5,
      "learning_rate": 0.05
    }
  }'

Then train and activate as shown above.


Comparing Models

To compare different model versions or approaches:

# List all versions of a model
curl http://localhost:5000/api/models/1/versions

# Compare metrics across versions
curl http://localhost:5000/api/models/1/compare

Response:

[
  {
    "version": 1,
    "is_active": false,
    "test_rmse": 0.0142,
    "test_dir_acc": 0.54
  },
  {
    "version": 2,
    "is_active": true,
    "test_rmse": 0.0138,
    "test_dir_acc": 0.56
  }
]
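
To pick a version programmatically, something along these lines works; the field names mirror the example response above, and treating the highest test directional accuracy as "best" is just one possible criterion.

# Select the version with the best test directional accuracy from /compare
# (field names taken from the example response above; the selection criterion
# itself is only an illustration).
import json
import urllib.request

with urllib.request.urlopen("http://localhost:5000/api/models/1/compare") as resp:
    versions = json.load(resp)

best = max(versions, key=lambda v: v["test_dir_acc"])
print(f"Best version: {best['version']} "
      f"(dir_acc={best['test_dir_acc']}, rmse={best['test_rmse']})")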

Tracking Prediction Accuracy

Backfill Actuals

After predictions have resolved (the target date has passed), record actual outcomes:

curl -X POST http://localhost:5000/api/predictions/backfill-actuals \
  -H "Content-Type: application/json" \
  -d '{"instrument_id": 1}'

View Performance Summary

curl "http://localhost:5000/api/predictions/performance?model_id=1&horizon=1d"

Response:

{
  "total_predictions": 45,
  "predictions_with_actuals": 42,
  "mae": 0.0105,
  "directional_accuracy": 0.55,
  "correct_directions": 23,
  "incorrect_directions": 19
}
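
For reference, the directional-accuracy figure above is simply the share of resolved predictions whose sign matches the realized return (23 of 42, roughly 0.55, in this example). The helper below shows that calculation; it is an illustration, not the service's internal code, and the handling of exactly-zero returns is an assumption.

# Directional accuracy as the fraction of sign matches between predicted and
# realized returns (illustrative; zero-return handling is an assumption).
def directional_accuracy(predicted: list[float], actual: list[float]) -> float:
    pairs = [(p, a) for p, a in zip(predicted, actual) if a is not None]
    if not pairs:
        return float("nan")
    correct = sum(1 for p, a in pairs if (p >= 0) == (a >= 0))
    return correct / len(pairs)

print(directional_accuracy([0.0023, -0.0011, 0.0040], [0.0017, 0.0005, 0.0031]))
# Two of the three signs match, so this prints 0.666...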

Monitoring for Drift

Analyze whether features or prediction accuracy have drifted:

# Get model version ID first
curl http://localhost:5000/api/models/1/versions

# Trigger drift analysis (via direct Python for now)
python -c "
from omnivore.jobs import analyze_drift_job
result = analyze_drift_job(
    model_version_id=1,
    instrument_id=1,
    reference_days=90,
    current_days=30
)
print(result)
"

Query drift reports:

curl "http://localhost:5000/api/jobs/drift/reports?model_version_id=1"

Daily Operations

You can perform daily operations using the CLI tool:

# Refresh daily OHLCV data for an instrument
python -m omnivore.cli refresh-data

# Record actual market values for an instrument
python -m omnivore.cli record-actuals

# Recompute features for an instrument
python -m omnivore.cli recompute-features

# Generate daily predictions for an instrument
python -m omnivore.cli generate-predictions

You will be prompted to select the instrument for each operation. The CLI will summarize the changes and ask for confirmation before committing them to the database.

Alternatively, you can use the API endpoints as shown below:

Refresh Data (API)

curl -X POST http://localhost:5000/api/instruments/1/refresh \
  -H "Content-Type: application/json" \
  -d '{}'

Recompute Features (API)

curl -X POST http://localhost:5000/api/instruments/1/features \
  -H "Content-Type: application/json" \
  -d '{}'

Generate Daily Predictions (API)

curl -X POST http://localhost:5000/api/predictions/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": 1,
    "instrument_ids": [1],
    "horizons": ["1d"]
  }'

Suggested Cron Schedule

Note that cron uses the server's local time, so the Eastern Time targets below assume the host clock is set to ET; adjust the hours for your server's timezone.

# Refresh data daily at 6 PM ET (after market close)
0 18 * * 1-5 curl -X POST http://localhost:5000/api/instruments/1/refresh

# Compute features at 6:15 PM ET
15 18 * * 1-5 curl -X POST http://localhost:5000/api/instruments/1/features

# Generate predictions at 6:30 PM ET
30 18 * * 1-5 curl -X POST http://localhost:5000/api/predictions/generate -H "Content-Type: application/json" -d '{"model_id":1,"instrument_ids":[1],"horizons":["1d"]}'

# Saturday 5-day predictions at 10 AM ET
0 10 * * 6 curl -X POST http://localhost:5000/api/predictions/generate -H "Content-Type: application/json" -d '{"model_id":2,"instrument_ids":[1],"horizons":["5d"]}'

# Backfill actuals daily at 7 PM ET
0 19 * * 1-5 curl -X POST http://localhost:5000/api/predictions/backfill-actuals -H "Content-Type: application/json" -d '{"instrument_id":1}'

Troubleshooting

"No features available" error when predicting

Features need to be computed after data is refreshed:

curl -X POST http://localhost:5000/api/instruments/1/features

"Insufficient training data" error

Ensure you have enough historical data. The model requires at least 50 rows after dropping NaN values (indicators need lookback periods):

python scripts/backfill_historical.py

Job stuck in "queued" status

Make sure the RQ worker is running:

rq worker default training

Check the RQ queue status:

rq info

Next Steps

  1. Experiment with features — Edit config/features.json to add new indicators
  2. Try different models — Create models with ridge, xgboost, or lightgbm
  3. Add more instruments — Use the POST /api/instruments endpoint
  4. Set up alerting — Monitor drift reports for is_alert: true
  5. Automate with cron — Use the suggested schedule above