Retab

The AI Automation Platform

Made with love by the team at Retab 🩷.

Our Website | Documentation | Discord | Twitter

What is Retab?

Retab is the complete developer platform and SDK for shipping state-of-the-art document processing in the age of LLMs.

Retab has one defined purpose: helping you ship automations fast and get structured, high-quality data out of your documents.

For this mission, we provide best-in-class preprocessing, help you generate prompts and extraction schemas that fit your preferred model providers, let you iterate on and evaluate the accuracy of your configuration, and let you ship your automation directly in your code.

Why did we build Retab?

Because of a new, lighter paradigm

Large Language Models collapse entire layers of legacy OCR pipelines into a single, elegant abstraction. When a model can read, reason, and structure text natively, we no longer need brittle heuristics, handcrafted parsers, or heavyweight ETL jobs. Instead, we can expose a small, principled API: "give me the document, tell me the schema, and get back structured truth." Complexity evaporates, reliability rises, speed follows, and costs fall—because every component you remove is one that can no longer break.

LLM‑first design lets us focus less on plumbing and more on the questions we actually want answered. This is where Retab stands: we help you unlock these capabilities by offering the software-defined primitives you need to build your own document processing solutions. We see it as Stripe for document processing.

Check our documentation.

Join our Discord and share your feedback.

API Key

To use the API, you need to sign up on Retab.
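
The Extract Data example further down constructs the client as `Retab()` with no arguments, which suggests the SDK can pick the key up from the environment. A minimal sketch of keeping the key out of source code, assuming it is stored in an environment variable named `RETAB_API_KEY` (the variable name and the `load_api_key` helper are assumptions for illustration, not confirmed by this README):

```python
import os


def load_api_key(var_name: str = "RETAB_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The variable name is an assumption; adjust it to your own setup.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key


# Usage with the constructor shown in this README:
#   from retab import Retab
#   client = Retab(api_key=load_api_key())
```

Failing early with a clear message tends to be easier to debug than an authentication error returned by the first API call.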

SDK

Python:

pip install retab

Node:

npm install @retab/node

Go:

go get github.com/retab-dev/retab/clients/go

Generate a Schema:

from retab import Retab

# Replace the placeholder with your own API key
client = Retab(api_key="YOUR_RETAB_API_KEY")

# Generate an extraction schema from a sample document
response = client.schemas.generate(
    documents=["Invoice.pdf"],
    model="retab-small",
)

Extract Data:

import json
from retab import Retab

client = Retab()

# json_schema must be a dict, so load the schema file first
with open("Invoice_schema.json") as f:
    json_schema = json.load(f)

response = client.extractions.create(
    json_schema=json_schema,
    document="Invoice.pdf",
    model="retab-small",
)

print(response.output)
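
For reference, the `json_schema` argument is a plain JSON Schema object. The sketch below shows what a small invoice schema could look like and writes it to `Invoice_schema.json` so the example above can load it. The field names are hypothetical, chosen for illustration; a schema produced by `client.schemas.generate` may look different.

```python
import json
from pathlib import Path

# Hypothetical invoice schema; field names are illustrative assumptions.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string", "description": "Invoice identifier as printed on the document"},
        "issue_date": {"type": "string", "description": "Issue date in YYYY-MM-DD format"},
        "total_amount": {"type": "number", "description": "Total amount due, taxes included"},
        "currency": {"type": "string", "description": "ISO 4217 currency code, e.g. EUR"},
    },
    "required": ["invoice_number", "total_amount"],
}

schema_path = Path("Invoice_schema.json")
schema_path.write_text(json.dumps(invoice_schema, indent=2), encoding="utf-8")

# Loading it back yields the dict that json_schema expects.
with schema_path.open(encoding="utf-8") as f:
    loaded_schema = json.load(f)
```

Precise `description` values matter here, since they are the main way to tell the model what each field should contain.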

Projects

On the Platform, Projects provide a systematic way to test and validate your extraction schemas against known ground truth data. Think of it as evals for document AI. You can measure accuracy, compare different models, and optimize your extraction pipelines with confidence.

The project workflow for schema optimization:

Run initial project → identify low-accuracy fields
Refine descriptions and add reasoning prompts → re-run project
Compare accuracy improvements → iterate until satisfied
Deploy optimized schema to production
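
The first step of this loop, finding low-accuracy fields, can be sketched in plain Python. This is only an illustration of the idea using exact-match comparison against ground truth; it is not how the Platform computes its metrics, and `field_accuracy`, `low_accuracy_fields`, and the sample data are hypothetical.

```python
from collections import defaultdict


def field_accuracy(predictions: list[dict], ground_truth: list[dict]) -> dict[str, float]:
    """Exact-match accuracy per field over paired documents."""
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have the same length")
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for pred, truth in zip(predictions, ground_truth):
        for field, expected in truth.items():
            total[field] += 1
            if pred.get(field) == expected:
                correct[field] += 1
    return {field: correct[field] / total[field] for field in total}


def low_accuracy_fields(accuracy: dict[str, float], threshold: float = 0.9) -> list[str]:
    """Fields whose accuracy falls below the threshold, worst first."""
    return sorted((f for f, a in accuracy.items() if a < threshold), key=lambda f: accuracy[f])


# Made-up sample data: two invoices with known ground truth.
truth = [
    {"invoice_number": "INV-001", "total_amount": 120.0},
    {"invoice_number": "INV-002", "total_amount": 75.5},
]
preds = [
    {"invoice_number": "INV-001", "total_amount": 120.0},
    {"invoice_number": "INV-002", "total_amount": 57.5},
]

accuracy = field_accuracy(preds, truth)
print(accuracy)
print(low_accuracy_fields(accuracy))
```

Fields flagged this way are the ones whose descriptions are worth refining before the next run.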

Projects are configured on the Platform. From the SDK, you drive evaluation runs and schema iteration through the workflows API and its experiments (consensus likelihood) tooling — see the docs for the current surface.

Projects give you an automation engine that is easy to use and to integrate into your codebase and workflows.

Check our documentation.

Community

Let's create the future of document processing together.

Join our Discord to share your journey, discuss best practices, and give your feedback. You can also follow us on X (Twitter).

We can't wait to see how you'll use Retab.

Useful Links

API: Documentation
SDKs: Python, JavaScript, and Go
OpenAI, Google, xAI, Outlines on structured generation
Structured generation Starter Pack
Quickstart
API Reference
GitHub Repository
