Broadlistening
A Ruby implementation of the Kouchou-AI Broadlistening pipeline. It clusters and analyzes public comments using an LLM.
Overview
Broadlistening is a pipeline for analyzing large volumes of comments and opinions using AI. It processes data through the following steps:
- Extraction - Extract key opinions from comments using an LLM
- Embedding - Vectorize extracted opinions
- Clustering - UMAP + KMeans + hierarchical clustering
- Initial Labelling - Label each cluster using an LLM
- Merge Labelling - Hierarchically integrate labels
- Overview - Generate an overall summary using an LLM
- Aggregation - Output results in JSON format
Results can be visualized as an interactive HTML report using broadlistening-html.
Installation
Add to your Gemfile:
gem 'broadlistening'
Then run:
bundle install
Usage
Command Line Interface
After installation, you can use the broadlistening command:
broadlistening config.json [options]
Options
| Option | Description |
|---|---|
| -f, --force | Force re-run all steps regardless of previous execution |
| -o, --only STEP | Run only the specified step (extraction, embedding, clustering, etc.) |
| --from STEP | Resume pipeline from specified step |
| --input-dir DIR | Use different input directory for resuming (requires --from) |
| -i, --input FILE | Input file path (CSV or JSON) - overrides config |
| -n, --dry-run | Show what would be executed without actually running |
| -V, --verbose | Show detailed output including step parameters and LLM usage |
| -h, --help | Show help message |
| -v, --version | Show version |
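The effect of --only and --from on step selection can be sketched in a few lines of Ruby. This is an illustration only, not the gem's actual implementation: the STEPS constant and the steps_to_run helper are hypothetical, the step names beyond extraction, embedding and clustering are guessed from the Overview list, and rejecting the combination of both options is an assumption of this sketch.

```ruby
# Hypothetical step names in pipeline order. Only extraction, embedding and
# clustering appear in the option help; the remaining names are assumed.
STEPS = %i[
  extraction embedding clustering
  initial_labelling merge_labelling overview aggregation
].freeze

# Returns the list of steps that would run for the given options.
# only: run a single step; from: run that step and everything after it.
def steps_to_run(only: nil, from: nil)
  # Assumption of this sketch: the two options are mutually exclusive.
  raise ArgumentError, "use either only: or from:, not both" if only && from

  if only
    raise ArgumentError, "unknown step: #{only}" unless STEPS.include?(only)
    [only]
  elsif from
    index = STEPS.index(from)
    raise ArgumentError, "unknown step: #{from}" if index.nil?
    STEPS[index..]
  else
    STEPS
  end
end

p steps_to_run(from: :clustering)
p steps_to_run(only: :embedding)
```

Running broadlistening config.json --dry-run remains the reliable way to see what the real command would execute.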
Example config.json
{
  "input": "comments.csv",
  "question": "What are the main opinions?",
  "api_key": "sk-...",
  "model": "gpt-4o-mini",
  "cluster_nums": [5, 15]
}
Input CSV format
comment-id,comment-body
1,We need environmental measures
2,I hope for better public transportation
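When calling the library directly instead of the CLI, a CSV in this format has to be turned into the array of hashes shown in the Ruby API section. A minimal sketch using Ruby's standard CSV library follows; the comments_from_csv helper is hypothetical and not part of the gem, and the mapping of comment-id to id and comment-body to body is an assumption based on the two examples in this README.

```ruby
require "csv"

# Convert CSV text with comment-id and comment-body columns into
# an array of hashes with :id and :body keys.
def comments_from_csv(csv_text)
  CSV.parse(csv_text, headers: true).map do |row|
    { id: row["comment-id"].to_s, body: row["comment-body"].to_s.strip }
  end
end

csv_text = <<~CSV
  comment-id,comment-body
  1,We need environmental measures
  2,I hope for better public transportation
CSV

comments = comments_from_csv(csv_text)
p comments
```

For a file on disk, CSV.read("comments.csv", headers: true) can replace CSV.parse with the same row handling.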
Example
broadlistening config.json # Run full pipeline
broadlistening config.json --dry-run # Preview without running
broadlistening config.json --from clustering # Resume from step
broadlistening config.json --input comments.csv # Override input file
HTML Report Generator
Generate a standalone HTML file from pipeline results for previewing and sharing. The report displays clusters, subclusters, and extracted opinions in an interactive format.
broadlistening-html outputs/report/hierarchical_result.json # Generate report
broadlistening-html outputs/report/hierarchical_result.json --help # Show options
Library Usage
Ruby API
require 'broadlistening'
# Prepare comment data
comments = [
  { id: "1", body: "We need environmental measures", proposal_id: "123" },
  { id: "2", body: "I hope for better public transportation", proposal_id: "123" },
  # ...
]
# Run the pipeline
pipeline = Broadlistening::Pipeline.new(
  api_key: ENV['OPENAI_API_KEY'],
  model: "gpt-4o-mini",
  cluster_nums: [5, 15]
)
result = pipeline.run(comments, output_dir: "./output")
# Get results
puts result[:overview]
puts result[:clusters]
Configuration Options
Broadlistening::Pipeline.new(
  api_key: "...", # Omit to use env var (OPENAI_API_KEY, GEMINI_API_KEY, etc.)
  model: "gpt-4o-mini", # LLM model (default: gpt-4o-mini)
  embedding_model: "text-embedding-3-small", # Embedding model
  cluster_nums: [5, 15], # Cluster hierarchy levels (default: [5, 15])
  workers: 10, # Number of parallel workers
  prompts: { extraction: "...", ... } # Custom prompts (optional)
)
Using Local LLM (Ollama)
config = Broadlistening::Config.new(
  provider: :local,
  model: "llama3",
  embedding_model: "nomic-embed-text",
  local_llm_address: "localhost:11434"
)
Output Format
The pipeline outputs hierarchical_result.json containing:
- arguments - Extracted opinions with UMAP coordinates and cluster assignments
- clusters - Hierarchical cluster structure with labels
- overview - LLM-generated summary
- config - Pipeline configuration used
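A quick way to inspect the output is to load it with Ruby's standard JSON library and summarize the four top-level keys. The sketch below is hedged: summarize_result is a hypothetical helper, and it assumes that arguments and clusters are arrays and that overview is a string, which this README does not state explicitly. The inline sample stands in for a real file.

```ruby
require "json"

# Summarize a parsed hierarchical_result.json.
# Assumes "arguments" and "clusters" are arrays and "overview" is a string.
def summarize_result(result)
  {
    argument_count: Array(result["arguments"]).size,
    cluster_count: Array(result["clusters"]).size,
    overview: result["overview"].to_s
  }
end

# Made-up sample data; a real run would instead use
# JSON.parse(File.read("outputs/report/hierarchical_result.json")).
sample = JSON.parse(<<~SAMPLE)
  {
    "arguments": [{}, {}],
    "clusters": [{}],
    "overview": "Two opinions in one cluster.",
    "config": {}
  }
SAMPLE

p summarize_result(sample)
```

The per-item fields inside arguments and clusters are left untouched here because their exact names are not documented in this README.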
Development
# Setup
bin/setup
# Run tests
bundle exec rspec
# Console
bin/console
License
Copyright (C) 2025 Masayoshi Takahashi
This repository is licensed under the GNU Affero General Public License v3.0. Unless otherwise noted, all files in this repository are covered by this license. See the LICENSE file for details.