# Broadlistening

A Ruby implementation of the Kouchou-AI Broadlistening pipeline. Clusters and analyzes public comments using an LLM.
## Overview

Broadlistening is a pipeline for analyzing large volumes of comments and opinions using AI. It processes data through the following steps:
- **Extraction** - Extract key opinions from comments using an LLM
- **Embedding** - Vectorize the extracted opinions
- **Clustering** - UMAP + KMeans + hierarchical clustering
- **Initial Labelling** - Label each cluster using an LLM
- **Merge Labelling** - Hierarchically integrate labels
- **Overview** - Generate an overall summary using an LLM
- **Aggregation** - Output results in JSON format
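The data flow above can be sketched with toy stand-ins (plain Ruby only; the real pipeline calls an LLM for extraction and labelling, and UMAP + KMeans for clustering):

```ruby
comments = ["We need environmental measures", "Cheaper buses, please"]

# Extraction: the real pipeline asks an LLM to pull out key opinions;
# here we simply pass each comment through.
arguments = comments.map { |c| { argument: c } }

# Embedding: the real pipeline calls an embedding model;
# here a fake 2-D vector derived from the text.
embedded = arguments.map do |a|
  a.merge(vector: [a[:argument].length.to_f, a[:argument].count(" ").to_f])
end

# Clustering: the real pipeline runs UMAP + KMeans;
# here a trivial split by vector norm, with IDs in the "level_index" style.
clustered = embedded.map do |a|
  norm = Math.sqrt(a[:vector].sum { |v| v * v })
  a.merge(cluster_id: norm > 25 ? "1_0" : "1_1")
end

# Labelling + aggregation: group arguments per cluster and summarize.
result = clustered.group_by { |a| a[:cluster_id] }
                  .map { |id, args| { id: id, count: args.size } }
```

Each stage is a pure transformation of the previous stage's output, which is what lets the pipeline cache intermediate results and re-run individual steps.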
## Installation

Add to your Gemfile:

```ruby
gem 'broadlistening'
```

Then run:

```sh
bundle install
```

## Usage
### Command Line Interface

After installation, you can use the `broadlistening` command:

```sh
broadlistening config.json [options]
```

Options:
- `-f, --force` - Force re-run all steps regardless of previous execution
- `-o, --only STEP` - Run only the specified step (extraction, embedding, clustering, etc.)
- `--skip-interaction` - Skip the interactive confirmation prompt
- `-h, --help` - Show help message
- `-v, --version` - Show version
Example `config.json`:

```json
{
  "input": "comments.csv",
  "question": "What are the main opinions?",
  "api_key": "sk-...",
  "model": "gpt-4o-mini",
  "cluster_nums": [5, 15]
}
```

Input CSV format:

```csv
comment-id,comment-body
1,We need environmental measures
2,I hope for better public transportation
```
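When using the Ruby API directly, a file in this format can be loaded with Ruby's standard `csv` library into the comment-hash shape the pipeline expects; a minimal sketch (not part of the gem):

```ruby
require "csv"

# Build the comments array expected by the Ruby API from the CSV format above.
csv_text = <<~CSV
  comment-id,comment-body
  1,We need environmental measures
  2,I hope for better public transportation
CSV

comments = CSV.parse(csv_text, headers: true).map do |row|
  { id: row["comment-id"], body: row["comment-body"] }
end
# comments.first => { id: "1", body: "We need environmental measures" }
```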
Example:

```sh
# Run the full pipeline
broadlistening config.json

# Force re-run all steps
broadlistening config.json --force

# Run only the extraction step
broadlistening config.json --only extraction

# Run without confirmation prompt
broadlistening config.json --skip-interaction
```

### Ruby API
```ruby
require 'broadlistening'

# Prepare comment data
comments = [
  { id: "1", body: "We need environmental measures", proposal_id: "123" },
  { id: "2", body: "I hope for better public transportation", proposal_id: "123" },
  # ...
]

# Run the pipeline
pipeline = Broadlistening::Pipeline.new(
  api_key: ENV['OPENAI_API_KEY'],
  model: "gpt-4o-mini",
  cluster_nums: [5, 15]
)
result = pipeline.run(comments, output_dir: "./output")

# Get results
puts result[:overview]
puts result[:clusters]
```

### Rails Example
```ruby
# app/jobs/analysis_job.rb
class AnalysisJob < ApplicationJob
  queue_as :analysis

  def perform(proposal_id)
    proposal = Proposal.find(proposal_id)
    comments = proposal.comments.map do |c|
      { id: c.id, body: c.body, proposal_id: c.proposal_id }
    end

    pipeline = Broadlistening::Pipeline.new(
      api_key: ENV['OPENAI_API_KEY'],
      model: "gpt-4o-mini",
      cluster_nums: [5, 15]
    )
    result = pipeline.run(comments, output_dir: "./output")

    proposal.create_analysis_result!(
      result_data: result,
      comment_count: comments.size
    )
  end
end
```

## Configuration Options
```ruby
Broadlistening::Pipeline.new(
  api_key: "your-api-key",                   # OpenAI API key (required)
  model: "gpt-4o-mini",                      # LLM model (default: gpt-4o-mini)
  embedding_model: "text-embedding-3-small", # Embedding model
  cluster_nums: [5, 15],                     # Cluster hierarchy levels (default: [5, 15])
  workers: 10,                               # Number of parallel workers
  prompts: {                                 # Custom prompts (optional)
    extraction: "...",
    initial_labelling: "...",
    merge_labelling: "...",
    overview: "..."
  }
)
```

## Using Local LLM
If you want to use a local LLM on a machine with a GPU, follow these steps:

1. Install and start Ollama

2. Download the required models:

   ```sh
   ollama pull llama3
   ollama pull nomic-embed-text
   ```

3. Use `provider: :local` in Ruby:

   ```ruby
   config = Broadlistening::Config.new(
     provider: :local,
     model: "llama3",
     embedding_model: "nomic-embed-text",
     local_llm_address: "localhost:11434",
     cluster_nums: [5, 15]
   )
   pipeline = Broadlistening::Pipeline.new(config)
   result = pipeline.run(comments, output_dir: "./output")
   ```
Note:

- Using a local LLM requires sufficient GPU memory (8 GB or more recommended)
- Model downloads may take time on first startup
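Before running with `provider: :local`, it can be worth verifying that Ollama is actually reachable. A stand-alone check (not part of the gem) against Ollama's `/api/tags` endpoint, which lists locally available models:

```ruby
require "net/http"

# Returns true if an Ollama server answers at the given address.
# `/api/tags` is Ollama's endpoint for listing locally available models.
def ollama_available?(address = "localhost:11434")
  host, port = address.split(":")
  res = Net::HTTP.get_response(URI("http://#{host}:#{port}/api/tags"))
  res.is_a?(Net::HTTPSuccess)
rescue SystemCallError, SocketError, Net::OpenTimeout
  false
end

puts ollama_available? ? "Ollama is running" : "Ollama is not reachable"
```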
## Output Format

The pipeline result is a Hash with the following structure:

```ruby
{
  arguments: [
    {
      arg_id: "A1_0",
      argument: "Environmental measures are needed",
      x: 0.5,                          # UMAP X coordinate
      y: 0.3,                          # UMAP Y coordinate
      cluster_ids: ["0", "1_0", "2_3"] # Cluster IDs
    },
    # ...
  ],
  clusters: [
    {
      level: 0,
      id: "0",
      label: "All",
      description: "",
      count: 100,
      parent: nil
    },
    {
      level: 1,
      id: "1_0",
      label: "Environment & Energy",
      description: "Opinions on environmental issues and energy policy",
      count: 25,
      parent: "0"
    },
    # ...
  ],
  relations: [
    { arg_id: "A1_0", comment_id: "1", proposal_id: "123" },
    # ...
  ],
  comment_count: 50,
  argument_count: 100,
  overview: "Analysis summary text...",
  config: { model: "gpt-4o-mini", ... }
}
```
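The flat `clusters` array encodes the hierarchy through the `parent` field. A self-contained sketch (hand-written sample data in the shape above, not gem output) that rebuilds and prints the tree:

```ruby
# Rebuild the cluster hierarchy from a flat `clusters` array via `parent` pointers.
result = {
  clusters: [
    { level: 0, id: "0",   label: "All",                  count: 100, parent: nil },
    { level: 1, id: "1_0", label: "Environment & Energy", count: 25,  parent: "0" },
    { level: 1, id: "1_1", label: "Transportation",       count: 75,  parent: "0" },
    { level: 2, id: "2_0", label: "Renewables",           count: 25,  parent: "1_0" }
  ]
}

# Index children by their parent ID; the root has parent nil.
children = result[:clusters].group_by { |c| c[:parent] }

# Depth-first print of the tree, indenting by depth.
print_tree = lambda do |parent_id, depth|
  (children[parent_id] || []).each do |c|
    puts "#{'  ' * depth}#{c[:label]} (#{c[:count]})"
    print_tree.call(c[:id], depth + 1)
  end
end
print_tree.call(nil, 0)
```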
## Dependencies

- Ruby >= 3.1.0
- activesupport >= 7.0
- numo-narray ~> 0.9
- ruby-openai ~> 7.0
- parallel ~> 1.20
- rice ~> 4.6.0
- umappp ~> 0.2
### Installing umappp

umappp includes C++ native extensions and requires a C++ compiler:

```sh
# macOS
CXX=clang++ gem install umappp

# Linux
gem install umappp
```

Note: Use Rice 4.6.x due to compatibility issues with Rice 4.7.x.
## Development

```sh
# Setup
bin/setup

# Run tests
bundle exec rspec

# Console
bin/console
```

## License

AGPL 3.0