The project is in a healthy, maintained state
Advanced Ruby gem for legal document analysis featuring AI-powered plain language generation, multilingual processing (8 languages), PDF annotations, model training, risk analysis, and clause detection. Supports PDF, DOCX, RTF formats with comprehensive CLI tools.
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
2025
 Dependencies

Development

>= 1.17
~> 13.0
~> 3.12
~> 1.50
~> 0.9

Runtime

~> 0.8
~> 2.6
~> 1.13
~> 2.11
~> 1.2
 Project Readme

Legal Summariser 📋⚖️

A Ruby-based AI-powered toolkit for legal document analysis that summarises contracts, extracts key clauses, flags risks, and translates legal jargon into plain English while preserving legal accuracy.

Created by Ahmet KAHRAMAN 👨‍💻

Ruby License Tests Version


👋 About the Author

Ahmet KAHRAMAN - Mobile Developer & Cyber Security Expert

"Security first, innovation always" - Building secure, innovative solutions for a better digital future 🚀


Features

🚀 Core Analysis

  • Document Processing: Supports PDF, DOCX, RTF, and plain text files
  • Smart Summarisation: Converts legal documents into concise plain English
  • Clause Detection: Automatically identifies key legal clauses including:
    • Data Processing & Privacy (GDPR/KVKK compliance)
    • Liability & Indemnification
    • Confidentiality & Non-disclosure
    • Termination & Cancellation
    • Payment & Fees
    • Intellectual Property
    • Dispute Resolution
    • Governing Law
  • Risk Analysis: Flags potential legal risks and unfair terms
  • Compliance Checking: Identifies gaps in GDPR, KVKK, and other regulations

🤖 AI/ML Features (v0.3.0)

  • Plain Language Generator: AI-powered legal text simplification with 30+ jargon mappings
  • Model Training System: Train custom legal language models (pattern-based, statistical, neural)
  • Readability Scoring: Calculate complexity reduction and readability metrics
  • Fine-tuning Support: Customize models for specific legal domains

🌍 Multilingual Support

  • 8 Languages Supported: English, Spanish, French, German, Italian, Portuguese, Turkish, Dutch
  • Legal Term Translation: Cross-language legal terminology mapping
  • Cultural Adaptations: Legal system-specific processing for different countries
  • AI Translation Integration: Support for external translation APIs

📄 Advanced Output

  • PDF Annotations: Rich PDF output with highlighting, comments, and risk indicators
  • Multiple Formats: JSON, Markdown, plain text, and annotated PDF
  • Batch Processing: Process multiple documents simultaneously
  • Performance Monitoring: Built-in metrics and caching system

🛠️ Developer Tools

  • CLI Interface: Comprehensive command-line tool
  • Configuration System: Flexible configuration with validation
  • Caching System: Result caching with TTL and size management
  • Offline Processing: Works without internet for sensitive documents

Installation

Add this line to your application's Gemfile:

gem 'legal_summariser'

And then execute:

bundle install

Or install it yourself as:

gem install legal_summariser

Usage

Ruby API

require "legal_summariser"

# Basic usage
summary = LegalSummariser.summarise("contracts/nda.pdf")
puts summary[:plain_text]        # AI-generated plain language version
puts summary[:summary]           # Original summary
puts summary[:multilingual]      # Multi-language processing results

# Advanced AI features
result = LegalSummariser.summarise("contract.pdf", {
  format: 'markdown',
  max_sentences: 3,
  language: 'es',              # Process in Spanish
  plain_language: true,        # Enable AI plain language generation
  generate_annotations: true   # Create PDF annotations
})

# Access AI-enhanced analysis
puts result[:key_points]        # Key contract points
puts result[:clauses]           # Detected legal clauses
puts result[:risks]             # Risk analysis
puts result[:plain_text]        # AI-simplified version
puts result[:multilingual]      # Multi-language results
puts result[:metadata]          # Enhanced metadata with AI metrics

# Plain Language Generator
generator = LegalSummariser::PlainLanguageGenerator.new
plain_result = generator.generate("The party of the first part shall indemnify...")
puts plain_result[:text]        # "The first party will compensate..."
puts plain_result[:readability_score]  # Readability improvement metrics

# Model Training
trainer = LegalSummariser::ModelTrainer.new
trainer.train_model('contract_model', training_data, type: 'statistical')
trainer.fine_tune_model('contract_model', fine_tuning_data)

# Multilingual Processing
processor = LegalSummariser::MultilingualProcessor.new
result = processor.process_multilingual("contract.pdf", source: 'en', target: 'es')

# PDF Annotations
annotator = LegalSummariser::PDFAnnotator.new
annotator.create_annotated_pdf("contract.pdf", analysis_results, "annotated_contract.pdf")

Command Line Interface

# Basic analysis
legal_summariser analyze contract.pdf

# AI-enhanced analysis with plain language
legal_summariser analyze contract.pdf --plain-language --format markdown

# Multilingual processing
legal_summariser analyze contract.pdf --language es --translate-to en

# Generate annotated PDF
legal_summariser analyze contract.pdf --annotate --output annotated_contract.pdf

# Batch processing with AI features
legal_summariser batch contracts/ --plain-language --multilingual

# Configuration and stats
legal_summariser config --set language=es
legal_summariser stats

# Model management
legal_summariser train-model --type statistical --data training_data.json
legal_summariser list-models

# Utility commands
legal_summariser demo
legal_summariser supported_formats
legal_summariser version

Example Output

Plain Text Summary

This Non-Disclosure Agreement establishes confidentiality obligations between parties. 
The agreement will remain valid for 2 years from the date of signing. Either party may 
terminate with 30 days written notice. The receiving party will be liable for any 
breach of confidentiality obligations.

Risk Analysis

High Risks Found:
- Unlimited Liability: Agreement may expose party to unlimited financial liability
- Broad Indemnification: Very broad indemnification obligations that could be costly

Compliance Gaps:
- Missing GDPR Reference: Document processes personal data but lacks GDPR compliance language
- Missing Data Subject Rights: No mention of data subject rights under GDPR

Detected Clauses

  • Confidentiality: 3 clauses found
  • Liability: 2 clauses found
  • Termination: 1 clause found
  • Data Processing: 2 clauses found

Supported Document Types

The system automatically detects and optimizes analysis for:

  • Non-Disclosure Agreements (NDAs)
  • Service Agreements
  • Employment Contracts
  • Privacy Policies
  • License Agreements
  • General Contracts

Supported File Formats

Input Formats

  • PDF (.pdf)
  • Microsoft Word (.docx)
  • Plain Text (.txt)

Output Formats

  • JSON (structured data)
  • Markdown (formatted report)
  • Plain Text (simple summary)

Target Users

  • Law firms & compliance teams: Faster contract reviews
  • Startups & SMEs: Understanding investor or supplier contracts
  • Forensics experts: Extracting critical legal clauses for reports
  • Academics & NGOs: Analysing legal policies and regulations

Technical Architecture

Hybrid Approach

  • Rule-based extractors: For structured clause detection
  • NLP processing: For summarisation and risk detection
  • Pattern matching: For compliance gap identification

Key Components

  • TextExtractor: Multi-format document parsing (PDF, DOCX, RTF, TXT)
  • Summariser: Enhanced plain English conversion engine
  • ClauseDetector: Advanced legal clause identification
  • RiskAnalyzer: Comprehensive risk assessment and flagging
  • PlainLanguageGenerator: AI-powered legal text simplification
  • ModelTrainer: Custom model training and fine-tuning system
  • MultilingualProcessor: Cross-language processing and translation
  • PDFAnnotator: Rich PDF annotation and highlighting
  • Formatter: Multi-format output generation (JSON, Markdown, PDF)
  • Cache & Performance: Advanced caching and performance monitoring

Development

After checking out the repo, run bundle install to install dependencies. Then, run rake spec to run the tests.

# Install dependencies
bundle install

# Run tests
bundle exec rspec

# Run linter
bundle exec rubocop

# Build gem
gem build legal_summariser.gemspec

# Install local gem
gem install ./legal_summariser-*.gem

Roadmap

  • v0.1 ✅ Text extraction + basic summarisation
  • v0.2 ✅ Clause detection + risk flagging + performance enhancements
  • v0.3 ✅ AI/ML features + multilingual support + PDF annotations
  • v1.0 📋 Advanced neural models + enterprise features + API service

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/legal-summariser/legal_summariser.

License

The gem is available as open source under the terms of the MIT License.

Disclaimer

This tool is designed to assist with legal document analysis but should not replace professional legal advice. Always consult with qualified legal professionals for important legal matters.

🌍 Global Impact

  • Innovation: Bridges AI with law and compliance in a developer-friendly way
  • Contribution: Open-source library for legal NLP in Ruby
  • Public benefit: Helps both professionals and citizens better understand their rights and obligations
  • Global relevance: Applicable across jurisdictions (GDPR, KVKK, HIPAA, CCPA)

🛠️ Tech Stack

This project leverages my expertise in:

  • Ruby Development: Gem architecture, modular design patterns
  • AI & NLP: Advanced machine learning, neural networks, multilingual processing
  • Cybersecurity: Compliance frameworks (GDPR, KVKK), risk assessment
  • Digital Forensics: Legal document analysis, evidence extraction
  • Software Engineering: Test-driven development, CLI tools

🎓 Professional Background

As a Mobile Developer & Cyber Security Expert with 10+ years in Public Sector IT:

  • 🎓 Master's in Forensic Informatics - Gazi University (2021-2023)
  • 🏢 Mobile Developer - Gendarmerie General Command (2024-Present)
  • 🔒 Certified Ethical Hacker (CEH)
  • 📱 iOS & Android Development Expert

🤝 Connect & Collaborate

Platform Link Purpose
🌐 Portfolio ahmetxhero.web.app Professional showcase
🎥 YouTube @ahmetxhero Tech tutorials & content
💼 LinkedIn ahmetxhero Professional network
🐤 Twitter @ahmetxhero Tech updates & thoughts
📝 Medium ahmetxhero.medium.com Technical articles
📷 Instagram @ahmetxhero Behind the scenes

☕ Support My Work

If this project helps you or your organization, consider:

  • Star this repository on GitHub
  • 🔄 Share it with your network
  • Buy me a coffee to support open source development
  • 🗣️ Invite me to speak at your tech event about legal tech innovation

🎯 Current Focus

  • 🔒 Cybersecurity: Developing secure applications with privacy-first approach
  • 📱 Mobile Development: Creating innovative iOS and Android solutions
  • 🔍 Digital Forensics: Advancing forensic investigation techniques
  • 📚 Knowledge Sharing: Contributing to the tech community through open source
  • ⚖️ Legal Tech: Building tools that make legal processes more accessible

Building secure, innovative solutions for a better digital future 🚀