Project

hlsv

A Sinatra-based web application for high level SDTM validations.
 Project Readme

High Level SDTM Validation — hlsv

A Ruby gem providing a local web application for automated structural checks on SDTM packages, including ASCII validation, define.xml key verification, and natural key discovery.

An open-source SDTM structural validation tool for clinical data teams.



🎯 Why this tool exists

Regulatory submissions require SDTM packages to be structurally sound and internally consistent. Manual validation is often time-consuming and prone to human error.

This gem provides:

  • Immediate detection of structural inconsistencies
  • Automated key verification against define.xml
  • Early duplicate detection
  • Rapid quality control before sponsor or regulatory review

All processing is performed locally, ensuring full data confidentiality.


✨ Features

  • ASCII Validation — detects non-ASCII characters in SDTM datasets
  • Define.xml Key Verification — validates keys declared in define.xml
  • Natural Key Discovery — performs ad hoc searches for natural keys across datasets
  • Duplicate Analysis — identifies and reports duplicate records with visual grouping
  • Interactive Web Interface — user-friendly configuration and results viewing
  • Multiple Export Formats — Excel (.xlsx) and CSV outputs
  • Excel Export with README — comprehensive Excel reports with explanatory notes
  • Local-First Processing — all processing occurs on your machine, no data leaves your computer
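
As an illustration of the ASCII validation feature, here is a minimal sketch of how non-ASCII characters could be located in a set of records. It is not taken from the hlsv source: the method name `non_ascii_findings` and the array-of-hashes record layout are assumptions made for the example, standing in for records already read from an .xpt file.

```ruby
# Hypothetical helper (not part of hlsv): list every non-ASCII character
# found in `rows`, an array of hashes mapping variable name => value.
def non_ascii_findings(rows)
  findings = []
  rows.each_with_index do |row, index|
    row.each do |variable, value|
      text = value.to_s
      next if text.ascii_only? # fast path: nothing to report for this value

      # Record each offending character with its 0-based position in the value.
      text.each_char.with_index do |char, position|
        next if char.ascii_only?

        findings << { record: index + 1, variable: variable, position: position, char: char }
      end
    end
  end
  findings
end

rows = [
  { "USUBJID" => "001", "AETERM" => "Headache" },
  { "USUBJID" => "002", "AETERM" => "Nausée" }
]

non_ascii_findings(rows).each do |f|
  puts "record #{f[:record]}, #{f[:variable]}: #{f[:char].inspect} at position #{f[:position]}"
end
```

Reporting the record number, variable, and position makes it easier to trace a finding back to the source data.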

👤 Who is this for?

  • Clinical Data Managers
  • Biostatisticians
  • Regulatory submission teams
  • CROs preparing SDTM packages
  • Sponsors performing internal QC

🔧 Prerequisites

  • Ruby >= 3.0
  • RubyGems (bundled with Ruby)

All gem dependencies are installed automatically.


📦 Installation

gem install hlsv

That's it. All dependencies (Sinatra, Puma, Excel libraries, etc.) are installed automatically.


🚀 Quick Start

Start the application

hlsv

By default, the application starts on http://localhost:4567 and is accessible only from your machine.

Custom port or host

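# Use a different port
hlsv --port 8080
# Listen on all network interfaces (the app becomes reachable from other machines)
hlsv --host 0.0.0.0 --port 8080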

First-time setup

  1. Open your browser at http://localhost:4567
  2. Fill in the configuration form:
    • Study Name — unique identifier for your study
    • Output Directory — where duplicate files will be saved
    • Datasets Directory — path to your .xpt files
    • Define.xml Path — path to define.xml (or "-" to skip validation)
    • Key Configuration — variables to test for each dataset type
  3. Click "💾 Save Current Configuration"
  4. Click "🚀 Start Analysis"
  5. Wait for processing to complete
  6. Click the report file to view the full analysis
  7. Click a CSV link to inspect detected duplicates
  8. Refine the Key Configuration if needed and run again
  9. Click "📂 Load Results" to browse all generated reports

⚙️ Configuration

Configuration is managed through the web interface or by editing config.yaml directly in the directory where you launch hlsv.

Parameter reference

Parameter            Type    Description
study_name           string  Unique identifier for your study
output_type          string  Output format: csv (web interface)
output_directory     string  Directory to save duplicate files
data_directory       string  Path to your .xpt dataset files
define_path          string  Path to define.xml; use "-" to skip
excluded_ds          string  Space-separated datasets to exclude (e.g. "DM SUPPDM")
event_key            string  Key variables for event datasets (e.g. AE, BE)
intervention_key     string  Key variables for intervention datasets (e.g. CM, EX)
finding_key          string  Key variables for finding datasets (e.g. LB, VS)
finding_about_key    string  Key variables for finding-about datasets (e.g. FA)
ds_key               string  Key variables for DS dataset
relrec_key           string  Key variables for RELREC dataset
CO_key ... TV_key    string  Keys for Trial Design datasets (CO, TA, TE, TI, TS, TV), one parameter per dataset

Example config.yaml

study_name: "MY_STUDY_001"
output_type: "csv"
output_directory: "duplicates"
data_directory: "/path/to/datasets"
define_path: "/path/to/define.xml"
excluded_ds: "DM SUPPDM"

event_key:         "USUBJID AESEQ"
intervention_key:  "USUBJID CMSEQ"
finding_key:       "USUBJID VISITNUM SPEC SEQ"
finding_about_key: "USUBJID FAOBJ FATESTCD"
ds_key:            "USUBJID EPOCH DSDECOD"
relrec_key:        "STUDYID RDOMAIN USUBJID IDVAR IDVARVAL"

CO_key: "STUDYID DOMAIN USUBJID COSEQ"
TA_key: "STUDYID ARMCD EPOCH"
TE_key: "STUDYID ETCD"
TI_key: "STUDYID IETESTCD"
TS_key: "STUDYID TSPARMCD TSSEQ"
TV_key: "STUDYID VISITNUM"

Configuration tips

  • Separate variables with spaces in key configurations
  • Use "-" for define_path to skip define.xml validation
  • List excluded datasets separated by spaces: "DM SUPPDM CO"
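
To make the conventions above concrete, here is a small sketch that parses a config.yaml-style document with Ruby's standard YAML library and splits the space-separated values into arrays. The method name `parse_hlsv_config` and the shape of the returned hash are assumptions for illustration; hlsv's own loader may work differently.

```ruby
require "yaml"

# Illustrative only (not hlsv's loader): parse the YAML text and normalise
# the space-separated settings into arrays of names.
def parse_hlsv_config(yaml_text)
  raw = YAML.safe_load(yaml_text) || {}

  # Every parameter whose name ends in "_key" holds space-separated variables.
  keys = raw.select { |name, _| name.end_with?("_key") }
            .transform_values { |value| value.to_s.split }

  {
    study_name: raw["study_name"],
    excluded_ds: raw["excluded_ds"].to_s.split,
    # "-" means define.xml validation is skipped, represented here as nil.
    define_path: raw["define_path"] == "-" ? nil : raw["define_path"],
    keys: keys
  }
end

config = parse_hlsv_config(<<~CONFIG)
  study_name: "MY_STUDY_001"
  define_path: "-"
  excluded_ds: "DM SUPPDM"
  event_key: "USUBJID AESEQ"
  TS_key: "STUDYID TSPARMCD TSSEQ"
CONFIG

puts config[:keys]["event_key"].inspect
puts config[:excluded_ds].inspect
puts config[:define_path].nil? ? "define.xml validation skipped" : config[:define_path]
```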

📊 Output examples

Dataset: AE
Number of records: 12
Dataset type: General Observation, event dataset

✓ ASCII Verification
  No non-ASCII characters found

✓ Valid define.xml Key
  Key: STUDYID, USUBJID, AEDECOD, AESTDTC

✓ Minimum Key Found
  Key: USUBJID, AETERM

---

Dataset: BE
Number of records: 3322
Dataset type: General Observation, event dataset

✓ ASCII Verification
  No non-ASCII characters found

✓ Valid define.xml Key
  Key: STUDYID, USUBJID, BEREFID, BETERM

⚠ No Valid Key Found
  Tested variables: USUBJID, BETERM, BESTDTC
  Last key tested: USUBJID, BETERM, BESTDTC
  File containing duplicated records: data_BE.csv

Duplicate records are grouped by the last key tested. A group identifier appears in the first column, with alternating row colors for visual clarity.
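
The report above can be read as the outcome of a key search followed by duplicate grouping. The sketch below shows one way such a search could work. It is an assumption, not hlsv's actual algorithm: it tries combinations of the tested variables from the smallest upward, and if none is unique it groups the duplicates under the full list of tested variables. The method names and the `GROUP` column name are hypothetical.

```ruby
# Sketch only, not hlsv's implementation. `rows` is an array of hashes
# mapping variable name => value.

# True when the values of `key` identify every record uniquely.
def unique_key?(rows, key)
  rows.map { |row| row.values_at(*key) }.uniq.length == rows.length
end

# Try candidate keys from the smallest combination upward and return the
# first unique one, or nil when no combination of `variables` is unique.
def minimum_key(rows, variables)
  (1..variables.length).each do |size|
    variables.combination(size) do |candidate|
      return candidate if unique_key?(rows, candidate)
    end
  end
  nil
end

# Keep only records whose key values occur more than once, and prefix each
# with a group number so that duplicates of the same key share an identifier.
def duplicate_groups(rows, key)
  rows.group_by { |row| row.values_at(*key) }
      .values
      .select { |group| group.length > 1 }
      .each_with_index
      .flat_map { |group, index| group.map { |row| { "GROUP" => index + 1 }.merge(row) } }
end

rows = [
  { "USUBJID" => "001", "BETERM" => "Biopsy", "BESTDTC" => "2024-01-10" },
  { "USUBJID" => "001", "BETERM" => "Biopsy", "BESTDTC" => "2024-01-10" },
  { "USUBJID" => "002", "BETERM" => "Biopsy", "BESTDTC" => "2024-02-01" }
]
tested = %w[USUBJID BETERM BESTDTC]

key = minimum_key(rows, tested)
if key
  puts "Minimum key found: #{key.join(', ')}"
else
  puts "No valid key found; duplicates grouped by #{tested.join(', ')}:"
  duplicate_groups(rows, tested).each { |row| puts row.values.join(",") }
end
```

Testing every combination grows exponentially with the number of variables, so a search like this is only practical for short variable lists.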


🏗️ Architecture

hlsv/
├── bin/
│   └── hlsv                    # Executable — starts the server
├── lib/
│   ├── hlsv.rb                 # Entry point — loads all components
│   └── hlsv/
│       ├── version.rb          # Gem version
│       ├── web_app.rb          # Sinatra web application (routes, helpers)
│       ├── mon_script.rb       # Orchestration layer
│       ├── find_keys.rb        # Analysis engine
│       ├── html2word.rb        # HTML to DOCX converter
│       └── xpt.rb              # XPT file reader
├── views/                      # ERB templates
│   ├── index.erb               # Main interface
│   ├── csv_view.erb            # CSV viewer
│   └── report_template.erb     # HTML report template
├── public/                     # Static assets
│   ├── app.js
│   ├── styles.css
│   ├── styles_csv.css
│   └── logo.png
├── hlsv.gemspec                # Gem specification
├── Gemfile                     # Development dependencies
├── config.default.yaml         # Default configuration template
├── LICENSE                     # License file
└── README.md

Results are written into a hlsv_results/ directory created at runtime in the working directory where hlsv is launched.


🔒 Security

  • Local-only access — binds to 127.0.0.1 by default (localhost only)
  • No external connections — all processing is local
  • Path traversal protection — directory traversal attacks are prevented in all file routes
  • No data collection — no analytics or tracking of any kind
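
For readers unfamiliar with path traversal protection, the following is a generic sketch of the technique, not hlsv's actual code. It resolves a requested file name against a base directory and accepts it only if the resolved path stays inside that directory. The method name `safe_path` is hypothetical.

```ruby
# Generic guard (an illustration, not taken from hlsv): returns the absolute
# path when `requested` resolves to a location inside `base_dir`, else nil.
def safe_path(base_dir, requested)
  base = File.expand_path(base_dir)
  # expand_path collapses "." and ".." segments and honours absolute inputs.
  target = File.expand_path(requested, base)
  return target if target.start_with?(base + File::SEPARATOR)

  nil
end

puts safe_path("hlsv_results", "data_BE.csv").inspect    # path inside hlsv_results
puts safe_path("hlsv_results", "../config.yaml").inspect # nil
puts safe_path("hlsv_results", "/etc/passwd").inspect    # nil
```

Comparing against the base path plus a separator avoids accepting sibling directories that merely share a prefix. This sketch does not resolve symbolic links; File.realpath would be needed for that.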

🐛 Troubleshooting

"Port already in use"

# Find the process using port 4567
lsof -i :4567

# Kill it
kill -9 <PID>

# Or launch hlsv on a different port
hlsv --port 8080
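
On systems without lsof, a few lines of Ruby can find a free port before launching. This is a sketch using the standard socket library; the method name `port_free?` and the list of candidate ports are arbitrary choices for the example.

```ruby
require "socket"

# Returns true when a server socket can be bound to host:port right now,
# false when the port is already in use or binding is not permitted.
def port_free?(port, host = "127.0.0.1")
  TCPServer.new(host, port).close
  true
rescue Errno::EADDRINUSE, Errno::EACCES
  false
end

candidates = [4567, 8080, 9292]
free = candidates.find { |port| port_free?(port) }
puts free ? "hlsv --port #{free}" : "none of #{candidates.inspect} is free"
```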

"Gem not found" or load errors

# Reinstall the gem
gem uninstall hlsv
gem install hlsv

Define.xml not found

  • Use an absolute path, not a relative one
  • Verify the file exists and is readable
  • Use "-" to skip define.xml validation entirely
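
The checklist above can also be run as a quick pre-flight check. The helper below is hypothetical and not part of hlsv; it returns a short description of the first problem found, or nil when the path looks usable.

```ruby
require "pathname"

# Hypothetical pre-flight check for the define_path setting.
def define_path_problem(path)
  return nil if path == "-" # "-" means define.xml validation is skipped
  return "path is relative; use an absolute path" unless Pathname.new(path).absolute?
  return "file does not exist" unless File.file?(path)
  return "file is not readable" unless File.readable?(path)

  nil
end

["-", "define.xml", "/path/to/define.xml"].each do |path|
  problem = define_path_problem(path)
  puts "#{path}: #{problem || 'ok'}"
end
```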

Debug mode

RACK_ENV=development hlsv

🤝 Contributing

Contributions are welcome!

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📝 License

This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).

You may use, modify, and redistribute the software under the terms of the AGPL-3.0. Any modified version deployed over a network must also be made available under the same license.

For commercial licensing inquiries or proprietary integration, please contact: 📩 contact@adclin.com


👥 Authors


🙏 Acknowledgments


📞 Support

💼 Professional Services

AdClin offers professional services related to this tool:

  • Implementation of custom validation rules
  • Integration into existing SDTM pipelines
  • Deployment in secured environments
  • Training sessions for data management teams

📩 contact@adclin.com


🗓️ Changelog

Version 1.0.0 (2026-02-23)

  • ✨ Initial release as a Ruby gem
  • 🎨 Modern responsive web interface
  • 📊 Excel export with README sheet
  • 🔍 ASCII validation
  • ✅ Define.xml key verification
  • 🔑 Natural key discovery
  • 📱 Mobile responsive design

Made with ❤️ by AdClin