High Level SDTM Validation — hlsv
A Ruby gem providing a local web application for automated structural checks on SDTM packages, including ASCII validation, define.xml key verification, and natural key discovery.
An open-source SDTM structural validation tool for clinical data teams.
🎯 Why this tool exists
Regulatory submissions require SDTM packages to be structurally sound and internally consistent. Manual validation is often time-consuming and prone to human error.
This gem provides:
- Immediate detection of structural inconsistencies
- Automated key verification against define.xml
- Early duplicate detection
- Rapid quality control before sponsor or regulatory review
All processing is performed locally, ensuring full data confidentiality.
✨ Features
- ASCII Validation — detects non-ASCII characters in SDTM datasets
- Define.xml Key Verification — validates keys declared in define.xml
- Natural Key Discovery — performs ad hoc searches for natural keys across datasets
- Duplicate Analysis — identifies and reports duplicate records with visual grouping
- Interactive Web Interface — user-friendly configuration and results viewing
- Multiple Export Formats — Excel (.xlsx) and CSV outputs
- Excel Export with README — comprehensive Excel reports with explanatory notes
- Local-First Processing — all processing occurs on your machine, no data leaves your computer
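To illustrate what the ASCII validation feature looks for, here is a minimal Ruby sketch. It assumes records are already loaded as an array of hashes (variable name to value); that shape and the helper name `non_ascii_findings` are hypothetical and do not reflect hlsv's internal code.

```ruby
# Hypothetical helper: report where non-ASCII characters occur in a set of records.
# `records` is an assumed in-memory shape (array of hashes), not hlsv's own format.
def non_ascii_findings(records)
  findings = []
  records.each_with_index do |record, row|
    record.each do |variable, value|
      next unless value.is_a?(String)
      next if value.ascii_only?

      # Collect the distinct offending characters so a report can list them.
      bad_chars = value.each_char.reject(&:ascii_only?).uniq
      findings << { row: row + 1, variable: variable, characters: bad_chars }
    end
  end
  findings
end

records = [
  { "USUBJID" => "001-001", "AETERM" => "Headache" },
  { "USUBJID" => "001-002", "AETERM" => "Naus\u00E9e" }
]

findings = non_ascii_findings(records)
findings.each do |f|
  puts "row #{f[:row]}, #{f[:variable]}: #{f[:characters].join(' ')}"
end
```

`String#ascii_only?` is part of Ruby core, so a check of this kind needs no extra dependency.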
👤 Who is this for?
- Clinical Data Managers
- Biostatisticians
- Regulatory submission teams
- CROs preparing SDTM packages
- Sponsors performing internal QC
🔧 Prerequisites
- Ruby >= 3.0
- RubyGems (bundled with Ruby)
All gem dependencies are installed automatically.
📦 Installation
gem install hlsv
That's it. All dependencies (Sinatra, Puma, Excel libraries, etc.) are installed automatically.
🚀 Quick Start
Start the application
hlsv
The application starts on http://localhost:4567 and is accessible only from your machine.
Custom port or host
hlsv --port 8080
hlsv --host 0.0.0.0 --port 8080
First-time setup
- Open your browser at http://localhost:4567
- Fill in the configuration form:
- Study Name — unique identifier for your study
- Output Directory — where duplicate files will be saved
- Datasets Directory — path to your .xpt files
- Define.xml Path — path to define.xml (or "-" to skip validation)
- Key Configuration — variables to test for each dataset type
- Click "💾 Save Current Configuration"
- Click "🚀 Start Analysis"
- Wait for processing to complete
- Click the report file to view the full analysis
- Click a CSV link to inspect detected duplicates
- Refine the Key Configuration if needed and run again
- Click "📂 Load Results" to browse all generated reports
⚙️ Configuration
Configuration is managed through the web interface or by editing config.yaml directly
in the directory where you launch hlsv.
Parameter reference
| Parameter | Type | Description |
|---|---|---|
| study_name | string | Unique identifier for your study |
| output_type | string | Output format: csv (web interface) |
| output_directory | string | Directory to save duplicate files |
| data_directory | string | Path to your .xpt dataset files |
| define_path | string | Path to define.xml; use "-" to skip |
| excluded_ds | string | Space-separated datasets to exclude (e.g. "DM SUPPDM") |
| event_key | string | Key variables for event datasets (e.g. AE, BE) |
| intervention_key | string | Key variables for intervention datasets (e.g. CM, EX) |
| finding_key | string | Key variables for finding datasets (e.g. LB, VS) |
| finding_about_key | string | Key variables for finding-about datasets (e.g. FA) |
| ds_key | string | Key variables for DS dataset |
| relrec_key | string | Key variables for RELREC dataset |
| CO_key … TV_key | string | Keys for Trial Design datasets (CO, TA, TE, TI, TS, TV) |
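Because every parameter is stored as a plain string, a script that reads config.yaml has to split the space-separated fields itself and treat "-" as "skip define.xml". The following Ruby sketch shows one way to do that; the helper name `parse_hlsv_config` and the shape of the returned hash are assumptions made for illustration and are not part of hlsv.

```ruby
require "yaml"

# Hypothetical helper: parse the text of a config.yaml file and normalise the
# space-separated fields into arrays. Not part of hlsv's API.
def parse_hlsv_config(yaml_text)
  raw = YAML.safe_load(yaml_text) || {}

  # Every parameter whose name ends in "_key" holds a space-separated variable list.
  keys = raw.select { |name, _| name.end_with?("_key") }
            .transform_values { |value| value.to_s.split }

  {
    study_name: raw["study_name"],
    excluded_ds: raw["excluded_ds"].to_s.split,
    # "-" means define.xml validation is skipped, represented here as nil.
    define_path: raw["define_path"] == "-" ? nil : raw["define_path"],
    keys: keys
  }
end

sample = <<~CONFIG
  study_name: "MY_STUDY_001"
  define_path: "-"
  excluded_ds: "DM SUPPDM"
  event_key: "USUBJID AESEQ"
  TS_key: "STUDYID TSPARMCD TSSEQ"
CONFIG

config = parse_hlsv_config(sample)
puts config[:keys]["event_key"].inspect
puts config[:define_path].nil? ? "define.xml validation skipped" : config[:define_path]
```

To read a real file, pass `File.read("config.yaml")` from the directory where hlsv was launched.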
Example config.yaml
study_name: "MY_STUDY_001"
output_type: "csv"
output_directory: "duplicates"
data_directory: "/path/to/datasets"
define_path: "/path/to/define.xml"
excluded_ds: "DM SUPPDM"
event_key: "USUBJID AESEQ"
intervention_key: "USUBJID CMSEQ"
finding_key: "USUBJID VISITNUM SPEC SEQ"
finding_about_key: "USUBJID FAOBJ FATESTCD"
ds_key: "USUBJID EPOCH DSDECOD"
relrec_key: "STUDYID RDOMAIN USUBJID IDVAR IDVARVAL"
CO_key: "STUDYID DOMAIN USUBJID COSEQ"
TA_key: "STUDYID ARMCD EPOCH"
TE_key: "STUDYID ETCD"
TI_key: "STUDYID IETESTCD"
TS_key: "STUDYID TSPARMCD TSSEQ"
TV_key: "STUDYID VISITNUM"
Configuration tips
- Separate variables with spaces in key configurations
- Use "-" for define_path to skip define.xml validation
- List excluded datasets separated by spaces: "DM SUPPDM CO"
📊 Output examples
Dataset: AE
Number of records: 12
Dataset type: General Observation, event dataset
✓ ASCII Verification
No non-ASCII characters found
✓ Valid define.xml Key
Key: STUDYID, USUBJID, AEDECOD, AESTDTC
✓ Minimum Key Found
Key: USUBJID, AETERM
---
Dataset: BE
Number of records: 3322
Dataset type: General Observation, event dataset
✓ ASCII Verification
No non-ASCII characters found
✓ Valid define.xml Key
Key: STUDYID, USUBJID, BEREFID, BETERM
⚠ No Valid Key Found
Tested variables: USUBJID, BETERM, BESTDTC
Last key tested: USUBJID, BETERM, BESTDTC
File containing duplicated records: data_BE.csv
Duplicate records are grouped by the last key tested. A group identifier appears in the first column, with alternating row colors for visual clarity.
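The report above distinguishes a "Minimum Key Found" from a "Last key tested" with duplicates. The Ruby sketch below shows one plausible way to produce both results: it tests growing prefixes of the configured variables and, if none is unique, returns the duplicates grouped by the full list. The prefix strategy, the helper name `find_minimum_key`, and the record shape are assumptions; hlsv's actual search may work differently.

```ruby
# Hypothetical sketch: test growing prefixes of `variables` until one identifies
# every record uniquely. This is an assumed strategy, not hlsv's documented one.
def find_minimum_key(records, variables)
  last_groups = {}
  (1..variables.length).each do |size|
    key = variables.first(size)
    groups = records.group_by { |record| record.values_at(*key) }
    duplicates = groups.select { |_, rows| rows.length > 1 }
    return { key: key, duplicates: {} } if duplicates.empty?

    last_groups = duplicates
  end
  # No prefix was unique: report duplicates grouped by the last key tested.
  { key: nil, last_key_tested: variables, duplicates: last_groups }
end

records = [
  { "USUBJID" => "001", "BETERM" => "Biopsy", "BESTDTC" => "2024-01-10" },
  { "USUBJID" => "001", "BETERM" => "Biopsy", "BESTDTC" => "2024-01-10" },
  { "USUBJID" => "002", "BETERM" => "Biopsy", "BESTDTC" => "2024-02-01" }
]

result = find_minimum_key(records, %w[USUBJID BETERM BESTDTC])
result[:duplicates].each_with_index do |(key_values, rows), index|
  puts "group #{index + 1}: #{key_values.join(', ')} (#{rows.length} records)"
end
```

Each entry of `duplicates` corresponds to one group identifier in the exported file: the hash key holds the shared key values and the hash value holds the records in that group.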
🏗️ Architecture
hlsv/
├── bin/
│ └── hlsv # Executable — starts the server
├── lib/
│ ├── hlsv.rb # Entry point — loads all components
│ └── hlsv/
│ ├── version.rb # Gem version
│ ├── web_app.rb # Sinatra web application (routes, helpers)
│ ├── mon_script.rb # Orchestration layer
│ ├── find_keys.rb # Analysis engine
│ ├── html2word.rb # HTML to DOCX converter
│ └── xpt.rb # XPT file reader
├── views/ # ERB templates
│ ├── index.erb # Main interface
│ ├── csv_view.erb # CSV viewer
│ └── report_template.erb # HTML report template
├── public/ # Static assets
│ ├── app.js
│ ├── styles.css
│ ├── styles_csv.css
│ └── logo.png
├── hlsv.gemspec # Gem specification
├── Gemfile # Development dependencies
├── config.default.yaml # Default configuration template
├── LICENSE # License file
└── README.md
Results are written into a hlsv_results/ directory created at runtime in the working directory
where hlsv is launched.
🔒 Security
- Local-only access — binds to 127.0.0.1 by default (localhost only)
- No external connections — all processing is local
- Path traversal protection — directory traversal attacks are prevented in all file routes
- No data collection — no analytics or tracking of any kind
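As an illustration of the path traversal point, a file route can resolve the requested name against its base directory and reject anything that ends up outside it. This is a minimal sketch of that general technique with a hypothetical helper name (`safe_path`); it is not taken from hlsv's source, and it assumes a Unix-like file system.

```ruby
# Hypothetical helper: resolve `requested` inside `base_dir` and return nil
# if the result would escape that directory. Not hlsv's actual implementation.
def safe_path(base_dir, requested)
  base = File.expand_path(base_dir)
  candidate = File.expand_path(requested, base)

  # The resolved path must sit strictly below the base directory.
  inside = candidate.start_with?(base + File::SEPARATOR)
  inside ? candidate : nil
end

puts safe_path("/tmp/hlsv_results", "data_BE.csv").inspect
puts safe_path("/tmp/hlsv_results", "../../etc/passwd").inspect
puts safe_path("/tmp/hlsv_results", "/etc/passwd").inspect
```

`File.expand_path` normalises `..` segments but does not follow symbolic links; a stricter check could compare `File.realpath` results for paths that already exist.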
🐛 Troubleshooting
"Port already in use"
# Find the process using port 4567
lsof -i :4567
# Kill it
kill -9 <PID>
# Or launch hlsv on a different port
hlsv --port 8080
"Gem not found" or load errors
# Reinstall the gem
gem uninstall hlsv
gem install hlsv
Define.xml not found
- Use an absolute path, not a relative one
- Verify the file exists and is readable
- Use "-" to skip define.xml validation entirely
Debug mode
RACK_ENV=development hlsv
🤝 Contributing
Contributions are welcome!
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
📝 License
This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).
You may use, modify, and redistribute the software under the terms of the AGPL-3.0. Any modified version deployed over a network must also be made available under the same license.
For commercial licensing inquiries or proprietary integration, please contact: 📩 contact@adclin.com
👥 Authors
- AdClin Team — https://adclin.com
- Marie Ober
🙏 Acknowledgments
- Built with Sinatra
- Excel generation with fast_excel
- Served by Puma
📞 Support
- 📧 Email: adclin@gmail.com
- 🐛 Issues: GitHub Issues
💼 Professional Services
AdClin offers professional services related to this tool:
- Implementation of custom validation rules
- Integration into existing SDTM pipelines
- Deployment in secured environments
- Training sessions for data management teams
🗓️ Changelog
Version 1.0.0 (2026-02-23)
- ✨ Initial release as a Ruby gem
- 🎨 Modern responsive web interface
- 📊 Excel export with README sheet
- 🔍 ASCII validation
- ✅ Define.xml key verification
- 🔑 Natural key discovery
- 📱 Mobile responsive design
Made with ❤️ by AdClin