Excavate: Extraction of nested archives
Purpose
Excavate is a Ruby gem that provides a unified interface for extracting nested archives across multiple compression and archive formats. The gem enables recursive extraction of archives within archives, making it ideal for processing complex software distributions, font packages, and other nested archive scenarios.
Features
- Basic archive extraction
- Recursive extraction of nested archives
- Selective file extraction
- Filter-based extraction
- Command-line interface
- Supported archive formats
Architecture
Excavate follows a clean object-oriented architecture with clear separation of concerns:
┌─────────────────────────────────────────────────────────────┐
│                                                             │
│                      Excavate::Archive                      │
│            (Facade providing unified interface)             │
│                                                             │
└───────────────────┬─────────────────────────────────────────┘
                    │
                    │ delegates to
                    │
    ┌───────────────┴───────────────┐
    │                               │
    │     Format Type Registry      │
    │     (TYPES hash mapping)      │
    │                               │
    └───────────────┬───────────────┘
                    │
                    │ instantiates
                    │
    ┌───────────────┴───────────────────────────────────┐
    │                                                   │
    │               Extractors::Extractor               │
    │               (Abstract base class)               │
    │                                                   │
    └───────────────┬───────────────────────────────────┘
                    │
                    │ specialized by
                    │
    ┌───────────────┴───────────────────────────┐
    │                                           │
    ├─ CabExtractor      ├─ RpmExtractor        │
    ├─ CpioExtractor     ├─ SevenZipExtractor   │
    ├─ GzipExtractor     ├─ TarExtractor        │
    ├─ OleExtractor      ├─ XarExtractor        │
    ├─ XzExtractor       ├─ ZipExtractor        │
    │                                           │
    └───────────────────────────────────────────┘
User Request
     │
     ├── Archive.new(file_path)
     │        │
     │        ├── Detect format from extension
     │        │
     │        └── Select appropriate Extractor
     │
     ├── Archive#extract(target_dir)
     │        │
     │        ├── Create target directory
     │        │
     │        ├── Extractor#extract(target)
     │        │        │
     │        │        └── Format-specific extraction
     │        │
     │        └── [Optional] Recursive extraction
     │                 │
     │                 └── Repeat for nested archives
     │
     └── Return: Extracted files
The architecture follows these principles:
- Single Responsibility: Each extractor handles one format
- Open/Closed: New formats can be added without modifying existing code
- Dependency Inversion: Archive class depends on Extractor abstraction
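The layering described above can be sketched in plain Ruby. This is a minimal illustration and not the gem's source: every class name below (`SketchArchive`, `SketchExtractor` and its two subclasses) and the contents of this `TYPES` hash are assumptions made for the example. Only the idea of a facade, a `TYPES` mapping, and an abstract extractor base class comes from the diagram.

```ruby
# Hypothetical sketch of the facade / registry / extractor layering.
# None of these classes are part of Excavate itself.

# Abstract base class: subclasses implement format-specific extraction.
class SketchExtractor
  def initialize(archive_path)
    @archive_path = archive_path
  end

  def extract(_target)
    raise NotImplementedError, "#{self.class} must implement #extract"
  end
end

# One subclass per format (Single Responsibility).
class SketchZipExtractor < SketchExtractor
  def extract(target)
    "zip: #{@archive_path} -> #{target}"
  end
end

class SketchTarExtractor < SketchExtractor
  def extract(target)
    "tar: #{@archive_path} -> #{target}"
  end
end

# Facade: picks an extractor class from the registry by file extension.
class SketchArchive
  # Registry mapping an extension to the class that handles it.
  # Supporting a new format means adding one entry and one subclass.
  TYPES = {
    "zip" => SketchZipExtractor,
    "tar" => SketchTarExtractor,
  }.freeze

  def initialize(archive_path)
    @archive_path = archive_path
  end

  def extractor_class
    extension = File.extname(@archive_path).delete_prefix(".").downcase
    TYPES.fetch(extension) do
      raise ArgumentError, "unsupported archive type: #{extension.inspect}"
    end
  end

  def extract(target)
    extractor_class.new(@archive_path).extract(target)
  end
end

puts SketchArchive.new("fonts.zip").extract("out")
```

The facade only ever calls `#extract` on whatever class the registry returns, which is the dependency on the abstraction that the principles list refers to.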
Installation
Add this line to your application’s Gemfile:
gem "excavate"
And then execute:
bundle install
Or install it yourself as:
gem install excavate
Supported formats
Excavate supports the following archive and compression formats:
- CAB (.cab, .exe with CAB)
- CPIO (.cpio)
- GZIP (.gz)
- MSI (.msi)
- RPM (.rpm)
- 7-Zip (.7z, .exe with 7z)
- TAR (.tar)
- XAR (.pkg)
- XZ (.xz, .tar.xz)
- ZIP (.zip)
All formats support recursive extraction for nested archives.
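A caller that wants to know in advance whether a file name looks like one of the listed formats could check its extension. The helper below is a hypothetical sketch, not part of Excavate: `SUPPORTED_EXTENSIONS` and `archive_name?` are names invented here, and the gem itself may well detect formats differently (for example by inspecting file contents).

```ruby
# Hypothetical helper: the extension list is copied from the format list above.
SUPPORTED_EXTENSIONS = %w[.cab .cpio .gz .msi .rpm .7z .tar .pkg .xz .zip].freeze

# True when the file name ends in one of the listed extensions.
# .exe is left out on purpose: whether a self-extracting .exe holds a CAB
# or 7z payload cannot be told from the name alone.
def archive_name?(path)
  SUPPORTED_EXTENSIONS.include?(File.extname(path).downcase)
end

%w[fonts.zip wine-10.18.tar.xz Arial.ttf setup.exe].each do |name|
  puts "#{name}: #{archive_name?(name)}"
end
```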
Basic extraction
General
This feature provides the fundamental capability to extract archives to a target directory. It is the core functionality of Excavate and is used as the foundation for all other extraction features.
Syntax
Excavate::Archive.new(archive_path).extract(target_directory) # (1) (2)
(1) archive_path: path to the archive file to extract
(2) target_directory: directory where files will be extracted (optional)
Where,
archive_path
(required) Path to the archive file to extract. Can be an absolute or relative path.
target_directory
(optional) Target directory for extraction. If omitted, a directory with the archive’s base name is created in the current directory.
Usage example
require "excavate"
require "tmpdir"
# Create a temporary directory for extraction
target = Dir.mktmpdir
# Extract the archive
Excavate::Archive.new("fonts.zip").extract(target)
# List extracted files
Dir.glob(File.join(target, "**", "*")).each do |file|
  puts file if File.file?(file)
end
This example extracts a ZIP archive to a temporary directory and lists all extracted files.
require "excavate"
# Extract will create a directory named "fonts" in current directory
Excavate::Archive.new("fonts.zip").extract
# Files are now in ./fonts/
When no target is specified, Excavate creates a directory with the archive’s base name (without extension) in the current working directory.
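The default-target rule can be illustrated with the standard library alone. This is a sketch of the documented behaviour, not Excavate's own code: `default_target` is a hypothetical helper, and it strips only the last extension, so how the gem names the directory for a multi-part name such as `fonts.tar.gz` is not something this example settles.

```ruby
# Hypothetical helper mirroring the documented default: a directory named
# after the archive, minus its extension, under the current directory.
def default_target(archive_path, base_dir = Dir.pwd)
  File.join(base_dir, File.basename(archive_path, ".*"))
end

puts default_target("downloads/fonts.zip", "/work") # prints /work/fonts
```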
Recursive extraction
General
This feature enables automatic extraction of nested archives. When an archive contains other archives, Excavate can recursively extract them all in a single operation. This is particularly useful for complex software distributions that package multiple archives together.
Syntax
Excavate::Archive.new(archive_path).extract(
  target_directory,
  recursive_packages: true # (1)
)
(1) Enable recursive extraction of nested archives
Where,
recursive_packages
(optional) Boolean flag to enable recursive extraction. When true, archives found within the extracted files are automatically extracted. Default is false.
Usage example
require "excavate"
require "tmpdir"
target = Dir.mktmpdir
# Extract an MSI file that contains CAB archives
Excavate::Archive.new("fonts.msi").extract(
  target,
  recursive_packages: true
)
# All nested CAB files are automatically extracted
# Final structure contains only the font files
This example shows extraction of an MSI installer that contains nested CAB archives. With recursive_packages: true, both the MSI and all contained CAB files are automatically extracted.
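The difference the flag makes can be shown with a toy model that needs no real archives. In this sketch an "archive" is a Hash whose values are either file contents or further Hashes standing in for nested archives. The `flatten_archive` method and the in-memory model are assumptions made for illustration; they say nothing about how Excavate walks files on disk.

```ruby
# Hypothetical in-memory model: an "archive" is a Hash of name => content,
# where content is either a String (regular file) or another Hash
# (a nested archive). No real archive I/O happens here.
def flatten_archive(archive, recursive: false, prefix: "")
  archive.each_with_object({}) do |(name, content), files|
    path = prefix + name
    if content.is_a?(Hash) && recursive
      # Descend into the nested archive and merge its files in.
      files.merge!(flatten_archive(content, recursive: true, prefix: "#{path}/"))
    else
      # Without the flag, a nested archive is kept as a single entry.
      files[path] = content
    end
  end
end

msi = {
  "readme.txt" => "hello",
  "fonts.cab" => { "Arial.ttf" => "<font data>", "Verdana.ttf" => "<font data>" },
}

p flatten_archive(msi).keys
# => ["readme.txt", "fonts.cab"]
p flatten_archive(msi, recursive: true).keys
# => ["readme.txt", "fonts.cab/Arial.ttf", "fonts.cab/Verdana.ttf"]
```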
require "excavate"
fonts = []
Excavate::Archive.new("fonts.tar.gz").files(
  recursive_packages: true
) do |file|
  fonts << file if file.end_with?(".ttf", ".otf")
end
puts "Found #{fonts.size} font files"
The files method with recursive_packages: true passes each extracted file to a block, allowing selective collection of specific file types.
Selective extraction
General
This feature allows extraction of specific files from an archive without extracting the entire contents. It is useful when working with large archives where only certain files are needed.
Syntax
Excavate::Archive.new(archive_path).extract(
  target_directory,
  files: [file1, file2, ...] # (1)
)
(1) Array of specific file paths to extract from the archive
Where,
files
(optional) Array of file paths to extract. Paths should match the structure within the archive. If a file is not found, a TargetNotFoundError is raised.
Usage example
require "excavate"
target = "/tmp/extracted"
# Extract only specific font files
files = Excavate::Archive.new("fonts.zip").extract(
  target,
  files: ["Fonts/Arial.ttf", "Fonts/Verdana.ttf"]
)
puts "Extracted #{files.size} files:"
files.each { |f| puts " - #{File.basename(f)}" }
This extracts only the specified files, even though the archive may contain many more files.
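The "raise if a requested file is missing" contract can be sketched without the gem. Everything below is hypothetical: `SketchTargetNotFoundError` merely stands in for the error named in this section (its real namespace is not shown here), and `select_files` works on a plain array of paths rather than on an archive.

```ruby
# Hypothetical stand-in for the error this section names; the real class
# lives inside the gem and its namespace is not shown in this document.
class SketchTargetNotFoundError < StandardError; end

# Returns the requested paths if every one exists in the listing,
# otherwise raises and names the missing entries.
def select_files(listing, requested)
  missing = requested - listing
  unless missing.empty?
    raise SketchTargetNotFoundError, "not found in archive: #{missing.join(', ')}"
  end

  requested
end

listing = ["Fonts/Arial.ttf", "Fonts/Verdana.ttf", "Fonts/Times.ttf", "README.txt"]
p select_files(listing, ["Fonts/Arial.ttf", "Fonts/Verdana.ttf"])

begin
  select_files(listing, ["Fonts/Missing.ttf"])
rescue SketchTargetNotFoundError => e
  puts "error: #{e.message}"
end
```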
require "excavate"
# Extract a specific file from a nested archive
# Path format: nested.zip/inner_file
files = Excavate::Archive.new("outer.zip").extract(
  "/tmp/out",
  files: ["nested.zip/important.txt"],
  recursive_packages: true
)
# The file from the nested archive is extracted
When combined with recursive_packages: true, you can specify paths through nested archives using the format archive.zip/path/to/file.
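One way to read a path of the form archive.zip/path/to/file is to split it at the first component that looks like an archive. The helper below is a sketch under that assumption: `split_nested_path` and `NESTED_ARCHIVE_EXTENSIONS` are invented for this example, and Excavate may resolve nested paths in a different way.

```ruby
# Hypothetical extension list used to spot the archive part of a path.
NESTED_ARCHIVE_EXTENSIONS = %w[.zip .tar .gz .xz .cab .7z .msi .rpm .cpio .pkg].freeze

# Splits "nested.zip/dir/file.txt" into ["nested.zip", "dir/file.txt"].
# Returns [nil, path] when no leading component looks like an archive.
def split_nested_path(path)
  parts = path.split("/")
  index = parts.index do |part|
    NESTED_ARCHIVE_EXTENSIONS.include?(File.extname(part).downcase)
  end
  # An archive name in last position is the target itself, not a container.
  return [nil, path] if index.nil? || index == parts.size - 1

  [parts[0..index].join("/"), parts[(index + 1)..-1].join("/")]
end

p split_nested_path("nested.zip/important.txt")
p split_nested_path("Fonts/Arial.ttf")
```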
Filter-based extraction
General
This feature provides pattern-based file selection for extraction. Instead of specifying exact file paths, you can use glob patterns to match multiple files, making it ideal for extracting files by type or naming convention.
Syntax
Excavate::Archive.new(archive_path).extract(
  target_directory,
  filter: "pattern" # (1)
)
(1) Glob pattern to match files for extraction
Where,
filter
(optional) Glob pattern string to match files. Supports standard glob syntax including * (any characters), ** (any directories), and character classes. If no files match, a TargetNotFoundError is raised.
Usage example
require "excavate"
# Extract only TrueType fonts from any directory
files = Excavate::Archive.new("fonts.zip").extract(
  "/tmp/fonts",
  filter: "**/*.ttf"
)
puts "Extracted #{files.size} TrueType font files"
The pattern **/*.ttf matches all .ttf files in any subdirectory within the archive.
require "excavate"
# Extract configuration files from specific directories
files = Excavate::Archive.new("config.tar.gz").extract(
  "/tmp/conf",
  filter: "etc/**/*.{conf,cfg}"
)
# Extracts .conf and .cfg files only from the 'etc' directory tree
Complex patterns can use brace expansion to match multiple extensions or patterns.
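To see which paths a given pattern would select, Ruby's own `File.fnmatch` can be applied to a list of names. This is only a way to experiment with glob syntax: whether Excavate matches filters with `File.fnmatch`, and with which flags, is an assumption of this sketch, and `filter_paths` and `GLOB_FLAGS` are hypothetical names.

```ruby
# FNM_PATHNAME lets "**/" span directories and stops "*" at "/";
# FNM_EXTGLOB enables {a,b} brace alternatives.
GLOB_FLAGS = File::FNM_PATHNAME | File::FNM_EXTGLOB

# Returns the paths that match the glob pattern.
def filter_paths(paths, pattern)
  paths.select { |path| File.fnmatch(pattern, path, GLOB_FLAGS) }
end

paths = [
  "Fonts/Arial.ttf",
  "Fonts/extra/Mono.otf",
  "etc/nginx/nginx.conf",
  "etc/app/settings.cfg",
  "var/app.conf",
]

p filter_paths(paths, "**/*.ttf")
# => ["Fonts/Arial.ttf"]
p filter_paths(paths, "etc/**/*.{conf,cfg}")
# => ["etc/nginx/nginx.conf", "etc/app/settings.cfg"]
```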
Command-line interface
General
Excavate provides a command-line tool for extracting archives directly from the shell. The CLI supports all the same features as the Ruby API, making it suitable for shell scripts and interactive use.
Syntax
excavate [OPTIONS] ARCHIVE [FILES...] # (1) (2) (3)
(1) Command options for controlling extraction behavior
(2) Path to the archive file
(3) Optional list of specific files to extract
Where,
--recursive
Enable recursive extraction of nested archives
--filter PATTERN
Extract only files matching the glob pattern
ARCHIVE
Path to the archive file to extract
FILES…
Optional list of specific file paths to extract
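The command shape above maps naturally onto Ruby's standard `OptionParser`. The sketch below is hypothetical: `parse_cli` is not Excavate's entry point, and the gem may use a different option-parsing library. It only shows how the two documented options and the positional arguments could be separated.

```ruby
require "optparse"

# Hypothetical parser for the documented command shape:
#   excavate [OPTIONS] ARCHIVE [FILES...]
def parse_cli(argv)
  options = { recursive: false, filter: nil }

  parser = OptionParser.new do |opts|
    opts.banner = "Usage: excavate [OPTIONS] ARCHIVE [FILES...]"
    opts.on("--recursive", "Extract nested archives too") do
      options[:recursive] = true
    end
    opts.on("--filter PATTERN", "Extract only files matching PATTERN") do |pattern|
      options[:filter] = pattern
    end
  end

  # parse leaves argv untouched and returns the non-option arguments.
  archive, *files = parser.parse(argv)
  raise ArgumentError, parser.banner if archive.nil?

  options.merge(archive: archive, files: files)
end

p parse_cli(["--recursive", "outer.zip", "nested.zip/file.txt"])
p parse_cli(["--filter", "**/*.ttf", "fonts.zip"])
```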
Usage example
# Extract archive to a directory with the archive's base name
excavate fonts.zip
# Extract with recursive nested archive processing
excavate --recursive application.msi
# Extract from a directory of archives
excavate --recursive archive_directory/
Basic extraction creates a directory named after the archive (without extension) in the current directory.
# Extract specific files
excavate fonts.zip Fonts/Arial.ttf Fonts/Verdana.ttf
# Extract files matching a pattern
excavate --filter "**/*.ttf" fonts.zip
# Extract from nested archives
excavate --recursive outer.zip nested.zip/file.txt
The CLI supports the same selective extraction features as the Ruby API.
# Extract TAR.XZ archive
excavate wine-10.18.tar.xz
# Extract XZ with recursive processing
excavate --recursive package.tar.xz
# Extract specific files from XZ archive
excavate package.tar.xz --filter "*.conf"
XZ compressed archives (both .xz and .tar.xz) are fully supported through the command-line interface.
Dependencies
Excavate depends on the following system libraries through the ffi-libarchive-binary gem:
- zlib
- Expat
- OpenSSL (for Linux only)
These dependencies are generally present on all systems and require no special installation steps.
Development
General
When contributing to Excavate, follow these development guidelines to maintain code quality and consistency.
Coding standards
We follow Sandi Metz’s Rules for this gem. You can read the description of the rules here. All new code should follow these rules. If you make changes in a pre-existing file that violates these rules, you should fix the violations as part of your contribution.
Testing
Run the test suite with:
bundle exec rspec
Ensure all tests pass before submitting a pull request.
Releasing
Releasing is done automatically with GitHub Actions. Just bump and tag with
gem-release.
For a patch release (0.0.x) use:
gem bump --version patch --tag --push
For a minor release (0.x.0) use:
gem bump --version minor --tag --push
Contributing
First, thank you for contributing! We love pull requests from everyone. By participating in this project, you hereby grant Ribose Inc. the right to grant or transfer an unlimited number of non-exclusive licenses or sub-licenses to third parties, under the copyright covering the contribution to use the contribution by all means.
Here are a few technical guidelines to follow:
- Open an issue to discuss a new feature.
- Write tests to support your new feature.
- Make sure the entire test suite passes locally and on CI.
- Open a Pull Request.
- Squash your commits after receiving feedback.
- Party!
License
This gem is distributed with a BSD 3-Clause license.
This gem is developed, maintained and funded by Ribose Inc.