Project

excavate

Extract nested archives with a single command.
 Project Readme

Excavate: Extraction of nested archives


Purpose

Excavate is a Ruby gem that provides a unified interface for extracting nested archives across multiple compression and archive formats. The gem enables recursive extraction of archives within archives, making it ideal for processing complex software distributions, font packages, and other nested archive scenarios.

Features

  • Basic archive extraction

  • Recursive extraction of nested archives

  • Selective file extraction

  • Filter-based extraction

  • Command-line interface

  • Supported archive formats

Architecture

Excavate follows a clean object-oriented architecture with clear separation of concerns:

Excavate architecture overview
┌─────────────────────────────────────────────────────────────┐
│                                                             │
│                     Excavate::Archive                       │
│           (Facade providing unified interface)              │
│                                                             │
└───────────────────┬─────────────────────────────────────────┘
                    │
                    │ delegates to
                    │
    ┌───────────────┴───────────────┐
    │                               │
    │    Format Type Registry       │
    │    (TYPES hash mapping)       │
    │                               │
    └───────────────┬───────────────┘
                    │
                    │ instantiates
                    │
    ┌───────────────┴───────────────────────────────────┐
    │                                                   │
    │           Extractors::Extractor                   │
    │           (Abstract base class)                   │
    │                                                   │
    └───────────────┬───────────────────────────────────┘
                    │
                    │ specialized by
                    │
    ┌───────────────┴───────────────────────────┐
    │                                           │
    ├─ CabExtractor    ├─ RpmExtractor         │
    ├─ CpioExtractor   ├─ SevenZipExtractor    │
    ├─ GzipExtractor   ├─ TarExtractor         │
    ├─ OleExtractor    ├─ XarExtractor         │
    ├─ XzExtractor     ├─ ZipExtractor         │
    │                                           │
    └───────────────────────────────────────────┘
Data flow for archive extraction
User Request
    │
    ├── Archive.new(file_path)
    │       │
    │       ├── Detect format from extension
    │       │
    │       └── Select appropriate Extractor
    │
    ├── Archive#extract(target_dir)
    │       │
    │       ├── Create target directory
    │       │
    │       ├── Extractor#extract(target)
    │       │       │
    │       │       └── Format-specific extraction
    │       │
    │       └── [Optional] Recursive extraction
    │               │
    │               └── Repeat for nested archives
    │
    └── Return: Extracted files

The architecture follows these principles:

  • Single Responsibility: Each extractor handles one format

  • Open/Closed: New formats can be added without modifying existing code

  • Dependency Inversion: Archive class depends on Extractor abstraction
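
The principles above can be illustrated with a minimal sketch. Every name here (the ArchitectureSketch module, its two extractors, and the string return values) is a hypothetical stand-in chosen for illustration and is not Excavate's actual source; only the overall shape (a facade, a TYPES hash, and an abstract base class) comes from the diagram.

```ruby
# Hypothetical, simplified stand-ins; not Excavate's real classes.
module ArchitectureSketch
  # Abstract base class: subclasses implement #extract for one format.
  class Extractor
    def initialize(path)
      @path = path
    end

    def extract(_target)
      raise NotImplementedError, "#{self.class} must implement #extract"
    end
  end

  class ZipExtractor < Extractor
    def extract(target)
      "unzip #{@path} -> #{target}" # placeholder for format-specific work
    end
  end

  class TarExtractor < Extractor
    def extract(target)
      "untar #{@path} -> #{target}" # placeholder for format-specific work
    end
  end

  # Facade: picks an extractor from the registry and delegates to it.
  class Archive
    # Registry mapping a file extension to an extractor class.
    TYPES = { "zip" => ZipExtractor, "tar" => TarExtractor }.freeze

    def initialize(path)
      @path = path
      extension = File.extname(path).delete_prefix(".").downcase
      @extractor_class = TYPES.fetch(extension) do
        raise ArgumentError, "unsupported archive type: #{extension.inspect}"
      end
    end

    def extract(target)
      @extractor_class.new(@path).extract(target)
    end
  end
end

puts ArchitectureSketch::Archive.new("fonts.zip").extract("out")
```

In this sketch, supporting another format means adding one Extractor subclass and one registry entry, while the Archive facade stays unchanged.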

Installation

Add this line to your application’s Gemfile:

gem "excavate"

And then execute:

bundle install

Or install it yourself as:

gem install excavate

Supported formats

Excavate supports the following archive and compression formats:

  • CAB (.cab, .exe with CAB)

  • CPIO (.cpio)

  • GZIP (.gz)

  • MSI (.msi)

  • RPM (.rpm)

  • 7-Zip (.7z, .exe with 7z)

  • TAR (.tar)

  • XAR (.pkg)

  • XZ (.xz, .tar.xz)

  • ZIP (.zip)

All formats support recursive extraction for nested archives.

Basic extraction

General

This feature provides the fundamental capability to extract archives to a target directory. It is the core functionality of Excavate and is used as the foundation for all other extraction features.

Syntax

Excavate::Archive.new(archive_path).extract(target_directory) (1) (2)
  1. archive_path - Path to the archive file to extract

  2. target_directory - Directory where files will be extracted (optional)

Where,

archive_path

(required) Path to the archive file to extract. Can be an absolute or relative path.

target_directory

(optional) Target directory for extraction. If omitted, a directory with the archive’s base name will be created in the current directory.

Usage example

Example 1. Extracting a ZIP archive to a specific directory
require "excavate"
require "tmpdir"

# Create a temporary directory for extraction
target = Dir.mktmpdir

# Extract the archive
Excavate::Archive.new("fonts.zip").extract(target)

# List extracted files
Dir.glob(File.join(target, "**", "*")).each do |file|
  puts file if File.file?(file)
end

This example extracts a ZIP archive to a temporary directory and lists all extracted files.

Example 2. Extracting with automatic target directory creation
require "excavate"

# Extract will create a directory named "fonts" in current directory
Excavate::Archive.new("fonts.zip").extract

# Files are now in ./fonts/

When no target is specified, Excavate creates a directory with the archive’s base name (without extension) in the current working directory.
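
To predict where the files will land before running the extraction, the naming rule can be approximated in plain Ruby. This is a sketch under assumptions: default_target is a hypothetical helper, not part of Excavate, and it strips only the last extension. How Excavate treats compound extensions such as .tar.gz is not stated here, so verify that case against the gem itself.

```ruby
# Hypothetical helper: approximates "archive base name in the current
# directory". Only the final extension is stripped (an assumption).
def default_target(archive_path, base_dir = Dir.pwd)
  File.join(base_dir, File.basename(archive_path, ".*"))
end

puts default_target("fonts.zip")
puts default_target("downloads/fonts.zip", "/work")
```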

Recursive extraction

General

This feature enables automatic extraction of nested archives. When an archive contains other archives, Excavate can recursively extract them all in a single operation. This is particularly useful for complex software distributions that package multiple archives together.

Syntax

Excavate::Archive.new(archive_path).extract(
  target_directory,
  recursive_packages: true (1)
)
  1. Enable recursive extraction of nested archives

Where,

recursive_packages

(optional) Boolean flag to enable recursive extraction. When true, archives found within the extracted files are automatically extracted. Default is false.

Usage example

Example 3. Recursively extracting nested archives
require "excavate"
require "tmpdir"

target = Dir.mktmpdir

# Extract an MSI file that contains CAB archives
Excavate::Archive.new("fonts.msi").extract(
  target,
  recursive_packages: true
)

# All nested CAB files are automatically extracted
# Final structure contains only the font files

This example shows extraction of an MSI installer that contains nested CAB archives. With recursive_packages: true, both the MSI and all contained CAB files are automatically extracted.
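
The effect of the flag can be pictured with a small in-memory model. This is only an illustrative sketch, not how Excavate works internally: a Hash stands in for an archive, a String stands in for file content, and list_files is a hypothetical helper.

```ruby
# In-memory stand-in: a Hash is an "archive", a String is file content.
def list_files(archive, prefix = nil, recursive_packages: false)
  archive.flat_map do |name, content|
    path = [prefix, name].compact.join("/")
    if content.is_a?(Hash) && recursive_packages
      # Nested archive: descend and report its contents instead.
      list_files(content, path, recursive_packages: true)
    else
      # Plain file, or a nested archive left unopened.
      [path]
    end
  end
end

installer = {
  "readme.txt" => "text",
  "fonts.cab" => { "Arial.ttf" => "font", "Verdana.ttf" => "font" }
}

p list_files(installer)
p list_files(installer, recursive_packages: true)
```

Without the flag the nested archive appears as a single file; with it, the model reports the files inside the nested archive instead.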

Example 4. Processing files during recursive extraction
require "excavate"

fonts = []

Excavate::Archive.new("fonts.tar.gz").files(
  recursive_packages: true
) do |file|
  fonts << file if file.end_with?(".ttf", ".otf")
end

puts "Found #{fonts.size} font files"

The files method with recursive_packages: true processes each extracted file through a block, allowing selective collection of specific file types.

Selective extraction

General

This feature allows extraction of specific files from an archive without extracting the entire contents. It is useful when working with large archives where only certain files are needed.

Syntax

Excavate::Archive.new(archive_path).extract(
  target_directory,
  files: [file1, file2, ...] (1)
)
  1. Array of specific file paths to extract from the archive

Where,

files

(optional) Array of file paths to extract. Paths should match the structure within the archive. If a file is not found, a TargetNotFoundError is raised.

Usage example

Example 5. Extracting specific files from an archive
require "excavate"

target = "/tmp/extracted"

# Extract only specific font files
files = Excavate::Archive.new("fonts.zip").extract(
  target,
  files: ["Fonts/Arial.ttf", "Fonts/Verdana.ttf"]
)

puts "Extracted #{files.size} files:"
files.each { |f| puts "  - #{File.basename(f)}" }

This extracts only the specified files, even though the archive may contain many more files.
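
The all-or-error contract of the files option can be sketched without touching an archive. The SelectionSketch module, its select_files helper, and its local TargetNotFoundError class are hypothetical stand-ins for illustration; they are not the gem's classes, and the gem's exact error message is not shown here.

```ruby
# Hypothetical stand-ins; not the gem's own classes.
module SelectionSketch
  class TargetNotFoundError < StandardError; end

  # Returns the requested paths, or raises if any is absent from the listing.
  def self.select_files(available, requested)
    missing = requested - available
    unless missing.empty?
      raise TargetNotFoundError, "not found in archive: #{missing.join(', ')}"
    end

    requested
  end
end

listing = ["Fonts/Arial.ttf", "Fonts/Verdana.ttf", "README.txt"]
p SelectionSketch.select_files(listing, ["Fonts/Arial.ttf"])

begin
  SelectionSketch.select_files(listing, ["Fonts/Missing.ttf"])
rescue SelectionSketch::TargetNotFoundError => e
  puts "Error: #{e.message}"
end
```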

Example 6. Extracting files from nested archives
require "excavate"

# Extract a specific file from a nested archive
# Path format: nested.zip/inner_file
files = Excavate::Archive.new("outer.zip").extract(
  "/tmp/out",
  files: ["nested.zip/important.txt"],
  recursive_packages: true
)

# The file from the nested archive is extracted

When combined with recursive_packages: true, you can specify paths through nested archives using the format archive.zip/path/to/file.
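
One way to read such a path is as a container part followed by an inner part. The sketch below splits a path at the deepest archive-like segment. Both split_nested_path and the ARCHIVE_EXTENSIONS list are assumptions made for illustration (the list is derived from the supported formats); Excavate's own path resolution may differ.

```ruby
# Hypothetical list, derived from the supported formats named in this README.
ARCHIVE_EXTENSIONS = %w[.cab .cpio .gz .msi .rpm .7z .tar .pkg .xz .zip .exe].freeze

# Splits "nested.zip/inner/file" into ["nested.zip", "inner/file"].
# Returns [nil, path] when no directory segment looks like an archive.
def split_nested_path(path)
  segments = path.split("/")
  # The last segment is the file itself, so only earlier segments are checked.
  index = segments[0...-1].rindex do |segment|
    ARCHIVE_EXTENSIONS.include?(File.extname(segment).downcase)
  end
  return [nil, path] unless index

  [segments[0..index].join("/"), segments.drop(index + 1).join("/")]
end

p split_nested_path("nested.zip/important.txt")
p split_nested_path("Fonts/Arial.ttf")
```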

Filter-based extraction

General

This feature provides pattern-based file selection for extraction. Instead of specifying exact file paths, you can use glob patterns to match multiple files, making it ideal for extracting files by type or naming convention.

Syntax

Excavate::Archive.new(archive_path).extract(
  target_directory,
  filter: "pattern" (1)
)
  1. Glob pattern to match files for extraction

Where,

filter

(optional) Glob pattern string to match files. Supports standard glob syntax including * (any characters), ** (any directories), and character classes. If no files match, a TargetNotFoundError is raised.

Usage example

Example 7. Extracting all files of a specific type
require "excavate"

# Extract only TrueType fonts from any directory
files = Excavate::Archive.new("fonts.zip").extract(
  "/tmp/fonts",
  filter: "**/*.ttf"
)

puts "Extracted #{files.size} TrueType font files"

The pattern **/*.ttf matches all .ttf files in any subdirectory within the archive.

Example 8. Extracting files with complex patterns
require "excavate"

# Extract configuration files from specific directories
files = Excavate::Archive.new("config.tar.gz").extract(
  "/tmp/conf",
  filter: "etc/**/*.{conf,cfg}"
)

# Extracts .conf and .cfg files only from the 'etc' directory tree

Complex patterns can use brace expansion to match multiple extensions or patterns.
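
A pattern can be checked against a list of candidate paths before running an extraction. The sketch below uses Ruby's standard File.fnmatch? for that purpose. It assumes the filter behaves like a Ruby glob with path-aware matching and brace expansion, which is not confirmed here, and preview_filter is a hypothetical helper rather than part of Excavate.

```ruby
# FNM_PATHNAME makes "**/" span directories; FNM_EXTGLOB enables {a,b}.
GLOB_FLAGS = File::FNM_PATHNAME | File::FNM_EXTGLOB

# Hypothetical helper: returns the paths that the pattern would select.
def preview_filter(paths, pattern)
  paths.select { |path| File.fnmatch?(pattern, path, GLOB_FLAGS) }
end

paths = [
  "etc/nginx/nginx.conf",
  "etc/app/settings.cfg",
  "etc/hosts",
  "var/log/app.conf"
]

p preview_filter(paths, "etc/**/*.{conf,cfg}")
```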

Command-line interface

General

Excavate provides a command-line tool for extracting archives directly from the shell. The CLI supports all the same features as the Ruby API, making it suitable for shell scripts and interactive use.

Syntax

excavate [OPTIONS] ARCHIVE [FILES...] (1) (2) (3)
  1. Command options for controlling extraction behavior

  2. Path to the archive file

  3. Optional list of specific files to extract

Where,

--recursive

Enable recursive extraction of nested archives

--filter PATTERN

Extract only files matching the glob pattern

ARCHIVE

Path to the archive file to extract

FILES...

Optional list of specific file paths to extract

Usage example

Example 9. Basic command-line extraction
# Extract archive to a directory with the archive's base name
excavate fonts.zip

# Extract with recursive nested archive processing
excavate --recursive application.msi

# Extract from a directory of archives
excavate --recursive archive_directory/

Basic extraction creates a directory named after the archive (without extension) in the current directory.

Example 10. Selective extraction via CLI
# Extract specific files
excavate fonts.zip Fonts/Arial.ttf Fonts/Verdana.ttf

# Extract files matching a pattern
excavate --filter "**/*.ttf" fonts.zip

# Extract from nested archives
excavate --recursive outer.zip nested.zip/file.txt

The CLI supports the same selective extraction features as the Ruby API.
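
When the CLI is driven from a Ruby script, building the argument list as an array avoids shell quoting problems with glob patterns. The sketch below uses only the options documented above; excavate_command is a hypothetical helper, and the system call is left commented out because it requires the excavate executable to be installed.

```ruby
# Hypothetical helper: builds an argv array in the documented order
# (options first, then the archive, then optional file paths).
def excavate_command(archive, files: [], recursive: false, filter: nil)
  command = ["excavate"]
  command << "--recursive" if recursive
  command.push("--filter", filter) if filter
  command << archive
  command.concat(files)
end

command = excavate_command("fonts.zip", filter: "**/*.ttf")
p command

# Passing separate arguments bypasses the shell, so the pattern is not
# expanded before the CLI sees it:
# system(*command)
```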

Example 11. Processing XZ compressed archives
# Extract TAR.XZ archive
excavate wine-10.18.tar.xz

# Extract XZ with recursive processing
excavate --recursive package.tar.xz

# Extract files matching a pattern from a TAR.XZ archive
excavate --filter "*.conf" package.tar.xz

XZ compressed archives (both .xz and .tar.xz) are fully supported through the command-line interface.

Dependencies

Excavate depends on the following system libraries through the ffi-libarchive-binary gem:

  • zlib

  • Expat

  • OpenSSL (for Linux only)

These libraries are present on most systems and usually require no special installation steps.

Development

General

When contributing to Excavate, follow these development guidelines to maintain code quality and consistency.

Coding standards

We follow Sandi Metz’s Rules for this gem, and all new code should follow them. If you change a pre-existing file that violates these rules, fix the violations as part of your contribution.

Testing

Run the test suite with:

bundle exec rspec

Ensure all tests pass before submitting a pull request.

Releasing

Releasing is done automatically with GitHub Actions. Just bump and tag with gem-release.

For a patch release (0.0.x) use:

gem bump --version patch --tag --push

For a minor release (0.x.0) use:

gem bump --version minor --tag --push

Contributing

First, thank you for contributing! We love pull requests from everyone. By participating in this project, you hereby grant Ribose Inc. the right to grant or transfer an unlimited number of non-exclusive licenses or sub-licenses to third parties, under the copyright covering the contribution to use the contribution by all means.

Here are a few technical guidelines to follow:

  1. Open an issue to discuss a new feature.

  2. Write tests to support your new feature.

  3. Make sure the entire test suite passes locally and on CI.

  4. Open a Pull Request.

  5. Squash your commits after receiving feedback.

  6. Party!

License

This gem is distributed with a BSD 3-Clause license.

This gem is developed, maintained and funded by Ribose Inc.