A utility for downloading and parsing e621's exports
 Dependencies

Runtime

~> 3.3
~> 2.14
~> 2.6
 Project Readme

E621 Export Downloader

A utility for downloading and parsing e621's exports.

Installation

gem install e621_export_downloader

Or add it to your Gemfile:

bundle add e621_export_downloader

Usage

require("date") # Date.today is used in the examples below
require("e621_export_downloader")

client = E621ExportDownloader::Client.new

# configure options after creation
client.config do |c|
  c.cache = true              # keep export files after reading, defaults to true
  c.rewind_on_not_found = 2   # decrease date by one day if no export is found for that date,
                              # provide an integer to limit how many days can be rewound,
                              # `true` is equivalent to `2`, defaults to false
end

# or pass an Options struct directly
client = E621ExportDownloader::Client.new(
  E621ExportDownloader::Client::Options.new(
    cache:               true,
    rewind_on_not_found: 2,
  )
)

# get the helper for interacting with a type of export
# types: pools, posts, tag_aliases, tag_implications, tags, wiki_pages
helper = client.get("posts")
# or use the named shorthand:
helper = client.posts

# get a wrapper for a specific date (time components are ignored)
# using this directly will not trigger rewinding regardless of rewind_on_not_found
export = helper.get(Date.today)

# convenience shorthand on the client — also skips rewinding
export = client.get_posts(Date.today)

# all of the following methods can also be called on the helper with a date argument;
# if rewind_on_not_found is enabled the helper decrements the date by one day until it
# finds an export or exhausts the rewind limit, at which point it raises ResolveError

# check whether the export exists for the date
exists = export.exists?
exists = helper.exists?(Date.today)

# download the export, returns the file path — not required before reading
# if you move or remove the file do not reuse the export object
file = export.download
file = helper.download(Date.today)

# delete the downloaded file, if it exists
export.delete
helper.delete(Date.today)

# get all of the records as a single array; DO NOT use this for large exports:
# arrays with millions of items perform poorly and will likely crash your process
# (not to mention that the posts export is more than 5 gigabytes)
records = export.read_all

# read streams the CSV and yields each parsed record together with the total row count
# this is the recommended approach for large exports (the posts export exceeds 5 GB)
export.read do |record, total|
  # record is an E621ExportDownloader::Models::Post instance
end
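
The rewinding behaviour described in the comments above can be illustrated without the gem. The following is a minimal sketch, not the gem's implementation: `resolve_export_date`, the top-level `ResolveError` class, and the `available` list are all hypothetical, and it assumes a limit of 2 means the requested date plus up to two earlier days are checked.

```ruby
require "date"

# Hypothetical error class for this sketch; the README names a ResolveError
# but does not show its namespace or constructor.
class ResolveError < StandardError; end

# Walk backwards from `date` one day at a time until the block reports that
# an export exists, giving up after `limit` rewinds.
def resolve_export_date(date, limit:)
  (0..limit).each do |offset|
    candidate = date - offset
    return candidate if yield(candidate)
  end
  raise ResolveError, "no export found within #{limit} day(s) of #{date}"
end

# Stand-in for a real existence check: pretend only this date has an export.
available = [Date.new(2024, 1, 1)]

resolved = resolve_export_date(Date.new(2024, 1, 3), limit: 2) do |candidate|
  available.include?(candidate)
end
puts resolved # prints 2024-01-01
```

With the real client, the block would presumably be replaced by something like `helper.get(candidate).exists?`, which the README says does not rewind on its own.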

Replace or extend a parser

require("e621_export_downloader")

client = E621ExportDownloader::Client.new

client.config do |c|
  c.parsers do |p|
    # replace a parser with a custom proc that receives the raw CSV row hash
    # return nil to skip a record
    p.posts = proc do |record|
      post = E621ExportDownloader::Models::Post.new(record)
      # attach extra data or wrap in your own class
      post
    end
  end
end
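
To make the "return nil to skip a record" contract concrete, here is a self-contained sketch of a parser proc that can be tried without the gem. `TaggedPost` is a hypothetical wrapper class, and the string keys and column names (`id`, `rating`, `tag_string`) are assumptions about the raw row hash rather than something this README confirms.

```ruby
# Hypothetical wrapper class; the field names below are assumptions about the
# columns in the posts export.
TaggedPost = Struct.new(:id, :rating, :tags, keyword_init: true)

# A parser proc in the shape described above: it takes one raw row hash and
# returns either an object or nil to skip the record.
safe_posts_parser = proc do |record|
  next nil unless record["rating"] == "s" # skip anything not rated safe

  TaggedPost.new(
    id:     Integer(record["id"]),
    rating: record["rating"],
    tags:   record["tag_string"].to_s.split
  )
end

# Sample rows standing in for parsed CSV rows (CSV values arrive as strings).
rows = [
  { "id" => "1", "rating" => "s", "tag_string" => "canine outside" },
  { "id" => "2", "rating" => "e", "tag_string" => "feline" },
]

parsed = rows.filter_map { |row| safe_posts_parser.call(row) }
p parsed.map(&:id)
```

A proc like this would be registered with `p.posts = safe_posts_parser` inside the `c.parsers` block.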

Models

Each export type is parsed into a corresponding model class:

Export type       Model class
pools             E621ExportDownloader::Models::Pool
posts             E621ExportDownloader::Models::Post
tag_aliases       E621ExportDownloader::Models::TagAlias
tag_implications  E621ExportDownloader::Models::TagImplication
tags              E621ExportDownloader::Models::Tag
wiki_pages        E621ExportDownloader::Models::WikiPage

CLI

# check if an export exists for a given date; date is optional and defaults to today
# outputs "true" or "false" with no trailing newline
e621-export-downloader exists posts 2024-01-01

# download an export for a given date; date is optional and defaults to today
# outputs the path to the downloaded file with no trailing newline
e621-export-downloader download posts 2024-01-01

# read an export as individual JSON lines; date is optional and defaults to today
# outputs each record as a JSON object on its own line
e621-export-downloader read posts 2024-01-01

# read an export as a JSON array; date is optional and defaults to today
# streams the CSV internally, so it is safe to use with large exports
e621-export-downloader read-all posts 2024-01-01

e621-export-downloader --help
e621-export-downloader --version
e621-export-downloader --cache                    # enable caching (default)
e621-export-downloader --no-cache                 # disable caching
e621-export-downloader --rewind-on-not-found      # enable rewinding (up to 2 days)
e621-export-downloader --rewind-on-not-found 5    # enable rewinding (up to 5 days)
e621-export-downloader --no-rewind-on-not-found   # disable rewinding (default)
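
Because `read` emits one JSON object per line, its output can be consumed incrementally by another process. The sketch below shows one way to do that in Ruby; `tally_json_lines` is a hypothetical helper, the sample input is made up, and the `rating` field is an assumption about what the emitted objects contain.

```ruby
require "json"
require "stringio"

# Count records per value of `key` from an IO carrying one JSON object per
# line, holding only one line in memory at a time.
def tally_json_lines(io, key)
  counts = Hash.new(0)
  io.each_line do |line|
    line = line.strip
    next if line.empty?

    counts[JSON.parse(line)[key]] += 1
  end
  counts
end

# Sample input standing in for the CLI output.
sample = StringIO.new(<<~JSONL)
  {"id": 1, "rating": "s"}
  {"id": 2, "rating": "e"}
  {"id": 3, "rating": "s"}
JSONL

p tally_json_lines(sample, "rating")
```

Passing `$stdin` instead of the `StringIO` would let a script like this sit on the receiving end of a pipe from `e621-export-downloader read posts 2024-01-01`.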

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/DonovanDMC/E621ExportDownloader.rb.

License

The gem is available as open source under the terms of the MIT License.