Project

gtfs_df

0.0
No release in over 3 years
A Ruby gem to load, filter, and manipulate GTFS (General Transit Feed Specification) feeds using DataFrames powered by Polars. Supports cascading filters that maintain referential integrity across related tables. NOTE: This gem is not ready for production use.
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
2025
 Dependencies

Runtime

~> 0.4
~> 0.22
~> 2.3
 Project Readme

ruby-gtfs-df

A ruby gem to manipulate GTFS feeds using DataFrames using Polars (ruby-polars)

This project was created to bring the power of partridge to ruby.

⚠️ Warning: This gem is not ready for production use. It is currently in active development and the API may change without notice.

Installation

Install the gem and add to the application's Gemfile by executing:

bundle add gtfs_df

If bundler is not being used to manage dependencies, install the gem by executing:

gem install gtfs_df

Usage

Loading a GTFS feed

require 'gtfs_df'

# Load from a zip file
feed = GtfsDf::Reader.load_from_zip('path/to/gtfs.zip')

# Or, load from a directory
feed = GtfsDf::Reader.load_from_dir('path/to/gtfs_dir')

# Access dataframes for each GTFS file
puts feed.agency.head
puts feed.routes.head
puts feed.trips.head
puts feed.stop_times.head
puts feed.stops.head

Filtering feeds

The library supports filtering feeds by any field in any table. The filter automatically cascades through the dependency graph to ensure referential integrity.

# Filter by agency
filtered_feed = feed.filter('agency' => { 'agency_id' => 'MTA' })

# Filter by route
filtered_feed = feed.filter('routes' => { 'route_id' => ['1', '2', '3'] })

# Filter by a service
filtered_feed = feed.filter('calendar' => { 'service_id' => 'WEEKDAY' })

# Multiple filters
filtered_feed = feed.filter(
  'agency' => { 'agency_id' => 'MTA' },
  'routes' => { 'route_type' => 1 } # Filter to subway routes
)

When you filter by a field, the library automatically:

  1. Filters the specified table
  2. Cascades related tables following foreign key relationships
  3. Keeps only the data that maintains referential integrity

For example, filtering by agency_id will automatically filter routes, trips, stop_times, and stops to only include data for that agency.

Writing filtered feeds

# Write to a new zip file
GtfsDf::Writer.write_to_zip(filtered_feed, 'output/filtered_gtfs.zip')

Example: Split feed by agency

See examples/split-by-agency for a complete example that splits a multi-agency GTFS feed into separate files per agency.

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and the created tag, and push the .gem file to rubygems.org.

TODO

  • Time parsing Just like partridge, we should parse Time as seconds since midnight. There's a draft in lib/gtfs_df/utils.rb but it's not used anywhere. I haven't figured out how to properly implement with Polars.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/davidmh/ruby-gtfs_df.

License

The gem is available as open source under the terms of the MIT License.