Project

torchtext

0.01
No release in over a year
Data loaders and abstractions for text and NLP
 Dependencies

Runtime

>= 0.11.1
 Project Readme

TorchText Ruby

🔥 Data loaders and abstractions for text and NLP - for Ruby

Installation

Add this line to your application’s Gemfile:

gem "torchtext"

Getting Started

This library follows the Python API. Many methods and options are missing at the moment. PRs welcome!

Examples

Text classification

Datasets

Load a dataset

train_dataset, test_dataset = TorchText::Datasets::AG_NEWS.load(root: ".data", ngrams: 2)

Supported datasets include:

  • AG_NEWS

Data Utils

Supports:

  • tokenizer
  • ngrams_iterator
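To illustrate what an n-grams iterator does conceptually, here is a minimal pure-Ruby sketch. It is not the gem's implementation: `ngrams_sketch` is a hypothetical helper, and the output order (single tokens first, then longer n-grams) is an assumption made for this example.

```ruby
# Hypothetical helper, not part of torchtext: yields every token,
# then every n-gram of length 2..max_n joined with a space.
def ngrams_sketch(tokens, max_n)
  return enum_for(:ngrams_sketch, tokens, max_n) unless block_given?

  # Unigrams first
  tokens.each { |token| yield token }

  # Then bigrams, trigrams, ... up to max_n
  (2..max_n).each do |n|
    tokens.each_cons(n) { |gram| yield gram.join(" ") }
  end
end

p ngrams_sketch(%w[here we are], 2).to_a
```

Features of this kind are what the `ngrams: 2` option in the dataset example refers to: bigrams are added alongside the individual tokens.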

Data Metrics

Compute the BLEU score

candidate_corpus = [["My", "full", "pytorch", "test"], ["Another", "Sentence"]]
references_corpus = [[["My", "full", "pytorch", "test"], ["Completely", "Different"]], [["No", "Match"]]]
TorchText::Data::Metrics.bleu_score(candidate_corpus, references_corpus)
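For readers unfamiliar with BLEU, the following pure-Ruby sketch shows roughly what such a score computes: clipped n-gram precision for n = 1..4 with uniform weights, multiplied by a brevity penalty. `ngram_counts` and `bleu_sketch` are hypothetical names used only here, and the gem's own implementation may differ in its details.

```ruby
# Hypothetical helper: count every n-gram of length 1..max_n in a token list.
def ngram_counts(tokens, max_n)
  counts = Hash.new(0)
  (1..max_n).each do |n|
    tokens.each_cons(n) { |gram| counts[gram] += 1 }
  end
  counts
end

# Hypothetical sketch of corpus-level BLEU with uniform weights.
def bleu_sketch(candidates, references, max_n: 4)
  clipped = Array.new(max_n, 0) # matched n-grams, clipped by reference counts
  totals = Array.new(max_n, 0)  # all candidate n-grams
  cand_len = 0
  ref_len = 0

  candidates.zip(references).each do |cand, refs|
    cand_len += cand.length
    # Use the reference length closest to the candidate length
    ref_len += refs.map(&:length).min_by { |len| (len - cand.length).abs }

    # Highest count of each n-gram across all references
    max_ref = Hash.new(0)
    refs.each do |ref|
      ngram_counts(ref, max_n).each { |gram, c| max_ref[gram] = [max_ref[gram], c].max }
    end

    ngram_counts(cand, max_n).each do |gram, c|
      totals[gram.length - 1] += c
      clipped[gram.length - 1] += [c, max_ref[gram]].min
    end
  end

  # Any n-gram order without a match gives a score of zero
  return 0.0 if clipped.min.zero?

  log_precision = clipped.zip(totals).sum { |c, t| Math.log(c.to_f / t) / max_n }
  brevity = cand_len > ref_len ? 1.0 : Math.exp(1 - ref_len.to_f / cand_len)
  brevity * Math.exp(log_precision)
end

candidate_corpus = [["My", "full", "pytorch", "test"], ["Another", "Sentence"]]
references_corpus = [[["My", "full", "pytorch", "test"], ["Completely", "Different"]], [["No", "Match"]]]
puts bleu_sketch(candidate_corpus, references_corpus)
```

On the corpus above, the unigram precision is 4/6 and the bigram precision is 3/4, while trigrams and 4-grams match fully, so the sketch evaluates to roughly 0.84.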

NN

Supports:

  • InProjContainer
  • MultiheadAttentionContainer
  • ScaledDotProduct
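As background on the last item, scaled dot-product attention computes softmax(QKᵀ / √d) · V. The sketch below works on plain nested arrays for a single head and is only an illustration: `softmax` and `scaled_dot_product_sketch` are hypothetical helpers, and the library class operates on tensors and may support options (such as masks or dropout) that are omitted here.

```ruby
# Numerically stable softmax over one row of scores.
def softmax(row)
  max = row.max
  exps = row.map { |x| Math.exp(x - max) }
  sum = exps.sum
  exps.map { |e| e / sum }
end

# Hypothetical single-head attention over arrays of row vectors.
# query: [n_q][d], key: [n_k][d], value: [n_k][d_v]
def scaled_dot_product_sketch(query, key, value)
  scale = Math.sqrt(query.first.length)
  query.map do |q|
    # Similarity of this query with every key, scaled by sqrt(d)
    scores = key.map { |k| q.zip(k).sum { |a, b| a * b } / scale }
    weights = softmax(scores)
    # Weighted average of the value rows, computed column by column
    value.transpose.map { |col| col.zip(weights).sum { |v, w| v * w } }
  end
end

query = [[1.0, 0.0]]
key = [[1.0, 0.0], [1.0, 0.0]]
value = [[1.0, 2.0], [3.0, 4.0]]
p scaled_dot_product_sketch(query, key, value)
```

Because both keys are identical in this example, the attention weights are equal and the output is the plain average of the two value rows.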

Vocab

Supports:

  • Vocab
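Conceptually, a vocabulary maps tokens to integer indices and back, with a fallback index for unknown tokens. The pure-Ruby sketch below illustrates that idea only. `VocabSketch`, its `itos` and `stoi` readers, and the `<unk>` token are assumptions for this example and do not describe the gem's `Vocab` class.

```ruby
# Hypothetical illustration of a token <-> index mapping; not the gem's Vocab.
class VocabSketch
  UNK = "<unk>"

  attr_reader :itos, :stoi

  def initialize(tokens)
    # Order tokens by descending frequency, then alphabetically for stable ties
    ordered = tokens.tally.sort_by { |token, count| [-count, token] }.map(&:first)
    @itos = [UNK] + (ordered - [UNK])     # index -> token
    @stoi = @itos.each_with_index.to_h    # token -> index
  end

  # Look up a token's index, falling back to the unknown-token index.
  def [](token)
    @stoi.fetch(token, @stoi[UNK])
  end
end

vocab = VocabSketch.new(%w[b a b c])
p vocab.itos
p vocab["b"]
p vocab["missing"]
```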

Disclaimer

This library downloads and prepares public datasets. We don’t host any datasets. Be sure to adhere to the license for each dataset.

If you’re a dataset owner and wish to update any details or remove it from this project, let us know.

History

View the changelog

Contributing

Everyone is encouraged to help improve this project.

To get started with development:

git clone https://github.com/ankane/torchtext-ruby.git
cd torchtext-ruby
bundle install
bundle exec rake test