No commit activity in last 3 years
No release in over 3 years
Filtra filters an array of tokens or words so they can be indexed by Busca, the simple Redis search.
Filtra

Filtra filters arrays of tokens to be indexed.

Description

Filtra filters arrays of tokens to be indexed by Busca, the simple Redis search. The basic filtering options include downcasing and stemming (using fast-stemmer).
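
To make the idea concrete, here is a rough sketch of what such a filtering pipeline could look like in plain Ruby. It is not Filtra's source: the class name TokenFilterSketch, the stemmer: option and the order of the steps are all assumptions made for illustration, and the stemmer is just any callable you pass in rather than fast-stemmer.

```ruby
# Hypothetical sketch of a token filter; NOT Filtra's actual implementation.
class TokenFilterSketch
  def initialize(keep_case: false, stemmer: nil, stopwords: [])
    @keep_case = keep_case
    @stemmer   = stemmer    # any object responding to #call, or nil (assumption)
    @stopwords = stopwords
  end

  def call(tokens)
    tokens = tokens.map(&:downcase) unless @keep_case
    tokens = tokens.reject { |token| @stopwords.include?(token) }
    tokens = tokens.map { |token| @stemmer.call(token) } if @stemmer
    tokens.uniq             # drop duplicates, keeping first occurrences
  end
end

filter = TokenFilterSketch.new
result = filter.call(%w(Running fishes Among the Coast line coast))
p result
# => ["running", "fishes", "among", "the", "coast", "line"]
```

Each step is a plain Array transformation, so the order can be rearranged if, say, you would rather stem before removing stopwords.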

Installation

As usual, you can install it using RubyGems.

$ gem install filtra

Usage

The simplest usage is with default options:

filtro = Filtra.new
words = %w(Running fishes Among the Coast line coast)
result = filtro.call(words)
puts result.inspect
#=> ["running", "fishes", "among", "the", "coast", "line"]

With default options, every word is downcased and the word coast appears only once. Not really exciting, huh?

If, for some reason, you want to keep the casing, this happens:

filtro = Filtra.new(keep_case: true)
words = %w(Running fishes Among the Coast line coast)
result = filtro.call(words)
puts result.inspect
#=> ["Running", "fishes", "Among", "the", "Coast", "line", "coast"]

Now you see the word coast appears twice. That's because of the casing: "Coast" and "coast" are different strings, so neither counts as a duplicate.
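
The same effect shows up in plain Ruby, assuming duplicates are dropped by exact string comparison (which is what the outputs above suggest, though Filtra may do it differently internally):

```ruby
words = %w(Coast coast)

# Exact comparison: the two strings differ, so both survive.
cased = words.uniq
p cased    # => ["Coast", "coast"]

# Downcasing first makes them identical, so only one survives.
folded = words.map(&:downcase).uniq
p folded   # => ["coast"]
```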

Now, let's add some stemming to the mix.

filtro = Filtra.new(stem: true)
words = %w(Running run fishes Among the Coast line coast)
result = filtro.call(words)
puts result.inspect
#=> ["run", "fish", "among", "the", "coast", "line"]

Stemming makes this a bit more interesting if you think about indexing this later, right? "Running" and "run" now collapse into the same token.
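
To see why that matters for indexing, here is a tiny in-memory inverted index built from already-filtered token lists. This is only an illustration with made-up document ids and a plain Hash; it says nothing about how Busca actually stores things in Redis.

```ruby
require "set"

# token => set of document ids containing that token
index = Hash.new { |hash, key| hash[key] = Set.new }

# Token lists as a stemming filter might return them (illustrative data).
documents = {
  1 => %w(run fish among the coast line),
  2 => %w(fish market)
}

documents.each do |id, tokens|
  tokens.each { |token| index[token] << id }
end

p index["fish"].to_a  # => [1, 2]
p index["run"].to_a   # => [1]
```

Because the tokens were stemmed before indexing, a query that is stemmed the same way ("running" becoming "run") would find document 1.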

Now, let's make this really fun. Let's add a list of stopwords.

filtro = Filtra.new(stopwords: Filtra.stopwords)
words = %w(this can be a nice idea that might not work)
result = filtro.call(words)
puts result.inspect
#=> ["can", "nice", "idea", "might", "work"]

Filtra comes bundled with a list of common stopwords, but you can just pass your own:

filtro = Filtra.new(stopwords: %w(this that those there))
words = %w(this can be a nice idea that might not work)
result = filtro.call(words)
puts result.inspect
#=> ["can", "be", "a", "nice", "idea", "might", "not", "work"]
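
If your own stopwords live in a text file or some other raw source, it can help to tidy the list before handing it over. The sketch below uses an inline string instead of a file, and it assumes stopwords should be lowercase to match downcased tokens; that matching rule is a guess based on the examples above, not something checked against Filtra's code.

```ruby
raw = <<~LIST
  This
  that
  those
  there
  that
LIST

# Normalize: strip whitespace, downcase, drop blank lines and duplicates.
custom_stopwords = raw.lines.map { |line| line.strip.downcase }.reject(&:empty?).uniq
p custom_stopwords  # => ["this", "that", "those", "there"]

# The list can then be passed in as shown above:
# filtro = Filtra.new(stopwords: custom_stopwords)
```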

And that's pretty much it. The code is simple, go take a look. And drop a line to julian@porta.sh if you have something to say.