Twins

Twins sorts through the small differences between multiple objects and smartly consolidates them into one.

Usage

Let's say you have a collection of objects that represent the same book but come from different sources, so each object may differ slightly from the others.

books = [{
  title: "Shantaram: A Novel",
  author: "Gregory David Roberts",
  published: 2012,
  details: {
    paperback: true
  }
},
{
  title: "Shantaram",
  author: "Gregory David Roberts & Alejandro Palomas",
  published: 2012,
  details: {
    paperback: false
  }
},
{
  title: "Shantaram",
  author: "Gregory David Roberts",
  published: 2012,
  details: {
    paperback: true
  }
},
{
  title: "Shantaram",
  author: "Gregory D. Roberts",
  published: 2005,
  details: {
    paperback: true
  }
}]

Consolidate

Assembles a new Hash based on every element in the collection. By default, Twins.consolidate picks the candidate value for each key by taking the most frequent value present for that key, also known as the mode.

Twins.consolidate(books)
{
  title: "Shantaram",
  author: "Gregory David Roberts",
  published: 2012,
  details: {
    paperback: true
  }
}
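
To make the default behavior concrete, here is a minimal sketch of mode-based consolidation in plain Ruby (2.7+ for `tally`). It is not the gem's implementation: `mode_of` and `consolidate_by_mode` are hypothetical names, and the recursion into nested Hashes is an assumption based on the `details` key in the example above.

```ruby
# Hypothetical helper (not part of the twins gem): the most frequent value.
def mode_of(values)
  values.tally.max_by { |_value, count| count }.first
end

# Hypothetical sketch: build one Hash by taking the mode of every key,
# recursing into values that are themselves Hashes.
def consolidate_by_mode(collection)
  keys = collection.flat_map(&:keys).uniq
  keys.each_with_object({}) do |key, result|
    values = collection.select { |e| e.key?(key) }.map { |e| e[key] }
    result[key] = values.all?(Hash) ? consolidate_by_mode(values) : mode_of(values)
  end
end

sample = [
  { title: "Shantaram: A Novel", published: 2012, details: { paperback: true } },
  { title: "Shantaram",          published: 2012, details: { paperback: false } },
  { title: "Shantaram",          published: 2005, details: { paperback: true } }
]

consolidated = consolidate_by_mode(sample)
```

With this sample, `consolidated` holds the title "Shantaram", the year 2012 and a paperback flag of true, because each of those values appears twice out of three times.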

You may also provide Twins.consolidate with priorities for String and Numeric attributes, which take precedence over the mode when determining the candidate value.

options = {
  priority: {
    title: "Novel"
  }
}

Twins.consolidate(books, options)
{
  title: "Shantaram: A Novel",
  author: "Gregory David Roberts",
  published: 2012,
  details: {
    paperback: true
  }
}

Pick

Selects the collection's most representative element. By default, Twins.pick chooses the element that holds the highest number of modes, that is, the element whose values most often match the most frequent value for their key.

Twins.pick(books)
{
  title: "Shantaram",
  author: "Gregory David Roberts",
  published: 2012,
  details: {
    paperback: true
  }
}
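
The selection rule can be sketched in plain Ruby (2.7+) as follows. This is an illustration under assumptions, not the gem's code: `pick_by_mode_count` is a hypothetical name, and only top-level values are compared (nested Hashes are compared as a whole).

```ruby
# Hypothetical sketch (not the twins gem's code): score each element by how
# many of its top-level values equal the mode for that key, then return the
# element with the highest score.
def pick_by_mode_count(collection)
  keys = collection.flat_map(&:keys).uniq
  modes = keys.to_h do |key|
    values = collection.map { |element| element[key] }
    [key, values.tally.max_by { |_value, count| count }.first]
  end
  collection.max_by do |element|
    modes.count { |key, mode| element[key] == mode }
  end
end

sample = [
  { title: "Shantaram: A Novel", author: "Gregory David Roberts", published: 2012 },
  { title: "Shantaram",          author: "Gregory David Roberts", published: 2012 },
  { title: "Shantaram",          author: "Gregory D. Roberts",    published: 2005 }
]

picked = pick_by_mode_count(sample)
```

Here the second element matches the mode on all three keys, the first on two and the third on one, so `picked` is the second element. Unlike consolidation, the result is always one of the original elements.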

You may also provide Twins.pick with priorities for String and Numeric attributes, which are used to compute each element's overall distance when determining the candidate element.

options = {
  priority: {
    title: "Novel"
  }
}

Twins.pick(books, options)
{
  title: "Shantaram: A Novel",
  author: "Gregory David Roberts",
  published: 2012,
  details: {
    paperback: true
  }
}

Internals

Distance

String distances are calculated using a longest subsequence algorithm and Numeric distances are calculated with their difference.
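
As a rough illustration, the sketch below assumes that "longest subsequence" means the longest common subsequence of two strings and that the numeric distance is the absolute difference. Both are assumptions, and `lcs_length` and `numeric_distance` are hypothetical names rather than functions exposed by the gem.

```ruby
# Length of the longest common subsequence of two strings, computed with the
# classic dynamic-programming recurrence, one row at a time.
def lcs_length(a, b)
  previous = Array.new(b.length + 1, 0)
  a.each_char do |char_a|
    current = [0]
    b.each_char.with_index(1) do |char_b, j|
      current << (char_a == char_b ? previous[j - 1] + 1 : [previous[j], current[j - 1]].max)
    end
    previous = current
  end
  previous.last
end

# Hypothetical numeric distance (assumption): the absolute difference.
def numeric_distance(a, b)
  (a - b).abs
end

# Assumed use with a priority: prefer the candidate that shares the longest
# subsequence with the priority String, or the smallest numeric distance.
titles = ["Shantaram: A Novel", "Shantaram"]
closest_title = titles.max_by { |title| lcs_length(title, "Novel") }
closest_year  = [2012, 2005].min_by { |year| numeric_distance(year, 2010) }
```

Under these assumptions, "Shantaram: A Novel" shares all five characters of "Novel" while "Shantaram" shares none (the comparison is case-sensitive), which is consistent with the priority examples above.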

Contributing

  1. Fork it
  2. Create a topic branch
  3. Add specs for your unimplemented modifications
  4. Run bundle exec rspec. If specs pass, return to step 3.
  5. Implement your modifications
  6. Run bundle exec rspec. If specs fail, return to step 5.
  7. Commit your changes and push
  8. Submit a pull request
  9. Thank you!

TODO

  • Think about using the Jaccard index to weight items

Author

Philippe Dionne

License

See LICENSE