Project

inci_score

A library that computes the hazard of cosmetic product components, based on the Biodizionario data.
 Project Readme

Table of Contents

  • Scope
  • INCI catalog
  • Computation
    • Component matching
    • Sources
  • Installation
  • Usage
    • Library
    • CLI
  • Benchmarks
    • Levenshtein in C
    • Run benchmarks

Scope

This gem computes the score of cosmetic components based on the information provided by the Biodizionario site by Fabrizio Zago.

INCI catalog

The INCI catalog is fetched directly from the Biodizionario site and kept in memory.
Currently there are more than 5000 components with a hazard score that ranges from 0 (safe) to 4 (dangerous).

Computation

The computation scores each component of the cosmetic based on:

  • its hazard, according to the Biodizionario score
  • its position in the list of ingredients

The total score is then calculated on a percent basis.
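
The README does not spell out the weighting formula, so the following is only a minimal sketch of the idea, not the gem's implementation. It assumes a hypothetical `position_weighted_score` helper, a weight of `1 / position` for each ingredient, and a result where 100 means safest. It is not expected to reproduce the scores shown elsewhere in this document.

```ruby
# Illustrative sketch only: the weighting below (1 / position) is an
# assumption, not the formula used by inci_score.
MAX_HAZARD = 4.0 # hazards range from 0 (safe) to 4 (dangerous)

# hazards: hazard scores in the order the ingredients are listed.
def position_weighted_score(hazards)
  return 0.0 if hazards.empty?

  # Earlier ingredients are more abundant, so they weigh more.
  weights = hazards.each_index.map { |i| 1.0 / (i + 1) }
  weighted_mean = hazards.zip(weights).sum { |h, w| h * w } / weights.sum

  # Express the result as a percentage, where 100 is the safest.
  (100.0 - weighted_mean / MAX_HAZARD * 100.0).round(2)
end

position_weighted_score([0, 4]) # => 66.67
position_weighted_score([4, 0]) # => 33.33
```

With this weighting, a hazardous ingredient listed first lowers the score more than the same ingredient listed last.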

Component matching

Since the ingredients list could come from an unreliable source (e.g. data scanned from a captured image), the gem tries to fuzzy match the ingredients by using different algorithms:

  • exact matching
  • edit distance within a specified tolerance
  • known hazards (e.g. names ending in "ethicone")
  • first relevant matching digits
  • matching split tokens
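
The first two rules in the list above can be sketched as a small cascade: try an exact lookup, then fall back to the closest catalog entry within an edit-distance tolerance. This is a hedged illustration, not the gem's code: the `CATALOG` hash, the `match_component` helper, and the default tolerance of 1 are all assumptions made for the example.

```ruby
# Hypothetical miniature catalog: component name => hazard (0..4).
CATALOG = { "aqua" => 0, "dimethicone" => 4, "peg-10" => 3 }.freeze

# Plain Ruby Levenshtein distance, computed one row at a time.
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Returns [matched_name, rule], or nil when nothing is close enough.
def match_component(name, tolerance: 1)
  return [name, :exact] if CATALOG.key?(name)

  best = CATALOG.keys.min_by { |key| levenshtein(name, key) }
  levenshtein(name, best) <= tolerance ? [best, :distance] : nil
end

match_component("aqua")   # => ["aqua", :exact]
match_component("pej-10") # => ["peg-10", :distance]
match_component("noent")  # => nil
```

Ordering the rules from strictest to loosest keeps the cheap exact lookup on the common path and runs the costly distance scan only for names that miss.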

Sources

The library accepts the list of ingredients as a single string of text.
Since this text could come from an OCR program, the library normalizes it by stripping invalid characters and removing the unimportant parts.
The ingredients are typically separated by commas, although the normalizer detects the most appropriate separator:

"Ingredients: Aqua, Disodium Laureth Sulfosuccinate, Cocamidopropiyl\nBetaine"
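
A rough sketch of this kind of normalization is shown here, under stated assumptions: the `normalize` helper, the candidate `SEPARATORS` list, and the set of characters treated as valid are hypothetical and chosen for illustration, not taken from the gem.

```ruby
# Candidate separators are an assumption made for this example.
SEPARATORS = [",", ";", "|"].freeze

def normalize(src)
  # Lowercase and collapse whitespace, including OCR line breaks.
  text = src.downcase.gsub(/\s+/, " ")
  # Drop a leading "ingredients:" label, if present.
  text = text.sub(/\A\s*ingredients\s*:/, "")
  # Pick the separator that occurs most often in the text.
  separator = SEPARATORS.max_by { |sep| text.count(sep) }

  text.split(separator)
      .map { |token| token.gsub(%r{[^a-z0-9\-\s/]}, "").strip }
      .reject(&:empty?)
end

normalize("Ingredients: Aqua, Disodium Laureth Sulfosuccinate, Cocamidopropiyl\nBetaine")
# => ["aqua", "disodium laureth sulfosuccinate", "cocamidopropiyl betaine"]
```

Note that the misspelled "cocamidopropiyl" survives normalization; correcting it is left to the fuzzy matching step.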

Installation

Install the gem from your shell:

gem install inci_score

Usage

Library

You can include this gem in your own library and start computing the INCI score:

require "inci_score"

inci = InciScore::Computer.new(src: 'aqua, dimethicone').call
inci.score # 56.25
inci.precision # 100.0

As you can see, the results are wrapped in an InciScore::Response object; this is useful when dealing with the CLI (see below).

Unrecognized components

The API treats unrecognized components as a common case and simply marks the response as not valid.
In that case the score is still computed, considering only the recognized components.
You can check the precision value, which is zero for unrecognized components and otherwise depends on the recognizer rule that was applied (100% for an exact match).

inci = InciScore::Computer.new(src: 'ingredients:aqua,noent1,noent2').call
inci.valid? # false
inci.score # 100.0
inci.precision # 33.33
inci.unrecognized # ["noent1", "noent2"]
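
The precision in this example is consistent with a plain average of per-component precisions: one exact match (100) and two unrecognized components (0 each). The sketch below assumes that interpretation; the `overall_precision` helper is hypothetical and not part of the gem's API.

```ruby
# Assumption: overall precision is the mean of per-component precisions.
def overall_precision(precisions)
  return 0.0 if precisions.empty?

  (precisions.sum / precisions.size.to_f).round(2)
end

# aqua matched exactly, noent1 and noent2 were not recognized.
overall_precision([100.0, 0.0, 0.0]) # => 33.33
```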

CLI

You can collect INCI data by using the bundled CLI:

inci_score --src="ingredients: aqua, dimethicone, pej-10, noent"

TOTAL SCORE:
      	53.22
PRECISION:
      	71.54
COMPONENTS:
      	aqua (0), dimethicone (4), peg-10 (3)
UNRECOGNIZED:
      	noent

Getting help

You can get help for the CLI by passing the -h (or --help) option:

Usage: inci_score --src="aqua, parfum, etc"
    -s, --src=SRC                    The INCI list: "aqua, parfum, etc"
    -h, --help                       Prints this help

Benchmarks

Levenshtein in C

I noticed the API slows down dramatically when it has to fuzzy match unrecognized components.
I profiled the code by using the benchmark-ips gem, finding the bottleneck was the pure Ruby implementation of the Levenshtein distance algorithm.

After some pointless optimization, I replaced this routine with a C implementation. I opted for the straightforward Ruby Inline library to call the C code straight from Ruby, gaining an order of magnitude in speed (about 30x).

Run benchmarks

Once you have downloaded the source code, run the benchmarks with:

bundle exec rake bench