asciimath2unitsml
Convert Units expressions via MathML to UnitsML
This gem converts
MathML incorporating UnitsML expressions (based on the Ascii representation provided by NIST)
into MathML complying with https://www.w3.org/TR/mathmlunits/, with
UnitsML markup embedded in it, and with unique identifiers for each distinct unit, prefix, and dimension.
Dimensions are automatically inserted corresponding to each unit.
Units expressions are identified in MathML as <mtext>unitsml(…)</mtext>, which in turn can be identified in AsciiMath as "unitsml(…)".
The consuming document is meant to deduplicate the instances of UnitsML markup with the same identifier, and potentially move them elsewhere in the document or to another document.
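The deduplication step is left to the consuming document. A minimal sketch of one way to do it follows, using REXML from the Ruby distribution. The helper name dedup_unitsml is hypothetical and not part of the gem; it assumes that identified UnitsML elements sit in the http://unitsml.nist.gov/2005 namespace and carry an xml:id attribute, as in the example output in this README.

```ruby
require "rexml/document"

UNITSML_NS = "http://unitsml.nist.gov/2005"

# Hypothetical helper (not part of the gem): keep the first UnitsML element
# carrying each xml:id and drop any later element with the same identifier.
def dedup_unitsml(xml)
  doc = REXML::Document.new(xml)
  seen = {}
  # XPath.match returns a plain array, so removing nodes while iterating is safe.
  REXML::XPath.match(doc, "//*").each do |el|
    id = el.attributes["xml:id"]
    next unless id && el.namespace == UNITSML_NS

    if seen[id]
      el.remove
    else
      seen[id] = true
    end
  end
  doc.to_s
end
```

Moving the surviving definitions to another part of the document would be a further step on the same element list.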
Notation
The unitsml() expression consists of a unit string.
The units used in unitsml() are taken from the UnitsDB database as updated by Ribose:
https://github.com/unitsml/unitsdb. Units are given as an ASCII-based code, consisting of
multiplication or division of single units, each of which is defined as a prefix
(taken from https://github.com/unitsml/unitsdb/blob/master/prefixes.yaml),
a unit (taken from https://github.com/unitsml/unitsdb/blob/master/units.yaml),
and an exponent; e.g. mm*s^2.
The conventions used for writing units are:

^ for exponents, e.g. m^2
* to combine two units by multiplication, e.g. m*s^2
/ to combine two units by division
u for μ (micro)
For more on units notation, see Units Notation.
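To make the notation concrete, here is a small sketch that splits a unit string into factors. It is an illustration only, not the gem's parser: the names Factor and parse_unit_string are hypothetical, parentheses are not handled, and it assumes that a "/" negates the exponent of the single factor that follows it.

```ruby
# One factor of a unit string: a (possibly prefixed) symbol and its exponent.
Factor = Struct.new(:symbol, :exponent)

# Hypothetical helper: split a unit string such as "mm*s^2" into factors.
def parse_unit_string(str)
  sign = 1
  # Splitting on a captured group keeps the operators in the token list.
  str.split(%r{([*/])}).each_with_object([]) do |token, factors|
    case token
    when "*" then sign = 1
    when "/" then sign = -1
    else
      symbol, exponent = token.split("^", 2)
      factors << Factor.new(symbol, sign * (exponent || "1").to_i)
    end
  end
end

parse_unit_string("mm*s^2").each { |f| puts "#{f.symbol} ^ #{f.exponent}" }
```

Separating a prefix from its unit (mm into milli and metre) needs the UnitsDB tables and is not attempted here.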
The unitsml() expression can take additional optional parameters, giving further information for the UnitsML
to be generated:

unitsml(unitstring, quantity: ID) provides the UnitsDB identifier for the quantity being measured (taken from https://github.com/unitsml/unitsdb/blob/master/quantities.yaml). For example, unitsml(s, quantity: NISTq109) indicates that the second is used to measure period duration. If a single quantity is associated with the unit in UnitsDB (as given in https://github.com/unitsml/unitsdb/blob/master/units.yaml), that quantity is added automatically; otherwise, no quantity is added unless explicitly nominated in this way.

unitsml(unitstring, name: NAME) provides a name for the unit, if one is not already available from UnitsDB. For example, unitsml(cal_th/cm^2, name: langley).

unitsml(unitstring, symbol: SYMBOL) provides an alternate symbol for the unit, in AsciiMath. The unitstring gives the canonical representation of the unit, but SYMBOL is what will be rendered. For example, unitsml(cal_th/cm^2, name: langley, symbol: La), or unitsml(mm*s^2, symbol: mm cdot s^2). (All variables in SYMBOL are rendered upright, as is the default for units.)

unitsml(unitstring, multiplier: SYMBOL) provides an alternate symbol for the multiplier of units. The options are an XML entity, or the values space or nospace (for which see discussion under Usage).
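The shape of these expressions can be shown with a short sketch that separates the unit string from its optional parameters. The helper parse_unitsml is hypothetical and is not how the gem itself reads the expression; it assumes that parameter values contain no commas.

```ruby
# Hypothetical helper: split "unitsml(unitstring, key: value, ...)" into the
# unit string and a hash of its optional parameters.
def parse_unitsml(expr)
  body = expr[/\Aunitsml\((.*)\)\z/m, 1]
  raise ArgumentError, "not a unitsml() expression" unless body

  unitstring, *params = body.split(",").map(&:strip)
  options = params.to_h do |param|
    key, value = param.split(":", 2).map(&:strip)
    [key.to_sym, value]
  end
  { unit: unitstring, options: options }
end

p parse_unitsml("unitsml(cal_th/cm^2, name: langley, symbol: La)")
```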
Standalone prefixes can be recognised by replacing the unit with a hyphen; so unitsml(p-)
corresponds to the standalone prefix "pico" (and is rendered as "p").
The gem also supports fundamental units, e.g. unitsml(e) for the atomic unit of charge, e,
and symbols for dimensions. The latter are entered as dim_XXX, where XXX is their established symbol:
Symbol      Dimension

dim_L       Length
dim_M       Mass
dim_T       Time
dim_I       Electric Current
dim_Theta   Thermodynamic Temperature
dim_N       Amount of Substance
dim_J       Luminous Intensity
dim_phi     Plane Angle (dimensionless)

For example, unitsml(dim_I) gives the dimension of electric current, 𝖨.
Rendering
The output of the gem is MathML, with MathML unit expressions (expressed as <mi>,
complying with MathML Units) cross-referenced to UnitsML
definitions embedded in the MathML.
The gem follows the MathML Units convention of inserting a spacing invisible times operator
(<mo rspace='thickmathspace'>&#x2062;</mo>) between any numbers (<mn>) and unit expressions
in MathML, and representing units in MathML as non-italic variables (<mi mathvariant='normal'>).
Space is not inserted between a number and a unit expression when that unit expression consists wholly of punctuation: 1 m, 1 °C, but 9° 7′ 22″.
Example
9 "unitsml(C^3*A)"
is converted into:
<math xmlns='http://www.w3.org/1998/Math/MathML'>
  <mrow>
    <mn>9</mn>
    <mo rspace='thickmathspace'>&#x2062;</mo>
    <mrow xref='U_C3.A'>
      <msup>
        <mrow>
          <mi mathvariant='normal'>C</mi>
        </mrow>
        <mrow>
          <mn>3</mn>
        </mrow>
      </msup>
      <mo>·</mo>
      <mi mathvariant='normal'>A</mi>
    </mrow>
    <Unit xmlns='http://unitsml.nist.gov/2005' xml:id='U_C3.A' dimensionURL='#D_T3I4'>
      <UnitSystem name='SI' type='SI_derived' xml:lang='en-US'/>
      <UnitName xml:lang='en'>C^3*A</UnitName>
      <UnitSymbol type='HTML'>C<sup>3</sup> · A</UnitSymbol>
      <UnitSymbol type='MathML'>
        <math xmlns='http://www.w3.org/1998/Math/MathML'>
          <mrow>
            <msup>
              <mrow>
                <mi mathvariant='normal'>C</mi>
              </mrow>
              <mrow>
                <mn>3</mn>
              </mrow>
            </msup>
            <mo>·</mo>
            <mi mathvariant='normal'>A</mi>
          </mrow>
        </math>
      </UnitSymbol>
      <RootUnits>
        <EnumeratedRootUnit unit='coulomb' powerNumerator='3'/>
        <EnumeratedRootUnit unit='ampere'/>
      </RootUnits>
    </Unit>
    <Dimension xmlns='http://unitsml.nist.gov/2005' xml:id='D_T3I4'>
      <Time symbol='T' powerNumerator='3'/>
      <ElectricCurrent symbol='I' powerNumerator='4'/>
    </Dimension>
  </mrow>
</math>
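The dimension D_T3I4 in this example follows from the root units: a coulomb is an ampere times a second, so C^3*A carries time to the power 3 and electric current to the power 4. The arithmetic can be sketched as below. The lookup table is hand-written for the illustration (the gem takes such data from UnitsDB), the helper names are hypothetical, and the identifier format is inferred only from this one example.

```ruby
# Dimension exponents of a few units, hand-written for this sketch.
UNIT_DIMENSIONS = {
  "coulomb" => { "T" => 1, "I" => 1 }, # C = A·s
  "ampere"  => { "I" => 1 },
  "second"  => { "T" => 1 },
  "metre"   => { "L" => 1 },
}.freeze

# Order of dimension symbols, following the dim_XXX table in this README.
DIM_ORDER = %w[L M T I Theta N J phi].freeze

# Sum the dimension exponents over a list of [unit name, power] pairs.
def dimension_of(root_units)
  root_units.each_with_object(Hash.new(0)) do |(unit, power), dims|
    UNIT_DIMENSIONS.fetch(unit).each { |dim, exp| dims[dim] += exp * power }
  end
end

# Build an identifier in the style of the example (assumed format).
def dimension_id(dims)
  "D_" + DIM_ORDER.reject { |d| dims[d].zero? }.map { |d| "#{d}#{dims[d]}" }.join
end

puts dimension_id(dimension_of([["coulomb", 3], ["ampere", 1]])) # prints "D_T3I4"
```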
Usage
The converter is run as:
require "asciimath2unitsml"
require "nokogiri"

c = Asciimath2UnitsML::Conv.new
c.Asciimath2UnitsML('1 "unitsml(mm*s^2)"') # AsciiMath string containing UnitsML
c.MathML2UnitsML("<math xmlns='http://www.w3.org/1998/Math/MathML'><mn>7</mn>"\
"<mtext>unitsml(kg^2)</mtext></math>") # MathML string containing <mtext>unitsml()</mtext>
c.MathML2UnitsML(Nokogiri::XML("<math xmlns='http://www.w3.org/1998/Math/MathML'><mn>7</mn>"\
"<mtext>unitsml(kg^2)</mtext></math>")) # Nokogiri parse of MathML document containing <mtext>unitsml()</mtext>
The converter class may be initialised with options:

multiplier is the symbol used to represent the multiplication of units. By default, following MathML Units, the symbol is middle dot (·). An arbitrary UTF-8 string can be supplied instead; it will be encoded as XML entities. The value :space is rendered as a spacing invisible times in MathML (<mo rspace='thickmathspace'>&#x2062;</mo>), and as a non-breaking space in HTML. The value :nospace is rendered as a non-spacing invisible times in MathML (<mo>&#x2062;</mo>), and is not rendered in HTML.
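The rules for the multiplier option can be summarised in a short sketch. This is an illustration of the behaviour described above, not the gem's own code: the helper names are hypothetical, and it assumes that the invisible times operator is written as the character reference &#x2062; and the non-breaking space as &#xa0;.

```ruby
INVISIBLE_TIMES = "&#x2062;"

# Encode every non-ASCII character as a hexadecimal XML character reference.
def xml_entities(str)
  str.each_char.map { |c| c.ascii_only? ? c : format("&#x%x;", c.ord) }.join
end

# Hypothetical helper: MathML and HTML renderings of the unit multiplier
# for a given option value; the default is middle dot.
def render_multiplier(multiplier = "\u00b7")
  case multiplier
  when :space
    { mathml: "<mo rspace='thickmathspace'>#{INVISIBLE_TIMES}</mo>", html: "&#xa0;" }
  when :nospace
    { mathml: "<mo>#{INVISIBLE_TIMES}</mo>", html: "" }
  else
    encoded = xml_entities(multiplier.to_s)
    { mathml: "<mo>#{encoded}</mo>", html: encoded }
  end
end

p render_multiplier
p render_multiplier(:space)
```

Any spacing the gem places around the symbol in HTML output is left out of this sketch.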
Units Notation
The units used in unitsml() are taken from the UnitsDB database as updated by Ribose:
https://github.com/unitsml/unitsdb. Units are given as an ASCII-based code, consisting of
multiplication or division of single units, each of which is defined as a prefix
(taken from https://github.com/unitsml/unitsdb/blob/master/prefixes.yaml),
a unit (taken from https://github.com/unitsml/unitsdb/blob/master/units.yaml),
and an exponent; e.g. mm*s^2.
In case of ambiguity, the interpretation as a unit with no prefix is prioritised over the interpretation
as a prefixed unit; so ct is interpreted as hundredweight, rather than centiton. Exceptionally,
kg is decomposed into the prefix kilo and the unit gram rather than treated as a basic unit, for consistency with
other prefixes of grams. (Prefixed units appear in UnitsDB, and are indicated as prefixed: true.)
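The priority rule above can be sketched with two small lookup tables. The tables are hand-written stand-ins for UnitsDB, and resolve is a hypothetical helper rather than the gem's parser; the symbols are simplified and do not reflect the disambiguated ids used in units.yaml.

```ruby
# Illustrative stand-ins for the UnitsDB prefix and unit tables.
PREFIXES = { "c" => "centi", "k" => "kilo", "m" => "milli" }.freeze
UNITS = {
  "ct" => { name: "hundredweight" },
  "t"  => { name: "metric ton" },
  "g"  => { name: "gram" },
  "m"  => { name: "metre" },
  "kg" => { name: "kilogram", prefixed: true },
}.freeze

# Hypothetical helper: resolve a symbol to [prefix name, unit name];
# the prefix name is nil when the symbol is read as an unprefixed unit.
def resolve(symbol)
  unit = UNITS[symbol]
  # The whole symbol as an unprefixed unit wins over prefix + unit.
  return [nil, unit[:name]] if unit && !unit[:prefixed]

  # Otherwise, and for units flagged as prefixed, split off a known prefix.
  PREFIXES.each do |prefix_symbol, prefix_name|
    next unless symbol.start_with?(prefix_symbol)

    rest = UNITS[symbol.delete_prefix(prefix_symbol)]
    return [prefix_name, rest[:name]] if rest && !rest[:prefixed]
  end
  nil
end

p resolve("ct") # unprefixed reading
p resolve("kg") # decomposed into prefix and unit
```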
A unit may have multiple symbols; these are registered separately in
units.yaml, as entries under unit_symbols.
These different symbols will be recognised as the same Unit in the UnitsML markup, but
the original symbol will be retained in the MathML expression. So an expression like 1 unitsml(mL)
will be recognised as referring to millilitres; the expression will be given under its canonical
rendering ml in UnitsML markup, but the MathML rendering referencing that UnitsML expression
will keep the notation mL.
The symbols used for units can be highly ambiguous; in order to guarantee accurate parsing,
the symbols used to enter units are kept unambiguous in units.yaml.
They may be found as the entries for unit_symbols/id under each unit. For example, B is ambiguous between
bel (as in decibel) and byte; they are kept unambiguous by using bel_B and byte_B to refer to them,
although they will still both be rendered as B.
The following table is the current list of ambiguous symbols, which are disambiguated in the symbol ids used.
This table can be generated (in Asciidoc format) through Asciimath2UnitsML::Conv.new().ambig_units:
Symbol  Unit + ID  

′
minute (minute of arc):
foot:
minute:
minute (minute of arc):
foot:
minute:

″
second (second of arc):
second:
inch:
second (second of arc):
second:
inch:

″Hg
conventional inch of mercury:
conventional inch of mercury:
inch of mercury (32 degF):
inch of mercury (60 degF):
inch of mercury (32 degF):
inch of mercury (60 degF):

hp
horsepower:
horsepower (UK):
horsepower, water:
horsepower, metric:
horsepower, boiler:
horsepower, electric:

Btu
British thermal unit_IT:
British thermal unit (mean):
British thermal unit (39 degF):
British thermal unit (59 degF):
British thermal unit (60 degF):

a 
are: 
year (365 days): 
year, tropical: 
year, sidereal: 

d 
day: 
darcy: 
day, sidereal: 

inHg 
conventional inch of mercury: 
inch of mercury (32 degF): 
inch of mercury (60 degF): 

inH_{2}O 
conventional inch of water: 
inch of water (39.2 degF): 
inch of water (60 degF): 

min 
minute: 
minim: 
minute, sidereal: 

pc 
parsec: 
pica (printer’s): 
pica (computer): 

t 
metric ton: 
long ton: 
short ton: 

B 
bel: 
byte: 

cmHg 
conventional centimeter of mercury: 
centimeter of mercury (0 degC): 

cmH_{2}O 
conventional centimeter of water: 
centimeter of water (4 degC): 

cup 
cup (US): 
cup (FDA): 

D 
debye: 
darcy: 

ft 
foot: 
foot (based on US survey foot): 

ftH_{2}O 
conventional foot of water: 
foot of water (39.2 degF): 

gi 
gill (US): 
gill [Canadian and UK (Imperial)]: 

h 
hour: 
hour, sidereal: 

′Hg 
conventional foot of mercury: 
conventional foot of mercury: 

ħ 
natural unit of action: 
atomic unit of action: 

m_{e} 
natural unit of mass: 
atomic unit of mass: 

in 
inch: 
inch (based on US survey foot): 

K 
kelvin: 
kayser: 

L 
liter: 
lambert: 

lb 
pound (avoirdupois): 
pound (troy or apothecary): 

mi 
mile: 
mile (based on US survey foot): 

mil 
mil (length): 
angular mil (NATO): 

oz 
ounce (avoirdupois): 
ounce (troy or apothecary): 

pt 
point (printer’s): 
point (computer): 

rad 
radian: 
rad (absorbed dose): 

s 
second: 
second, sidereal: 

tbsp 
tablespoon: 
tablespoon (FDA): 

ton 
ton of TNT (energy equivalent): 
ton of refrigeration (12 000 Btu_IT/h): 

tsp 
teaspoon: 
teaspoon (FDA): 

yd 
yard: 
yard (based on US survey foot): 

° 
degree (degree of arc): 

γ 
gamma: 

μ 
micron: 

Ω 
ohm: 

Å 
angstrom: 

ħ 
natural unit of action in eV s: 

abΩ 
abohm: 

(abΩ)^{-1}
abmho: 

aW 
abwatt: 

b 
barn: 

Btu_{th} 
British thermal unit_th: 

°C 
degree Celsius: 

cal_{IT} 
I.T. calorie: 

cal_{th} 
thermochemical calorie: 

°F 
degree Fahrenheit: 

a_{0} 
atomic unit of length: 

c 
natural unit of velocity: 

c_{0} 
natural unit of velocity: 

e 
atomic unit of charge: 

E_{h} 
atomic unit of energy: 

μin 
microinch: 

°K 
kelvin: 

kcal_{IT} 
kilocalorie_IT: 

kcal_{th} 
kilocalorie_th: 

mmH_{2}O 
conventional millimeter of water: 

°R 
degree Rankine: 

ƛ_{C} 
natural unit of length: 