Project

pubid-ieee

Library to generate, parse and manipulate IEEE PubID.
 Dependencies

Development

~> 13.0
~> 3.0

Runtime

~> 2.0.0
~> 0.7
 Project Readme

IEEE publication identifiers ("IEEE PubID")

Purpose

Implements a mechanism to parse and utilize IEEE publication identifiers.

Historic identifier patterns

There are at least two major "pattern series" of identifiers due to historical reasons: old (type I) and new (type II). This implementation attempts to support both types of publication identifier patterns.

Use cases to support

  • analyze the pattern of a type I identifier

  • parse a type II identifier into components

  • generate a filename from the components, similar to the type I pattern

Elements of the PubID

Publisher

Name: Institute of Electrical and Electronics Engineers

Abbrev: IEEE

Report number

{number} - is a set of one or more digits and optional letters

Part

{part} - is a set of digits and optional letters; starts with a digit; any letters come at the end; optional

Subpart

{subpart} - is a set of digits and optional letters; optional, many subparts are possible

Year

{year} - is a set of 4 digits; optional

Corrigendum & Amendment

{cor} - is a corrigendum or an amendment with the pattern Cor {cornum}-{year} or Amd {cornum}:{year} where {cornum} is a set of digits; optional
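
The two {cor} patterns above can be checked with a small sketch like the one below. The helper name parse_correction and the constants are hypothetical and are not part of the pubid-ieee API; the sketch assumes a single space between the prefix and {cornum}.

```ruby
# Hypothetical helper, not part of the pubid-ieee API.
# Cor {cornum}-{year} uses a dash, Amd {cornum}:{year} uses a colon.
COR_PATTERN = /\ACor (?<cornum>\d+)-(?<year>\d{4})\z/
AMD_PATTERN = /\AAmd (?<cornum>\d+):(?<year>\d{4})\z/

def parse_correction(text)
  match = COR_PATTERN.match(text) || AMD_PATTERN.match(text)
  return nil unless match

  kind = text.start_with?("Cor") ? :corrigendum : :amendment
  { kind: kind, cornum: match[:cornum], year: match[:year] }
end

parse_correction("Cor 1-2005") # kind is :corrigendum
parse_correction("Amd 2:2010") # kind is :amendment
```

A string that mixes the delimiters, such as "Cor 1:2005", returns nil in this sketch.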

Type I pattern

{publisher} {type} {series} {number}{part}.{subpart}{year} {edition}/{conform}/{correction}
  • {publisher} IEEE

  • {type} one of the values: Standard, Std, Draft, Draft Standard, Draft Supplement *

  • {series} one of the values: ISO/IEC, ISO/IEC/IEEE *

  • {number} set of digits optionally prefixed with uppercase letter and optionally suffixed with letter

  • {part} from 1 to 2 digits prefixed with . or - and optionally suffixed with up to 4 letters *

  • {subpart} 1 digit optionally suffixed with a letter *

  • {year} 4 digits prefixed with -, :, ` - `, or breakspace *

  • {edition} prefix Edition followed by a reference in brackets or prefix First edition followed by date in format YYYY-MM-DD *

  • {conform} prefix Conformance followed by 2 digits, dash, and 4 digits year *

  • {correction} prefix Cor optionally followed by breakspace, or prefix Amd followed by ., followed by from 1 to 2 digits, dash and 4 digits year *

(*) - optional

An identifier can be composed of 2 other identifiers with a breakspace delimiter. Only the first identifier needs to contain the publisher; for the second it is optional.

The following regular expression parses 100% of identifiers from the type I dataset:

%r{
  ^IEEE\s
  ((?<type1>Standard|Std|Draft(\sStandard|\sSupplement)?)\s)?
  ((?<series>ISO\/IEC(\/IEEE)?)\s)?
  (?<number1>[A-Z]?\d+[[:alpha:]]?)
  ([.-](?<part1>\d{1,2}(?!\d)[[:alpha:]]{0,4}))?
  (\.(?<subpart1>\d[[:alpha:]]?))?
  (?<year1>([-:]|\s-\s|,\s)\d{4})?
  (\s(IEEE\s(?<type2>Std)\s)?(?<number2>[A-Z]?\d+[[:alpha:]]?)
    ([.-](?<part2>\d{1,2}(?!\d)[[:alpha:]]{0,4}))?
    ([.](?<subpart2>\d[[:alpha:]]?))?
    (?<year2>([-:.]|_-|\s-\s|,\s)\d{4})?)?
  (\s(?<edition>Edition(\s\([^)]+\))?|First\sedition\s[\d-]+))?
  (\/(?<conform>Conformance\d{2})-(?<confyear>\d{4}))?
  (\/(?<correction>(Cor\s?|Amd\.)\d{1,2})
    (?<coryear>(:|-|:-)\d{4}))?$
}x
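
As a usage sketch, the expression above can be written as a Ruby %r{...}x literal and applied with Regexp#match. The constant TYPE1, the helper parse_type1 and the sample identifier are assumptions made for illustration; the correction branch is grouped here as (Cor\s?|Amd\.) so that the parentheses balance.

```ruby
# Hypothetical constant and helper names; the pattern follows the README.
TYPE1 = %r{
  ^IEEE\s
  ((?<type1>Standard|Std|Draft(\sStandard|\sSupplement)?)\s)?
  ((?<series>ISO\/IEC(\/IEEE)?)\s)?
  (?<number1>[A-Z]?\d+[[:alpha:]]?)
  ([.-](?<part1>\d{1,2}(?!\d)[[:alpha:]]{0,4}))?
  (\.(?<subpart1>\d[[:alpha:]]?))?
  (?<year1>([-:]|\s-\s|,\s)\d{4})?
  (\s(IEEE\s(?<type2>Std)\s)?(?<number2>[A-Z]?\d+[[:alpha:]]?)
    ([.-](?<part2>\d{1,2}(?!\d)[[:alpha:]]{0,4}))?
    ([.](?<subpart2>\d[[:alpha:]]?))?
    (?<year2>([-:.]|_-|\s-\s|,\s)\d{4})?)?
  (\s(?<edition>Edition(\s\([^)]+\))?|First\sedition\s[\d-]+))?
  (\/(?<conform>Conformance\d{2})-(?<confyear>\d{4}))?
  (\/(?<correction>(Cor\s?|Amd\.)\d{1,2})
    (?<coryear>(:|-|:-)\d{4}))?$
}x

# Returns only the named groups that matched, or nil when nothing matches.
def parse_type1(identifier)
  match = TYPE1.match(identifier)
  match && match.named_captures.compact
end

parse_type1("IEEE Std 1003.1-2001")
# => {"type1"=>"Std", "number1"=>"1003", "part1"=>"1", "year1"=>"-2001"}
```

Note that the year1 capture keeps its leading delimiter, because the delimiter sits inside the named group.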

Parsing PubID elements from type II identifiers

To parse PubID elements from type II pattern identifiers, we can use the following regular expression:

%r{
  ^IEEE\s(?<number1>\w+(\.[A-Z]\d|\sHBK)?)
  (?<part1>(\.|\s)\d{1,4}[[:alpha:],]{0,7}|-\d?[A-Z]+|-\d(?=[-.]))?
  (?<subpart11>\.\d{1,3}[a-z]?|-\d{5}[a-z]?|-\d+(?=[-:_]))?
  (?<subpart12>\.\d|-\d+(?=-))?
  (?<year1>([-:.]|_-|\s-)\d{4})?
  (\/(?<number2>([A-Z]?\d+[a-z]?|Conformance\d+))
    ((\.|-)(?<part2>\d{1,3}[a-z]?)(?!\d))?
    (\.(?<subpart21>\d{1,2}))?)?
  (\/(?<number3>\d+)(\.(?<part3>\d))?)?
  (?<year2>([-:.]|_-|\s-)\d{4})?
  ((\/|_|-|\s\/)(?<correction>(Cor|(?i)Amd(?-i))(\s|\.|\.\s)?\d{1,2})
    (?<coryear>(:|-|:-|_[A-Z][a-z]{2}_)\d{4}(-\d{4})?)?)?$
}x

This regular expression covers 99% of the identifiers from the type II bibxml-ieee dataset.
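
The type II expression can be applied the same way. In this sketch the constant TYPE2, the helper parse_type2 and the sample identifier are assumptions made for illustration; they are not taken from the library.

```ruby
# Hypothetical constant and helper names; the pattern follows the README.
TYPE2 = %r{
  ^IEEE\s(?<number1>\w+(\.[A-Z]\d|\sHBK)?)
  (?<part1>(\.|\s)\d{1,4}[[:alpha:],]{0,7}|-\d?[A-Z]+|-\d(?=[-.]))?
  (?<subpart11>\.\d{1,3}[a-z]?|-\d{5}[a-z]?|-\d+(?=[-:_]))?
  (?<subpart12>\.\d|-\d+(?=-))?
  (?<year1>([-:.]|_-|\s-)\d{4})?
  (\/(?<number2>([A-Z]?\d+[a-z]?|Conformance\d+))
    ((\.|-)(?<part2>\d{1,3}[a-z]?)(?!\d))?
    (\.(?<subpart21>\d{1,2}))?)?
  (\/(?<number3>\d+)(\.(?<part3>\d))?)?
  (?<year2>([-:.]|_-|\s-)\d{4})?
  ((\/|_|-|\s\/)(?<correction>(Cor|(?i)Amd(?-i))(\s|\.|\.\s)?\d{1,2})
    (?<coryear>(:|-|:-|_[A-Z][a-z]{2}_)\d{4}(-\d{4})?)?)?$
}x

# Returns only the named groups that matched, or nil when nothing matches.
def parse_type2(identifier)
  match = TYPE2.match(identifier)
  match && match.named_captures.compact
end

parse_type2("IEEE 802.11-2007")
# => {"number1"=>"802", "part1"=>".11", "year1"=>"-2007"}
```

The part1 and year1 captures include their leading delimiters, so a caller would likely need to strip them before reusing the values.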

File name generator

For type I identifiers, file names are generated by replacing symbols /, \, ,, ', ", (, ), and breakspace with symbol . Sequences of multiple symbols should be squeezed into one symbol.
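
A minimal sketch of the type I rule follows. The replacement symbol is not legible in the text above, so it is passed in as a parameter; the "_" used in the example call is an assumption, as are the names REPLACED_SYMBOLS and type1_filename.

```ruby
# Symbols the text lists for replacement: / \ , ' " ( ) and whitespace.
# The trailing + squeezes a run of such symbols into one replacement.
REPLACED_SYMBOLS = %r{[/\\,'"()\s]+}

# Hypothetical helper, not part of the pubid-ieee API.
def type1_filename(identifier, replacement)
  identifier.gsub(REPLACED_SYMBOLS) { replacement }
end

type1_filename("IEEE Std 1003.1, 2004 Edition", "_")
# => "IEEE_Std_1003.1_2004_Edition"
```

The block form of gsub is used so that the replacement string is inserted literally.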

For type II identifiers, first parse the PubID elements, then join them in this order:

IEEE.{number1}_{part1}.{subpart11}.{subpart12}-{year1}_{number2}_{part2}.{subpart21}_{number3}_{part3}-{year2}_{correction}-{coryear}
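
One way to read the template is sketched below. It assumes that each element has already been stripped of its parsed delimiter, and that an absent element is skipped together with the delimiter that precedes it; neither assumption is stated in the text. The names TEMPLATE and type2_filename are hypothetical.

```ruby
# Delimiter that precedes each element in the template above.
TEMPLATE = [
  [".", :number1], ["_", :part1], [".", :subpart11], [".", :subpart12],
  ["-", :year1], ["_", :number2], ["_", :part2], [".", :subpart21],
  ["_", :number3], ["_", :part3], ["-", :year2], ["_", :correction],
  ["-", :coryear]
].freeze

# Hypothetical helper: elements is a Hash of already-cleaned values.
def type2_filename(elements)
  TEMPLATE.reduce("IEEE") do |name, (delimiter, key)|
    value = elements[key]
    if value.nil? || value.empty?
      name # assumption: skip absent elements and their delimiter
    else
      "#{name}#{delimiter}#{value}"
    end
  end
end

type2_filename(number1: "802", part1: "11", year1: "2007")
# => "IEEE.802_11-2007"
```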