GEDCOM-Ruby

This is a module for the Ruby language that defines a callback-based GEDCOM parser. It does not validate the GEDCOM file; instead, it traverses any well-formed GEDCOM and invokes application-defined callback hooks along the way.

The module also includes a sophisticated date parser that can parse any standard GEDCOM-formatted date (see the GEDCOM spec for details on the format).

Installation

On the command line:

gem install gedcom_ruby

Or add this to your Gemfile:

gem 'gedcom_ruby'

Usage

Given the following GEDCOM:

0 @PERSON@ INDI
1 NAME /John Smith/
1 SEX M
1 BIRT
2 PLAC New York City, New York, USA
2 DATE 1 JAN 1899
1 DEAT
2 PLAC death place
2 DATE 31 DEC 1990
1 FAMS @FAMILY@

You can access these pieces like so:

people = {}
current_person_id = nil

parser = GEDCOM::Parser.new
parser.before ['INDI'] do |id|
  current_person_id = id
  people[id] = {}
end
# Parentheses are needed here: a brace block cannot attach to a bare array literal.
parser.before(['INDI', 'NAME']) { |name| people[current_person_id][:name] = name }
parser.after(['INDI']) { |id| puts "done processing #{people[current_person_id][:name]}" }

parser.parse "my_gedcom.ged"

Alternatively, you can create a custom parser class that inherits from GEDCOM::Parser. The parser calls its after_initialize method at the end of initialization, so that is where you set up your callbacks.

Here's an example of a custom parser class:

class CustomParser < GEDCOM::Parser
  def after_initialize
    before ['INDI'] do |data|
      # data -> "@PERSON@"

      # This is called when we first see an individual
      # Typically, this is where you'll initialize an object
    end
    before ['INDI', 'NAME'] do |data|
      # data -> "/John Smith/"

      # This is called when we get to an individual's name
    end
    before ['INDI', 'BIRT', 'DATE'] do |data|
      # data -> "1 JAN 1899"

      # This is called when we first see an individual's birth date
    end

    after ['INDI'] do |data|
      # data -> "@PERSON@"

      # This is called at the end of an individual
      # Typically this is where you'll save your object
    end
  end
end

The before and after methods are used to register callbacks for specified contexts. A "context" is simply an array of strings, where each element of the array specifies a GEDCOM row type.

For example, ['INDI', 'NAME'] would be a NAME line, within the scope of an INDI record.
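
To make the idea of a context concrete, here is a minimal standalone sketch that derives the context of each line from the level number at the start of the line. It is not the gem's implementation: the helper name contexts_for is hypothetical, and the sketch assumes well-formed input where a leading @ID@ token only appears before the tag.

```ruby
# Hypothetical helper, for illustration only; not part of gedcom_ruby.
# Returns one [context, data] pair per line of a GEDCOM string.
def contexts_for(gedcom)
  stack = []
  gedcom.each_line.map(&:strip).reject(&:empty?).map do |line|
    level, rest = line.split(' ', 2)
    level = Integer(level)
    tokens = rest.split(' ', 2)
    if tokens[0].start_with?('@')
      # Records such as "0 @PERSON@ INDI" put the ID before the tag.
      data = tokens[0]
      tag = tokens[1].split(' ', 2)[0]
    else
      tag, data = tokens
    end
    # A line at level N replaces whatever was at depth N or deeper.
    stack = stack.first(level) << tag
    [stack.dup, data]
  end
end

sample = <<~GED
  0 @PERSON@ INDI
  1 NAME /John Smith/
  1 BIRT
  2 DATE 1 JAN 1899
GED

contexts_for(sample).each do |context, data|
  puts "#{context.inspect} => #{data.inspect}"
end
```

Each pair corresponds to what a before callback registered for that context would receive as its data in the examples above.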

Callbacks can be registered with a proc, a block, or a method name. Each callback takes a single parameter, which is the data for that context.
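
The three registration styles might look like the sketch below. So that it runs without the gem, it uses a hypothetical TinyRegistry class that mimics the before(context, proc=nil, &block) signature from the API reference. How the real parser resolves a method name is an assumption here: the sketch simply calls send on the registry itself.

```ruby
# Stand-in for illustration only; this is not the gem's GEDCOM::Parser.
class TinyRegistry
  def initialize
    @before = {}
  end

  # Accepts a Proc, a method name (Symbol or String), or a block.
  def before(context, proc = nil, &block)
    @before[context] = proc || block
  end

  # Invokes the callback registered for the context, if there is one.
  def fire(context, data)
    callback = @before[context]
    case callback
    when Proc then callback.call(data)
    when Symbol, String then send(callback, data)
    end
  end

  # Collects whatever the callbacks record.
  def seen
    @seen ||= []
  end

  def record_sex(data)
    seen << [:sex, data]
  end
end

registry = TinyRegistry.new
registry.before(['INDI'], proc { |id| registry.seen << [:id, id] })         # proc
registry.before(['INDI', 'NAME']) { |name| registry.seen << [:name, name] } # block
registry.before(['INDI', 'SEX'], :record_sex)                               # method name

registry.fire(['INDI'], '@PERSON@')
registry.fire(['INDI', 'NAME'], '/John Smith/')
registry.fire(['INDI', 'SEX'], 'M')
p registry.seen
```

Passing a method name tends to suit a custom parser subclass, where the callbacks are ordinary instance methods; blocks are handier for one-off scripts.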

Once you've set up your parser, simply call its parse method, passing it a filename, a string, or an IO instance.

API Reference

module GEDCOM
  class Parser
    def initialize(&block)
      # Constructor.  Can optionally be called with a block, which is
      # used to define the before and after callbacks.

    def before(context, proc=nil, &block)
      # Registers the given proc or block to be called
      # as soon as the given context is recognized.

    def after(context, proc=nil, &block)
      # Registers the given proc or block to be called as soon as
      # the given context expires.

    def auto_concat=(boolean)
      # Sets the auto-concatenation mode (defaults to true/on).
      # When auto-concatenation is enabled, any CONT tags in the input
      # will be appended to the previous tag, and the 'after' callback
      # for that tag will include all of the concatenated data.  No
      # callbacks for CONT tags will be made in this case.  When
      # auto-concatenation is disabled, each CONT tag will be treated
      # as normal, with before and after callbacks.

    def parse(file_or_io)
      # Opens and parses the file with the given name, or an existing
      # IO instance, invoking callbacks as the registered contexts are
      # recognized.

    def context
      # Gives the current context during the parse.  Intended to be
      # used by callbacks to determine the context when the same
      # callback method is used to handle multiple contexts.


  class Date

    def initialize( date_str, calendar=DateType::DEFAULT )
    def initialize( date_str, calendar=DateType::DEFAULT ) { |err_msg| ... }
      # Creates a new GEDCOM Date object from the given string.  In the first form, if
      # the string does not define a valid date, a GEDCOM::DateFormatException is raised.
      # In the second form, an exception is not raised, but the given block is called
      # when there is an error.  Also, in the second form, a Date object is still returned,
      # but it will contain nothing except the string that was passed to it.

    def Date.safe_new( date_str )
      # Creates a new GEDCOM Date object, but never raises a DateFormatException.

    def format
      # Returns one of the following constants, indicating what the format of the date is:
          #  NONE, ABOUT, CALCULATED, ESTIMATED, BEFORE, AFTER, BETWEEN, FROM, TO, FROMTO,
          #  INTERPRETED, CHILD, CLEARED, COMPLETED, INFANT, PRE1970, QUALIFIED, STILLBORN,
          #  SUBMITTED, UNCLEARED, BIC, DNS, DNSCAN, DEAD

    def first
      # Returns a GEDCOM::DatePart object that defines the first part of the date.

    def last
      # Returns a GEDCOM::DatePart object that defines the last part of the date.  This
      # will only be valid for a date format of BETWEEN or FROMTO (indicating a range
      # of dates).

    def to_s
      # Returns the date formatted as a string.

    def is_date?
      # Returns true if the Date object defines a date, and false if it defines
      # some non-date value (i.e., there was an error parsing the date, or the
      # date format is one of CHILD, CLEARED, COMPLETED, INFANT, PRE1970, QUALIFIED,
      # STILLBORN, SUBMITTED, UNCLEARED, BIC, DNS, DNSCAN, or DEAD).

    def is_range?
      # Returns true if the Date object defines a date range (i.e., format is either
      # BETWEEN or FROMTO).  If this is true, then Date#last will return the end of
      # the range.

    def <=>( date )
      # Compares this date with the parameter, and returns -1, 0, or 1.


  class DatePart

    def calendar
      # Returns the calendar that was used to represent the given date.  Valid values
      # are DateType::GREGORIAN, DateType::JULIAN, DateType::HEBREW, DateType::FRENCH,
      # DateType::FUTURE, DateType::UNKNOWN, and DateType::DEFAULT.

    def compliance
      # Returns the compliance of the date (i.e., whether it is a valid date or not).
      # Valid values are DatePart::NONE (meaning it is a valid date), DatePart::PHRASE
      # (meaning the date contains a phrase, not a date) and DatePart::NONSTANDARD
      # (meaning there was an error parsing the date).

    def phrase
      # If the compliance is DatePart::PHRASE, this will return the phrase value.
      # Otherwise, this will raise a DateFormatException.

    def has_day?
      # Returns true if the date contains a day value.

    def has_month?
      # Returns true if the date contains a month value.

    def has_year?
      # Returns true if the date contains a year value.

    def has_year_span?
      # Returns true if the date contains a span of years (valid only for
      # DateType::GREGORIAN calendars).  This means the date was formatted
      # like '25 Jul 1974-1980'.

    def day
      # Returns the day portion of the date, if it has a day.  If it does not
      # have a day, a DateFormatException is raised.

    def month
      # Returns the month portion of the date, if it has a month.  If it does not
      # have a month, a DateFormatException is raised.  (The month value will
      # be an integer, with 1 being the first month of the year.)

    def year
      # Returns the year portion of the date, if it has a year.  If it does not
      # have a year, a DateFormatException is raised.

    def to_year
      # Returns the second year portion of the date, if it has a year span.  If
      # it does not contain a year span, a DateFormatException is raised.

    def epoch
      # Returns either "BC" or "AD", as appropriate.

    def to_s
      # Converts the DatePart object to a string.

    def <=>( date_part )
      # Compares this date_part with the parameter, and returns -1, 0, or 1.
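
The auto-concatenation behavior described for auto_concat= can be pictured with the following standalone sketch. It is not the gem's code: merge_cont is a hypothetical helper, and joining each CONT line with a newline is an assumption based on how CONT is commonly understood in GEDCOM.

```ruby
# Illustration only: folds CONT lines into the data of the preceding line,
# so that no separate entry remains for the CONT tag itself.
# Assumption: each CONT contributes a newline followed by its text.
def merge_cont(lines)
  merged = []
  lines.each do |line|
    level, tag, data = line.split(' ', 3)
    if tag == 'CONT' && !merged.empty?
      merged.last[:data] = "#{merged.last[:data]}\n#{data}"
    else
      merged << { level: Integer(level), tag: tag, data: data }
    end
  end
  merged
end

lines = [
  '1 NOTE First line',
  '2 CONT second line',
  '1 SEX M'
]

merge_cont(lines).each { |entry| p entry }
```

With auto-concatenation disabled, the CONT line would instead stay a separate entry and get its own before and after callbacks.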

Contributing

If you encounter an issue with this gem, please report it in the issue tracker.

Code contributions in the form of pull requests are always welcome; however, all contributions must include relevant test cases.