Skip to content

Python library to parse and work with GEDCOM (geneology/family tree) files

License

Notifications You must be signed in to change notification settings

gregersn/gedcompy

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

gedcompy

Python library to parse and work with GEDCOM (genealogy/family tree) files.

It's goal is to support GEDCOM v5.5 (Specification Here).

This is released under the GNU General Public Licence version 3 (or at your option, a later version). See the file LICENCE for more.

Requirements

  • python2.7
  • pip (package installer python, included in python download)
  • datetime (pip install datetime)
  • six (pip install six)

Installation

from the terminal, run python setup.py build && python setup.py install

gedcompy Usage:

gedcompy parses out the records of each person and stores them as nested objects, accesible through dot notation.

Example Usage

    >>> import gedcom
    >>> gedcomfile = gedcom.parse("myfamilytree.ged")
    >>> for person in gedcomfile.individuals:
    ...    firstname, lastname = person.name
    ...    print "{0} {1} is in the file".format(firstname, lastname)

The file as a whole is a generator object.

>>> import gedcom
>>> gedfile = gedcom.parse("myfamilytree.ged")
>>> print gedfile
# GedcomFile(Element(0, 'HEAD', [Element(1, 'CHAR', 'UTF-8'), Element(1, 'SOUR', 'Ancestry.com Family Trees', [Element(2, 'VERS', '(2010.3)'), Element(2, 'NAME', 'Ancestry.com Family Trees'), Element(2, 'CORP', 'Ancestry.com')]), Element(1, 'GEDC', [Element(2, 'VERS', '5.5'), Element(2, 'FORM', 'LINEAGE-LINKED')])]),
# Individual(0, 'INDI', '@P1@', [Birth(1, 'BIRT', [Element(2, 'DATE', '03 dec 1970')]), Element(1, 'SEX', 'M'), Element(1, 'NAME', 'John /Smith/'), Element(1, 'FAMC', '@F1@')])
# Individual(0, 'INDI', '@P2@', [Element(1, 'NAME', 'Jane /Doe/'), Element(1, 'SEX', 'M'), Birth(1, 'BIRT', [Element(2, 'DATE', '06 nov 1946'), Element(2, 'PLAC', 'Brooklyn, New York City, New York, USA')]), Element(1, 'FAMS', '@F1@')])

Individuals

Cannot access individuals as a whole:

>>> print gedfile.individuals
# <generator object <genexpr> at 0x103ef25a0>
>>> print gedfile.individual # Probably Don't.
# Don't do this. It prints all individual records in the file.

To access individual records:

>>> for person in gedfile.individuals:
...     print person
# Individual(0, 'INDI', '@P1@', [Birth(1, 'BIRT', [Element(2, 'DATE', '03 dec 1970')]), Element(1, 'SEX', 'M'), Element(1, 'NAME', 'John /Smith/'), Element(1, 'FAMC', '@F1@')])
# Individual(0, 'INDI', '@P2@', [Element(1, 'NAME', 'Jane /Doe/'), Element(1, 'SEX', 'M'), Birth(1, 'BIRT', [Element(2, 'DATE', '06 nov 1946'), Element(2, 'PLAC', 'Brooklyn, New York City, New York, USA')]), Element(1, 'FAMS', '@F1@')])

To access individual records of a specific type use dot notation:

>>> for person in gedfile.individuals:
...     print person.birth
# Birth(1, 'BIRT', [Element(2, 'DATE', '03 dec 1970')])
# Birth(1, 'BIRT', [Element(2, 'DATE', '06 nov 1946'), Element(2, 'PLAC', 'Brooklyn, New York City, New York, USA')])

To specify individual record types:

>>> for person in gedfile.individuals:
...     print person.birth.date
# 03 dec 1970
# 06 nov 1946
>>> for person in gedfile.individuals:
...     print person.birth.place
# AttributeError: 'NoneType' object has no attribute 'value'
# this does not print: Brooklyn, New York City, New York, USA

The AttributeError is thrown when a record of that type does not exist, and by default will NOT pass onto the next record.

current available use cases
person.birth              # class - birth
person.birth.place        # string
person.birth.date         # string
person.death              # class - death
person.death.place        # string
person.death.date         # string
person.name               # tuple - firstname, lastname
person.father             # class - father
person.mother             # class - mother
person.parents            # list - contaning father and mother class
person.aka                # list - 'also known as' name
person.gender             # string - 'M' or 'F'
person.sex                # string - ''     ''
person.id                 # string - @P12@
person.is_female          # boolean
person.is_male            # boolean
person.note               # string
person.title              # string
person.default_tag        # string tagname : 'INDI', 'FAM', etc
person.tag                # string tagname : 'INDI', 'FAM', etc

Advanced usage

Get the name of a person and parents of that person:

>>> for person in gedfile.individuals:
...     try:
...         print person.name, person.parents[0].name, person.parents[1].name
...     except IndexError:
...         print "no parent name record for this person"
# OR
>>> for person in gedfile.individual:
...     try:
...         print person.name, person.father.name, person.mother.name
...     except AttributeError:
...        print "no parent name record for this person"
# either one will print:
# ('John', 'Doe') ('Jack', 'Doe') ('Jane', 'Doe')
# ('Jenny', 'Doe') ('Jack', 'Doe') ('Jane', 'Doe')

Families

Family records are accessed the same way as individuals

>>> print gedfile.families
# <generator object <genexpr> at 0x10523c7d0>
>>> print gedfile.family # Probably don't.
# Don't do this. Prints all family records in the family
>>> for family in gedfile.families:
...     print family
# Family(0, 'FAM', '@F1@', [Husband(1, 'HUSB', '@P5@'), Wife(1, 'WIFE', '@P1@'), Element(1, 'CHIL', '@P2@', [Element(2, '_FREL', 'Natural'), Element(2, '_MREL', 'Natural')])])

>>> for family in gedfile.families:
...     print family.partners
# [Husband(1, 'HUSB', '@P5@'), Wife(1, 'WIFE', '@P1@')]

Use cases for partners:

>>> for family in gedfile.families:
...     print family.partners[0]
...     print family.partners[1]
# Husband(1, 'HUSB', '@P5@')
# Wife(1, 'WIFE', '@P1@')

>>> for family in gedfile.families:
...     print family.partners[0].tag
# HUSB

>>> for family in gedfile.families:
...     print family.partners[0].value
# @P5@

>>> for family in gedfile.families:
...     print family.husband
...     print family.wife
# Husband(1, 'HUSB', '@P5@')
# Wife(1, 'WIFE', '@P1@')
current available use cases
family.id                       # string '@F49@'
family.tag                      # string 'FAM'
family.partners                 # list 
family.wife                     # class - wife
family.husband                  # class - husband
family.children                 # list
family.children.father_relation # String 'Natural'
family.children.mother_relation # string 'Natural'

Residence

Residence records

>>> for person in gedfile.individuals:
...			print person.residence
# Residence(1, 'RESI', 'Marital Status: SingleRelation to Head of House: Son', [Element(2, 'DATE', '1910'), Element(2, 'PLAC', 'Lowell Ward 6, Middlesex, Massachusetts, USA'), Source(2, 'SOUR', '@S1002094821@', [Element(3, 'PAGE', 'Year: 1910; Census Place: Lowell Ward 6, Middlesex, Massachusetts; Roll: T624_600; Page: 33A; Enumeration District: 0864; FHL microfilm: 1374613'), Element(3, '_APID', '1,7884::108099427')])])
>>> for person in gedfile.individuals:
...			print person.residence.date
...			print person.residence.place
# 1910
# Wilmington Ward 3, New Hanover, North Carolina, USA
current available use cases
residence.date
residence.id
residence.note
residence.parent_id
residence.place
residence.source
residence.value

###Sources Source records This gets into more deeply nested elements. As noted previously, sources can also be nested within an individuals element as well as recorded for the individual themself.

>>> for person in gedfile.individuals:
... 		print person.source
#  Source(1, 'SOUR', '@S-357352754@', [Page(2, 'PAGE', 'Ancestry Family Tree'), Data(2, 'DATA', [Reference(3, 'TEXT', 'http://trees.ancestry.com/pt/AMTCitationRedir.aspx?tid=12345678&pid=21')])])
>>>for person in gedfile.individuals:
... 		print person.source.page
... 		print person.source.data
... 		print person.source.data.text
# Ancestry Family Tree
# Data(2, 'DATA', [Reference(3, 'TEXT', 'http://trees.ancestry.com/pt/AMTCitationRedir.aspx?tid=12345678&pid=21')])
# http://trees.ancestry.com/pt/AMTCitationRedir.aspx?tid=12345678&pid=21
current available use cases
source.value
source.page
source.data
source.data.text

Error Handling

By default, if a record doesn't exist an error will be raised and will not continue onto the rest of the records. This is on purpose, but can by bypassed by using try/except cases. The most common errors that are raised are IndexError and AttributeError

>>> for person in gedfile.individuals:
...     try:
...         print person.birth.place
...     except AttributeError:
...        print "There is no birth place record for this person"
# There is no birth place record for this person
# Brooklyn, New York City, New York, USA
>>> for family in gedfile.families:
...     try:
...         print family.marriage
...     except IndexError as e:
...         print "no record: ", e
# Marriage(1, 'MARR', [Element(2, 'DATE', '08 Aug 1854')])
# no record: IndexError: list index out of range
# Marriage(1, 'MARR', [Element(2, 'DATE', '1954')])

Dates

Dates are user input and can vary wildly in formatting. There are also approximate dates that cannot be formatted. These approximate dates can be stripped out using re or just str.replace()

Using pythons datetime library (specifically strftime & strptime. the dates available can be formatted by looping through various date formats using try/except.

>>> dateFormats = ['%m/%d/%Y', '%m-%d-%Y', '%d-%m-%Y', '%d %b %Y'] #just a few examples
>>> for person in filename.individuals:
...     for i in dateFormat:
...         try:
...             print datetime.strptime(person.birth.date, i)
...         except ValueError: # ValueError will be thrown when the date given does not match the formatting provided from the dateFormat list
...             pass

To discover more dates add a counter and increment as it passes through the dateFormat list. If the counter is higher than the length of the list -1 raise an exception printing the date that broke the program.

Contributing

Run all unitttests with tox.

About

Python library to parse and work with GEDCOM (geneology/family tree) files

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Python 100.0%