Declarative xml to object mapper
lxml 4.3
git clone https://github.com/S-Hanin/PyXmlMapper.git
cd PyXmlMapper && python setup.py install
or just add it to your requirements.txt
git+https://github.com/S-Hanin/PyXmlMapper.git#egg=pyxmlmapper
xml example
<aw:PurchaseOrder
aw:PurchaseOrderNumber="99503"
aw:OrderDate="1999-10-20"
xmlns:aw="http://www.adventure-works.com">
<aw:Address aw:Type="Shipping">
<aw:Name>Ellen Adams</aw:Name>
<aw:Street>123 Maple Street</aw:Street>
<aw:City>Mill Valley</aw:City>
<aw:State>CA</aw:State>
<aw:Zip>10999</aw:Zip>
<aw:Country>USA</aw:Country>
</aw:Address>
<aw:Address aw:Type="Billing">
<aw:Name>Tai Yee</aw:Name>
<aw:Street>8 Oak Avenue</aw:Street>
<aw:City>Old Town</aw:City>
<aw:State>PA</aw:State>
<aw:Zip>95819</aw:Zip>
<aw:Country>USA</aw:Country>
</aw:Address>
<aw:DeliveryNotes>Please leave packages in shed by driveway.</aw:DeliveryNotes>
<aw:Items>
<aw:Item aw:PartNumber="872-AA">
<aw:ProductName>Lawnmower</aw:ProductName>
<aw:Quantity>1</aw:Quantity>
<aw:USPrice>148.95</aw:USPrice>
<aw:Comment>Confirm this is electric</aw:Comment>
</aw:Item>
<aw:Item aw:PartNumber="926-AA">
<aw:ProductName>Baby Monitor</aw:ProductName>
<aw:Quantity>2</aw:Quantity>
<aw:USPrice>39.98</aw:USPrice>
<aw:ShipDate>1999-05-21</aw:ShipDate>
</aw:Item>
</aw:Items>
</aw:PurchaseOrder>
define classes for mapping
from pyxmlmapper import base
from pyxmlmapper import fields
from typing import List
class Item(base.BaseXmlParser):
__namespaces__ = {'aw': 'http://www.adventure-works.com'}
product_name = fields.ValueField("aw:ProductName")
part_number = fields.ValueField("@aw:PartNumber")
quantity = fields.ValueField("aw:Quantity", pytype=int)
us_price = fields.ValueField("aw:USPrice", pytype=float)
class Address(base.BaseXmlParser):
name = fields.ValueField("aw:Name")
city = fields.ValueField("aw:City")
class PurchaseOrder(base.BaseXmlParser):
order_date = fields.DateTimeField('@aw:OrderDate')
address_shipping: Address = fields.ObjectField("aw:Address[@aw:Type='Shipping']", Address)
address_billing: Address = fields.ObjectField("aw:Address[@aw:Type='Billing']", Address)
delivery_notes = fields.ValueField("aw:DeliveryNotes")
items: List[Item] = fields.ListObjectField(".//aw:Item", Item)
purchase_order = PurchaseOrder(xml) # xml - string or xml tree
print(purchase_order.address_billing.name)
print(purchase_order.delivery_notes)
for item in purchase_order.items:
print(item.product_name)
Note that namespaces declaration is not necessary
It's worth noting that mapper works lazily and a query is executed at the moment you are querying the field's value
If you have an XML file example it's possible to generate models using scripts xml2class.py or xml2class_bulk.py.
xml2class.py filename.xml > filename.py
or
xml2class_bulk.py [-z] in_dir out_dir
Option -z allows to process zip archives
Import ways:
from pyxmlmapper import base
from pyxmlmapper.components.fields import *
from pyxmlmapper import fields
Field definition:
from pyxmlmapper import fields
xml_field = fields.XmlField(query='', pytype=str, default='', strict=False)
XmlField
is a base for other field types.
query
- XPath querypytype
- type to convert extracted value to, basically there might be any callable which accepts string and returns some valuedefault
- default value if nothing foundstrict
- boolean. Indicates that xml field is mandatory. If True and nothing found thenNotFoundException
will be raised
ValueField
- represents xml node without children. ( Returns the first found if there are more than one field )
ListValueField
- represents xml nodes which have the same name and have no children
ObjectField
- represents xml node with children. ( Returns the first found if there are more than one field )
ListObjectField
- represents xml nodes which have the same name and have children
DateTimeField
- special field for datetime values, there is python-dateutil under the hood.
from pyxmlmapper import fields
from datetime import datetime
date = fields.DateTimeField(query='', default=datetime(2021, 1, 1), dayfirst=False, yearfirst=False, fuzzy=True)
query
- XPath querydefault
- default value to return if xml value can't be parseddayfirst
- in the case when date is ambiguous like '01-01-2021'yearfirst
- in the case when date is ambiguous like '21-01-01'fuzzy
- allow dateutil fuzzy parsing
To avoid prefix declaration in xpath it's possible to use additional xpath functions:
from pyxmlmapper import fields
product_name = fields.ValueField(".//*[tag()='ProductName']")
product_name = fields.ValueField(".//*[match(tag(), 'ProductName')]")
product_name = fields.ValueField(".//*[match(tag(), 'ProductName', 'product_name')]") # if tag name is not permanent
or you can add your own xpath functions
from pyxmlmapper import xpath_functions
@xpath_functions.ns
def lower(context, a):
"""lower-case() function for XPath 1.0"""
return a.lower()
and use it
from pyxmlmapper import fields
product_name = fields.ValueField(".//*[lower(tag())='productname']")
more about xpath functions: https://lxml.de/extensions.html#xpath-extension-functions