atbrox/xmlutils.py

xmlutils.py

A set of Python scripts for processing XML files serially, namely converting them to other formats (SQL, CSV). The scripts use ElementTree.iterparse() to iterate through the nodes of an XML file, so the whole DOM never has to be loaded into memory. This lets them churn through large XML files (albeit slowly :P) without memory hiccups. (Note: The scripts do NOT validate the XML files.)
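
A minimal sketch of the streaming pattern described above, assuming record elements sit directly under the root and each record's fields are its direct child elements. The function name `iter_records` is hypothetical and not part of xmlutils.py. Clearing the root after each record is a common ElementTree idiom for keeping memory flat; whether the scripts do exactly this is an assumption.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source, tag):
    """Yield one dict per <tag> element without building the full tree.

    Assumption: record elements are direct children of the root, and each
    record's fields are its direct child elements.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield {child.tag: (child.text or "").strip() for child in elem}
            # Drop finished records from the root so memory use stays flat.
            root.clear()


sample = io.BytesIO(
    b"<fruits>"
    b"<item><name>apple</name><color>red</color></item>"
    b"<item><name>banana</name><color>yellow</color></item>"
    b"</fruits>"
)
for record in iter_records(sample, "item"):
    print(record)
# {'name': 'apple', 'color': 'red'}
# {'name': 'banana', 'color': 'yellow'}
```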

Kailash Nadh, October 2011

License: MIT License

Documentation: http://kailashnadh.name/code/xmlutils.py

xml2csv.py

Convert an XML document to a CSV file.

python xml2csv.py --input "samples/fruits.xml" --output "samples/fruits.csv" --tag "item"

options (* required)

--input Input XML document's filename*
--output Output CSV file's filename*
--tag The tag of the node that represents a single record (e.g. item, record)*
--delimiter Delimiter for separating items in a row. Default is ", " (a comma followed by a space).
--ignore A space-separated list of element tags in the XML document to ignore.
--header Whether to print the CSV header (list of fields) on the first line; 1=yes, 0=no. Default is 1.
--encoding Character encoding of the document. Default is utf-8.
--limit Maximum number of records to process from the document. Default is -1 (no limit).
--buffer Number of records to keep in memory before writing them to the output CSV file; this reduces the number of disk writes. Default is 1000.
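
The xml2csv.py options can be pictured with the following sketch. It is a hypothetical reimplementation, not the code of xml2csv.py: `xml_to_csv` and its parameter names are illustrative, the column order is taken from the first record, and values are written unquoted (so a value containing the delimiter would corrupt its row). A plain `str.join` is used instead of the `csv` module because the default delimiter ", " is two characters, and `csv.writer` only accepts a one-character delimiter.

```python
import io
import xml.etree.ElementTree as ET


def xml_to_csv(source, out, tag, delimiter=", ", ignore=(), header=True,
               limit=-1, buffer_size=1000):
    """Hypothetical sketch of what the xml2csv.py options control.

    Assumptions: records are direct children of the root, fields are each
    record's direct children, the column order comes from the first record,
    and values are written without quoting or escaping.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)
    fields = None
    rows = []   # buffered output lines (--buffer)
    count = 0
    for event, elem in context:
        if event != "end" or elem.tag != tag:
            continue
        record = {c.tag: (c.text or "").strip()
                  for c in elem if c.tag not in ignore}   # --ignore
        root.clear()
        if fields is None:
            fields = list(record)
            if header:   # --header
                out.write(delimiter.join(fields) + "\n")
        rows.append(delimiter.join(record.get(f, "") for f in fields))
        count += 1
        if len(rows) >= buffer_size:   # flush a full buffer in one write
            out.write("\n".join(rows) + "\n")
            rows = []
        if count == limit:   # the default of -1 never matches: no limit
            break
    if rows:
        out.write("\n".join(rows) + "\n")
    return count


out = io.StringIO()
xml_to_csv(io.BytesIO(b"<fruits><item><name>apple</name><price>1</price></item>"
                      b"<item><name>pear</name><price>2</price></item></fruits>"),
           out, "item", ignore={"price"})
print(out.getvalue(), end="")
# name
# apple
# pear
```

Buffering the rows and writing them with a single `write` call is what the --buffer option trades memory for: fewer, larger writes instead of one write per record.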

xml2sql.py

Convert an XML document to an SQL file.

python xml2sql.py --input "samples/fruits.xml" --output "samples/fruits.sql" --tag "item" --table "myfruits"

options (* required)

--input Input XML document's filename*
--output Output SQL file's filename*
--tag The tag of the node that represents a single record (e.g. item, record)*
--ignore A space-separated list of element tags in the XML document to ignore.
--encoding Character encoding of the document. Default is utf-8.
--limit Maximum number of records to process from the document. Default is -1 (no limit).
--packet Maximum size of a single INSERT query, in MB. Default is 8. Set it according to MySQL's max_allowed_packet setting.
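
The --packet option implies batching rows into multi-row INSERT statements that each stay under a byte limit. A hypothetical sketch of that idea follows; `xml_to_sql`, `sql_quote`, the backtick-quoted identifiers, and the escaping rules are assumptions, not xml2sql.py's actual output format. The escaping is deliberately minimal; when loading data into a live database, parameterized queries through a driver are safer.

```python
import io
import xml.etree.ElementTree as ET


def sql_quote(value):
    # Minimal MySQL-style string escaping, for illustration only.
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def xml_to_sql(source, out, tag, table, packet_mb=8, ignore=()):
    """Hypothetical sketch: write multi-row INSERTs no larger than packet_mb.

    Assumptions: records are direct children of the root and the column
    list comes from the first record. A single row larger than the limit
    still gets its own statement.
    """
    max_bytes = int(packet_mb * 1024 * 1024)
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)
    fields, prefix, values, size = None, "", [], 0
    for event, elem in context:
        if event != "end" or elem.tag != tag:
            continue
        record = {c.tag: (c.text or "").strip()
                  for c in elem if c.tag not in ignore}
        root.clear()
        if fields is None:
            fields = list(record)
            cols = ", ".join("`%s`" % f for f in fields)
            prefix = "INSERT INTO `%s` (%s) VALUES\n" % (table, cols)
        row = "(" + ", ".join(sql_quote(record.get(f, "")) for f in fields) + ")"
        row_bytes = len(row.encode("utf-8")) + 2   # row plus its ",\n" separator
        if values and size + row_bytes > max_bytes:
            # Adding this row would exceed the packet size: emit the statement.
            out.write(prefix + ",\n".join(values) + ";\n")
            values = []
        if not values:
            # New statement: count the INSERT prefix and the closing ";".
            size = len(prefix.encode("utf-8")) + 1
        values.append(row)
        size += row_bytes
    if values:
        out.write(prefix + ",\n".join(values) + ";\n")


out = io.StringIO()
xml_to_sql(io.BytesIO(b"<fruits><item><name>apple</name></item>"
                      b"<item><name>O'Brien pear</name></item></fruits>"),
           out, "item", "myfruits")
print(out.getvalue(), end="")
```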
