Jdic aims to be a professional, ready-to-use and easy-to-learn package delivering the all-around tools you need to manipulate large JSON-like Python objects.
In most projects, advanced manipulation of JSON documents requires reinventing many small features and relying on a set of independent external packages.
Jdic is the type-class you need to ease the manipulation of JSON-like documents, with the stated goal of writing less code while getting the job done quickly, correctly and efficiently.
The Jdic class provides a lot of interesting features and embeds mission-critical third-party libraries, all united within a comprehensive, straightforward API.
Here are the useful operations Jdic can do for you:
- Transparent JSON paths integration - for read and write operations, through an agnostic driver model. Currently MongoDB and Jsonpath-NG paths formats are natively supported.
- Fast browsing of JSON documents - browse the entire data structure while getting useful values on each iteration: value, JSON path, parent, parent JSON path, depth, etc.
- Find, Find-Keys and Find-Match features for quickly finding any key or value, or subdocument matching MongoDB-like queries.
- Merge features for merging objects recursively, with 4 modes for handling conflicting arrays (replace, merge, new, append).
- Diff & Patch features - so you can represent differences between two documents in small data sets (diff), and apply those differences to update documents (patch).
- JSON Schema validation - if you need it, with auto-validation on each document change.
- Consistent document checksumming - natively SHA-256, it gives you a single checksum for the document, and that checksum will always be the same on all systems.
- Depth features - you can crawl your document at certain depths only.
- Strict features - input data will be serialized to a strict JSON format.
- Agnostic dicts/lists replacement for Python's enumerate().
- Custom input serializer support - convert specific objects to the JSON data you want.
- Cache features with change detection to accelerate some of the API calls.
from jdic import jdic
o = {"a" : {"b" : {"c" : 1}}}
j = jdic(o) # Accepts dicts and lists (or any Mapping, Sequence)
paths = [m.path for m in j.find(1)] # Find path for a value as-is
>>> ["a.b.c"]
for match in j.find(1): # Classic loop format
    print(match.path)
>>> a.b.c
paths = [m.parent_path for m in j.leaves()] # Results include parents paths and more
>>> ["a.b"]
allitems = [m.value for m in j.browse()] # The first item is always the root itself
>>> [
{"a": {"b": {"c": 1}}},
{"b": {"c": 1}},
{"c": 1},
1
]
depths = [m.depth for m in j.find({"c" : 1})] # find() target values can be objects
>>> [2]
paths = [m.path for m in j.find_match({"c": {"$gt": 0}}) ] # Support of Mongo-like queries
>>> ["a.b"]
paths = [m.path for m in j.find_match({ # Complex Mongo-like queries are permitted
"$and" : [
{"b.c": {"$exists" : True}},
{"b.d": {"$exists" : False}}
]
})]
>>> ["a"]
j.checksum()
>>> ebd240a9ae435649514086d13c20d9963ec2844a1f866b313919c55a7c3f7ccb # Is consistent on all systems
j["a"].checksum() # Sub-iterables have Jdic methods / all sub-iterables implement their own checksum()
>>> 05a2013fbe17af7d58779ed96e0d74bd6fa3ce2726c1ebbd9f7dc33671b1c28e
j["a"] = None
j.checksum()
>>> 69d7d33051c5e05aa72f55a9a8e30a73da8d4afaa37127b9ea7ee29403aa9d3f # Change detection from child to parent
j = jdic(o)
p = {"a" : {"e" : {"f" : -1 }}}
diff = j.diff(p)
>>> [[["a"], {"e": {"f": -1}}]] # A diff stanza - on larger documents the diffs are smaller than documents
j = j.patch(diff) # Patch does not modify the original object but returns a patched version
j == p # Jdic objects can be transparently compared with dict or list objects (or equivalents)
>>> True
j = jdic(o) # Start again from the original document
q = {"a" : {"b" : {"d" : 2}}}
j.merge(q)
>>> {"a": {"b": {"c": 1, "d": 2}}} # Handles recursive merge
j = jdic(o, schema = {'type' : 'object' , 'properties' : {'a' : {'type' : 'object'}}}) # Correct Schema
j['a'] = 3 # instant detection of schema violation (exception)
>>> Traceback (most recent call last): ...
from jdic import Jdic # `Jdic` is the parent class, do not confuse with the function `jdic()`
>>> type(j)
<class 'jdic.JdicMapping'>
>>> isinstance(j, Jdic)
True
# Agnostic enumerations with a revised enumerate() function
from jdic import enumerate
y, z = [1,2,3], {'a':1, 'b':2}
for k, v in enumerate(y): # Acts just as the original enumerate() for lists
    y[k] = v
for k, v in enumerate(z): # But allows dict enumeration, yielding a key instead of a counter
    z[k] = v
j = jdic({'a' : [{'b': 1}, {'b': 2}, {'b': 3}]}, driver = 'jsonpath_ng')
j['a[*].b'] = 0 # Reassign the value to all locations at once!
>>> {"a": [{"b": 0}, {"b": 0}, {"b": 0}]}
del j['a[*].b'] # Also works with del
>>> {"a": [{}, {}, {}]}
The MatchResult object is returned for most search operations. It contains:
- value: the found object
- path: the full JSON path of the found object
- parent: the Jdic object containing the current found object
- parent_path: the full JSON path of the parent of the found object
- key: the current key (or index) at which the object was found within its parent
- depth: the depth of the object counting from the root of the Jdic
Jdic objects are instantiated through the jdic() function, which decides which type of Jdic object (JdicMapping or JdicSequence) to instantiate and return. Both types inherit from the Jdic class (do not use this one directly, mind the lowercase).
- obj: any list or dictionary. Sequence and Mapping equivalents will be cast to list and dict.
- schema: optional, must be a JSON Schema in the form of a dict. If provided, all changes affecting the Jdic will be validated against the schema whenever they happen.
- serializer: optional, your custom serialization function. Useless when obj is the result of a json.loads(). It will be called to transform non-standard object types into standard JSON types. If not provided, exotic types are transformed to str. It is possible to use settings.serialize_custom_function instead, to globally specify a serializing function for all the Jdic instances. A serializer specified as argument will always have priority over settings. The custom serializer function, if used, must return a JSON compliant data type: None, bool, str, int, float, list, dict.
- driver: optional, a string representing the driver to use (mongo and jsonpath_ng are natively implemented). It is possible to use settings.json_path_driver instead, to globally specify a driver. Drivers specified as argument will have priority over settings.
Note about floating point values: objects serialized as Jdic objects will have their float values transformed to integers whenever the float value is equal to its integer form. This makes JSON dumps and checksums consistent and prevents '5' from being shown as '5.0'. This can be changed by setting settings.serialize_float_to_int to False.
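The normalization described in this note can be pictured with a small standalone function. This is only a sketch of the idea using plain dicts and lists, not Jdic's own code; normalize_floats is a hypothetical name.

```python
import json


def normalize_floats(value):
    """Return a copy of `value` where floats equal to an integer become ints."""
    if isinstance(value, float) and value.is_integer():
        return int(value)
    if isinstance(value, dict):
        return {k: normalize_floats(v) for k, v in value.items()}
    if isinstance(value, list):
        return [normalize_floats(v) for v in value]
    return value  # other types (including 1.5, inf, nan) are left untouched


doc = {"a": 5.0, "b": [1.5, 2.0]}
print(json.dumps(doc))                    # the float 5.0 is dumped as 5.0
print(json.dumps(normalize_floats(doc)))  # after normalization it is dumped as 5
```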
Recurses on all Jdic elements, yielding a MatchResult object on each iteration.
- sort: if True, all the results will be yielded with JSON paths in alphabetical order.
- depth: an integer - only the results from objects at depth will be yielded.
- maxdepth: an integer - will not recurse on documents whose depth is above maxdepth.
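To make the recursion and the depth and maxdepth parameters concrete, here is a simplified standalone sketch over plain dicts and lists. It is not Jdic's implementation: the Match tuple, the browse function and the dotted path format (modelled on the mongo-style paths shown in the examples) are assumptions, and sorting is left out.

```python
from collections import namedtuple

# Hypothetical stand-in for the MatchResult object
Match = namedtuple("Match", "value path parent parent_path key depth")


def browse(doc, depth=None, maxdepth=None):
    """Yield a Match for the root and for every nested value, depth-first."""
    def walk(value, path, parent, parent_path, key, level):
        if depth is None or level == depth:
            yield Match(value, path, parent, parent_path, key, level)
        if maxdepth is not None and level >= maxdepth:
            return  # do not recurse beyond maxdepth
        if depth is not None and level >= depth:
            return  # nothing deeper can be at the requested depth
        if isinstance(value, dict):
            children = value.items()
        elif isinstance(value, list):
            children = enumerate(value)
        else:
            return  # leaves have no children
        for k, child in children:
            child_path = f"{path}.{k}" if path else str(k)
            yield from walk(child, child_path, value, path, k, level + 1)

    yield from walk(doc, "", None, None, None, 0)


doc = {"a": {"b": {"c": 1}}}
for m in browse(doc):
    print(m.depth, repr(m.path), m.value)
```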
Returns an ASCII checksum representing the content and data types of the object. Checksums are consistent from one execution to another and can be safely used for content change detection or object comparisons. The checksum is cached and is only recalculated if changes occurred.
- algo: any algorithm supported by the hashlib Python library
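One common way to obtain a checksum that does not depend on key order or on the platform is to hash a canonical JSON dump. The sketch below shows that general technique with the standard library only; it is an assumption about the approach, not Jdic's algorithm, so its digests will not match the values printed by Jdic.

```python
import hashlib
import json


def checksum(doc, algo="sha256"):
    """Hash a canonical (sorted keys, compact, ASCII) JSON dump of `doc`."""
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=True)
    return hashlib.new(algo, canonical.encode("utf-8")).hexdigest()


# Key order does not matter, but data types do
print(checksum({"a": 1, "b": 2}) == checksum({"b": 2, "a": 1}))
print(checksum({"a": 1}) == checksum({"a": "1"}))
```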
Returns an integer representing the deepness of the JSON structure from where deepness() is called. A document with no dict or list within it has a deepness of zero. The deepness is cached and is only recalculated if changes occurred.
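The definition above can be expressed as a short recursive function. This is a sketch over plain dicts and lists (without the caching Jdic mentions), and the function name simply mirrors the method for readability.

```python
def deepness(doc):
    """Number of nested container levels found below `doc`."""
    if not isinstance(doc, (dict, list)):
        return 0
    children = doc.values() if isinstance(doc, dict) else doc
    nested = [c for c in children if isinstance(c, (dict, list))]
    # No nested dict or list means a deepness of zero
    return (1 + max(deepness(c) for c in nested)) if nested else 0


print(deepness({"a": 1}))
print(deepness({"a": {"b": {"c": 1}}}))
```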
Returns an integer representing the depth of the current document from the root of the Jdic object. The depth of the root document is 0.
Returns an object (a diff stanza) representing the differences between the Jdic and obj. diff() is implemented by the json_delta Python library.
- obj: any data
Agnostic and non-recursive enumeration of each entry in the current object. It yields a (k, v) tuple, where k is either an integer index when the object is a list, or a string key when the object is a dict. v is always the value. enumerate() is also available as a standalone function within the Jdic package: from jdic import enumerate.
- sort: if True, sorts the dictionary keys alphabetically. Only dictionary keys are sorted, not lists.
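The behavior described above can be approximated in a few lines. The sketch below uses a hypothetical name, enumerate_any, to avoid shadowing the built-in; it is an illustration of the idea rather than the function shipped with Jdic.

```python
def enumerate_any(obj, sort=False):
    """Yield (key, value) for dicts and (index, value) for other sequences."""
    if isinstance(obj, dict):
        keys = sorted(obj) if sort else obj
        for k in keys:
            yield k, obj[k]
    else:
        # Lists are never sorted, only dictionary keys are
        yield from enumerate(obj)


print(list(enumerate_any([10, 20])))
print(list(enumerate_any({"b": 2, "a": 1}, sort=True)))
```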
Searches a value within the entire Jdic. Searches are strict (==).
- value: the value to search for - can be a simple type (int, str, etc.) or complex object (list, dict, Jdic, etc.)
- sort: if True the search results will be sorted with JSON paths in alphabetical order.
- limit: an integer - terminates the search when the number of results reaches limit.
- depth: an integer - only the results from objects at depth will be yielded.
- maxdepth: an integer - will not recurse on documents whose depth is above maxdepth.
Searches any sub-object containing keys. keys can be a single key or a list of keys. This function aims to facilitate finding sub-objects whose keys are known.
- keys: a string or list of strings. The search will be case sensitive. Keys are for dicts and cannot be integer indexes of arrays. Keys cannot be JSON paths.
- mode: "any" or "all" - if "any" then any object matching any of the provided keys will be yielded. If "all" then any object containing all the keys will be matched.
- sort: if True the search results will be sorted with JSON paths in alphabetical order.
- limit: an integer - terminates the search when the number of results reaches limit.
- depth: an integer - only the results from objects at depth will be yielded.
- maxdepth: an integer - will not recurse on documents whose depth is above maxdepth.
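The difference between the "any" and "all" modes is easier to see on a small example. The following standalone sketch works on plain dicts and lists and yields (path, sub-object) tuples instead of MatchResult objects; the function body and the dotted path format are assumptions for illustration, and the sort, limit, depth and maxdepth options are omitted.

```python
def find_keys(doc, keys, mode="any", path=""):
    """Yield (path, sub-object) for every dict containing the wanted keys."""
    wanted = [keys] if isinstance(keys, str) else list(keys)
    if isinstance(doc, dict):
        test = any if mode == "any" else all
        if test(k in doc for k in wanted):
            yield path, doc
        children = doc.items()
    elif isinstance(doc, list):
        children = enumerate(doc)  # lists are traversed but never matched
    else:
        return
    for k, child in children:
        child_path = f"{path}.{k}" if path else str(k)
        yield from find_keys(child, wanted, mode, child_path)


doc = {"a": {"b": 1, "c": 2}, "d": [{"b": 3}]}
print([p for p, _ in find_keys(doc, ["b", "c"], mode="any")])
print([p for p, _ in find_keys(doc, ["b", "c"], mode="all")])
```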
Finds all objects that match query. Queries for find_match() are MongoDB-like queries, for both mongo and jsonpath_ng drivers. The underlying implementation is provided by the mongoquery Python library.
- query: a MongoDB-like query. Please refer to the MongoDB documentation or the examples for information on how queries are structured. Also review https://github.com/kapouille/mongoquery for more details on mongoquery and its known limitations.
- sort: if True the search results will be sorted with JSON paths in alphabetical order.
- limit: an integer - terminates the search when the number of results reaches limit.
- depth: an integer - only the results from objects at depth will be yielded.
- maxdepth: an integer - will not recurse on documents whose depth is above maxdepth.
A helper to dump Jdic objects as serialized JSON strings.
- sort_keys: all keys will be sorted alphabetically within their own dicts.
- indent: number of spaces to add on new blocks.
- ensure_ascii: for a pure ASCII output (usually not recommended for anything else than printing binary data).
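These three options carry the same names as arguments of the standard json.dumps function. Assuming they behave the same way (the text above does not state this explicitly), their effect can be previewed with the standard library alone:

```python
import json

doc = {"b": 1, "a": {"name": "café"}}

compact = json.dumps(doc)                           # insertion order, ASCII escapes
pretty = json.dumps(doc, sort_keys=True, indent=4)  # sorted keys, 4-space blocks
unicode_out = json.dumps(doc, ensure_ascii=False)   # keeps non-ASCII characters

print(compact)
print(pretty)
print(unicode_out)
```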
Yields a MatchResult on each leaf encountered in the document. A leaf is a terminal value within the JSON document. Basically all values are leaves, except dicts and lists.
- sort: if True the search results will be sorted with JSON paths in alphabetical order.
- depth: an integer - only the results from objects at depth will be yielded.
- maxdepth: an integer - will not recurse on documents whose depth is above maxdepth.
Returns the number of leaves contained in the Jdic object. This information is cached and is only recalculated if changes occurred.
Returns True if the current Jdic object matches the Mongo-like query, else False. Unlike find_match() it will not recurse into subdocuments. The current match() implementation is supported by the mongoquery Python library.
- query: a Mongo-like query object
Merges the current Jdic with one or multiple other objects (dicts or lists). It is not possible to merge a Jdic of type Mapping (dict) with a Sequence (list) or vice-versa. This limitation does not apply to sub-documents, where one type replaces the other. Note that, unlike patch(), this method changes the state of the current object. If multiple arguments are provided, each next obj in objs is merged into the result of the previous merge operation, which allows merges to be chained.
- objs: one or multiple objects of a similar type as the Jdic object itself.
- arr_mode: determines how conflicting arrays (arrays located at the same JSON path) are merged. 4 modes are supported:
  - "replace": arrays in Jdic are simply replaced.
  - "append": arrays from objs are appended to the array in Jdic.
  - "new": elements of arrays from objs are appended, but only if they do not exist in the Jdic array.
  - "merge": a recursive merge is processed on the elements of the same index. If there are more elements in objs arrays then those are appended to the Jdic arrays.
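The four array modes can be compared side by side with a small recursive merge over plain dicts and lists. This is a sketch written from the descriptions above, not Jdic's code; edge cases such as duplicates inside the incoming array are handled in the simplest possible way.

```python
import copy


def merge(base, other, arr_mode="replace"):
    """Recursively merge `other` into `base` (in place) and return the result."""
    if isinstance(base, dict) and isinstance(other, dict):
        for key, value in other.items():
            if key in base:
                base[key] = merge(base[key], value, arr_mode)
            else:
                base[key] = copy.deepcopy(value)
        return base
    if isinstance(base, list) and isinstance(other, list):
        if arr_mode == "append":
            base.extend(copy.deepcopy(other))
        elif arr_mode == "new":
            base.extend([copy.deepcopy(v) for v in other if v not in base])
        elif arr_mode == "merge":
            for i, value in enumerate(other):
                if i < len(base):
                    base[i] = merge(base[i], value, arr_mode)
                else:
                    base.append(copy.deepcopy(value))
        else:  # "replace"
            base[:] = copy.deepcopy(other)
        return base
    return copy.deepcopy(other)  # scalars or mismatched types: `other` wins


for mode in ("replace", "append", "new", "merge"):
    print(mode, merge({"x": [1, 2]}, {"x": [2, 3]}, arr_mode=mode))
```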
Returns an independent copy of the current Jdic, which inherits its driver, schema and serializer. If the Jdic is a subdocument of another Jdic then it loses its parenthood information (detachment).
Returns the Jdic parent of the current object. The root document has no parent (None).
- generation: changes the generation of the parent returned. Eg. 2 will return the grand-parent. 0 always returns None. None is also returned when generation targets above the root Jdic document.
Applies a diff stanza as returned by diff() and returns a patched version of the Jdic object, without parenthood information. The original object is not modified. The underlying implementation is provided by the json_delta Python library.
- diff: an object returned by diff().
Returns the full JSON path of the current Jdic object. Note that the JSON path format depends on the underlying driver in use. Eg: the root path is an empty string ("") with the mongo driver and "$" with the jsonpath_ng driver.
Returns a standalone non-Jdic object representing the JSON document. The result is a list or dict, depending on the type of the Jdic document (Sequence or Mapping). This function is useful for passing a Jdic in the form of pure Python basic types for compatibility purposes. The results are cached and rebuilt only if changes occurred.
Validates the current Jdic against any JSON schema provided. If no argument is passed, the Jdic is validated against its own schema, if it has any. Note that calling validate() without an argument is unnecessary if the Jdic was instantiated with a schema: in that case the Jdic object is already validated after each change. The schema validation features are supported by the jsonschema Python library.
- schema: a JSON schema.
By default Jdic will try to transform input floats into integers if the integer value is equal to the float value (eg: float 5.0 is changed to int 5).
The goal is to avoid unexpected behavior in serialization operations, reducing the risk of detecting differences between two objects that are both semantically and mathematically identical.
If you want to globally prevent the float-to-int normalization, set serialize_float_to_int to False:
from jdic import settings
settings.serialize_float_to_int = False
This will apply to all classes.
When using str() on a Jdic object, the default behavior is to return a nicely formatted JSON dump, with sorted keys and an indentation of 4, to ease debugging and print() operations.
If you wish to send or store this dump, casting it to a string with str() is not the proper way; prefer the json() method instead.
If you want to change the behavior of the JSON dump through str(), you can change the json_dump_sort_keys and json_dump_indent settings:
from jdic import settings
settings.json_dump_sort_keys = False # Disables key sorting
settings.json_dump_indent = 0 # Disables indentation
This will apply to all classes.
By default the JSON path driver is mongo. Changing json_path_driver to another value in the settings (eg: jsonpath_ng) will change the default driver used for any future class instantiation, unless otherwise specified in the jdic() parameters:
from jdic import settings
settings.json_path_driver = "jsonpath_ng"
- Create a new folder within your project (eg: new_driver)
- Create an __init__.py file within the folder
First, you should review an already implemented driver. The mongo driver is the best example you can use so far. It is available within the jdic module in drivers/mongo/__init__.py.
The __init__.py file must contain a Driver class whose template is:
class Driver(object):
    """The driver class"""

    @classmethod
    def add_to_path(cls, path, key):
        """Adds a key at the end of a JSON path and returns the new path"""

    @classmethod
    def control_invalid_key(cls, key):
        """Raises an exception if a key format (not JSON path) is not valid"""

    @staticmethod
    def get_new_path():
        """Returns a static JSON path pointing to the root of document"""

    @classmethod
    def get_parent(cls, obj, path):
        """Returns the parent of the value pointed by JSON path"""

    @classmethod
    def get_value_at_path(cls, obj, path):
        """Returns the value pointed by JSON path"""

    @staticmethod
    def is_a_path(key):
        """True if is a JSON path, else False"""

    @classmethod
    def is_root_path(cls, path):
        """True if is a JSON path for root document, else False"""

    @staticmethod
    def keys_to_path(keys):
        """Transforms a list of keys into a proper JSON path"""

    @staticmethod
    def match(obj, query):
        """Returns True if object matches the query, else False"""

    @staticmethod
    def path_to_keys(path):
        """Transforms an expression-less JSON path into a series of keys"""
Note that if you wish to benefit from already implemented functions, you can inherit from any existing driver. For example, the current jsonpath_ng driver class inherits from the Mongo driver, which allows it to reimplement only the relevant features. This explains why its match() function still matches against the Mongo Query Language queries supported by the mongo driver.
import jdic.drivers.mongo

class Driver(jdic.drivers.mongo.Driver):
    ...
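To give a feel for what the path-related methods of the template might look like, here is a partial, standalone sketch for simple dot-separated paths. The class name DottedPathDriver and every method body are assumptions made for illustration; it does not inherit from any Jdic class, leaves out several template methods (control_invalid_key, get_parent, is_a_path, match), and is not the code of the mongo driver.

```python
class DottedPathDriver:
    """Hypothetical partial driver for dot-separated paths (illustration only)."""

    @staticmethod
    def get_new_path():
        return ""  # root path: an empty string, as described for the mongo driver

    @classmethod
    def add_to_path(cls, path, key):
        return f"{path}.{key}" if path else str(key)

    @staticmethod
    def keys_to_path(keys):
        return ".".join(str(k) for k in keys)

    @staticmethod
    def path_to_keys(path):
        return path.split(".") if path else []

    @classmethod
    def is_root_path(cls, path):
        return path == cls.get_new_path()

    @classmethod
    def get_value_at_path(cls, obj, path):
        for key in cls.path_to_keys(path):
            # List positions are addressed by integer index, dict entries by key
            obj = obj[int(key)] if isinstance(obj, list) else obj[key]
        return obj


doc = {"a": [{"b": 7}]}
path = DottedPathDriver.keys_to_path(["a", 0, "b"])
print(path, DottedPathDriver.get_value_at_path(doc, path))
```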
json_delta: http://json-delta.readthedocs.io/en/latest/
jsonschema: https://github.com/Julian/jsonschema
mongoquery: https://github.com/kapouille/mongoquery
jsonpath_ng: https://github.com/h2non/jsonpath-ng
- Readthedocs documentation
- More tests (current state: 108 assertions)