Customizable Serialization for Django.
Author: Tom Christie.
django-serializers provides flexible serialization of objects, models and querysets.
It is intended to be a potential replacement for the current, inflexible serialization. It should be able to support the current dumpdata format, whilst also being easy to override and customise.
Serializers are declared in a similar format to Form and Model declarations, with an inner Meta class providing general options, and optionally with a set of Field classes declared inside the Serializer class.
The Serializer class itself also implements the Field interface, meaning we can represent serialization of nested instances in various different ways.
Features:
- Supports serialization of arbitrary python objects using the Serializer class.
- Supports serialization of models and querysets using ModelSerializer.
- Supports serialization to the existing dumpdata format, using DumpDataSerializer.
- Supports flat serialization, and nested serialization (to arbitrary depth), and handles recursive relationships.
- Allows for both implicit fields, which are determined at the point of serialization, and explicit fields, which are declared on the serializer class.
- The declaration of the serialization structure is handled independently of the final encoding used (e.g. 'json', 'xml' etc…). This is desirable for e.g. APIs which want to support a given dataset being output to a number of different formats.
- Currently supports 'json', 'yaml', 'xml', 'csv'.
- Supports both fields that correspond to Django model fields, and fields that correspond to other attributes, such as get_absolute_url.
- Supports relations serializing to primary keys, natural keys, or custom implementations.
- Hooks throughout to allow for complete customization. E.g. writing key names using javascript style camel casing.
- Simple, clean API.
- Comprehensive test suite.
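As an illustration of the camel-casing customization mentioned in the feature list, here is a minimal standalone sketch that rewrites snake_case keys on an already-serialized structure. It does not use the django-serializers hook API; to_camel_case and camel_case_keys are hypothetical helper names chosen for this example.

```python
def to_camel_case(name):
    """Convert a snake_case key such as 'first_name' to 'firstName'."""
    parts = name.split('_')
    return parts[0] + ''.join(part.capitalize() for part in parts[1:])


def camel_case_keys(data):
    """Recursively rewrite the keys of nested dicts and lists."""
    if isinstance(data, dict):
        return dict(
            (to_camel_case(key), camel_case_keys(value))
            for key, value in data.items()
        )
    if isinstance(data, list):
        return [camel_case_keys(item) for item in data]
    return data


serialized = {'first_name': 'john', 'last_name': 'doe', 'age': 42}
camel = camel_case_keys(serialized)
```

Applying the transformation to the serialized structure, rather than to the rendered string, keeps it independent of the final encoding.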
Still to do:
- Tests for non-numeric FKs, and FKs with a custom db implementation.
- Tests for many2many FKs with a 'through' model.
- Tests for proxy models.
- Finish off xml dumpdata backward compat - many to many, natural keys, None & None on datetime fields all need tweaking.
- Default xml renderer needs to include attributes, not just the dumpdata one.
- django-serializers currently does not address deserialization. Replacing the existing loaddata deserialization with a more flexible deserialization API is considered out of scope until the serialization API has first been adequately addressed.
- django-serializers currently does not provide an API that is backwards compatible with the existing dumpdata serializers. Need to consider if this is a requirement. E.g. would this be a replacement for the existing serializers, or an addition to them?
- source='*' should have the effect of passing through fields, include, exclude to the child field, instead of applying to the parent serializer, so e.g. DumpDataSerializer will recognise that those arguments apply to the fields: level, rather than referring to what should be included at the root level.
- Streaming output, rather than loading all the data into memory.
- Consider character encoding issues.
- stack needs to be reverted at start of new serialization.
- Performance testing.
- indent option for xml.
Nice to have:
- I'd like to add nested.field syntax to the include, exclude and field argument, to allow quick declarations of nested representations.
- Add nested.field syntax to the source argument, to allow quick declarations of serializing nested elements into a flat output structure.
- Better csv format. (E.g. nested fields)
Done:
- Add hooks to control which types of model field get serialized by default. (eg base fields, m2m fields etc…)
- Add simple hooks for which field classes should be used by default. (E.g. flat_field=, nested_field= attributes in Serializer.Meta)
- Respect serialize property on model fields.
- Handle multiple model inheritance correctly for ModelSerializer and DumpDataSerializer.
- The base Field instances need to be copied on Serializer instantiation. Right now there's some shared state that needs to disappear.
- Add natural key support to DumpDataSerializer.
- Remove ordered keys / unordered keys from public interface. Always on for ModelSerializer, always off for DumpDataSerializer.
- Fixup KeyWithMetadata - use SortedDictWithMetadata instead.
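The item above about copying Field instances on instantiation refers to a common pitfall with class-level declarations: every serializer instance would otherwise share the same field objects and any mutable state on them. A minimal sketch of the idea follows. The Field and Serializer classes here are simplified stand-ins, not the library's implementation, and the stack attribute is only an assumed example of per-serialization state.

```python
import copy


class Field(object):
    """Stand-in field that carries mutable per-serialization state."""
    def __init__(self, label=None):
        self.label = label
        self.stack = []  # assumed example of state that must not be shared


class Serializer(object):
    # Declared once at class level, so shared by default.
    name = Field(label='Name')

    def __init__(self):
        declared = dict(
            (attr, value)
            for attr, value in vars(type(self)).items()
            if isinstance(value, Field)
        )
        # Deep-copy so that each serializer instance owns its own fields.
        self.fields = copy.deepcopy(declared)


first = Serializer()
second = Serializer()
first.fields['name'].stack.append('in progress')
```

This sketch only looks at fields declared directly on the class; handling fields inherited from base classes would need a walk over the class hierarchy.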
Install using pip:
pip install django-serializers
Optionally, if you want to include the django-serializers tests in your project, add serializers to your INSTALLED_APPS setting:
INSTALLED_APPS = (
    ...
    'serializers',
)
Note that if you have cloned the git repo you can run the tests directly, with the provided manage.py file:
manage.py test
We'll use the following example class to show some simple examples of serialization:
class Person(object):
    def __init__(self, first_name, last_name, age):
        self.first_name = first_name
        self.last_name = last_name
        self.age = age

    @property
    def full_name(self):
        return self.first_name + ' ' + self.last_name
You can serialize arbitrary objects using the Serializer class. Objects are serialized into dictionaries, containing key value pairs of any non-private instance attributes on the object:
>>> from serializers import Serializer
>>> person = Person('john', 'doe', 42)
>>> serializer = Serializer()
>>> print serializer.encode(person, 'json', indent=4)
{
    "first_name": "john",
    "last_name": "doe",
    "age": 42
}
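To make the "non-private instance attributes" rule concrete, here is a minimal sketch of how such a default field set could be collected. It is an assumption about the general approach, not the library's code; default_fields is a hypothetical helper and the _cache attribute is added only to show a private attribute being skipped.

```python
class Person(object):
    def __init__(self, first_name, last_name, age):
        self.first_name = first_name
        self.last_name = last_name
        self.age = age
        self._cache = None  # private attribute: should be skipped

    @property
    def full_name(self):
        return self.first_name + ' ' + self.last_name


def default_fields(obj):
    """Return the non-private instance attributes of obj as a dict."""
    return dict(
        (name, value)
        for name, value in vars(obj).items()
        if not name.startswith('_')
    )


person = Person('john', 'doe', 42)
data = default_fields(person)
```

Because full_name is a property defined on the class rather than an instance attribute, it is not picked up here, which is why it has to be requested explicitly.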
Let's say we only want to include some specific fields. We can do so either by setting those fields when we instantiate the Serializer...
>>> serializer = Serializer(fields=('first_name', 'age'))
>>> print serializer.encode(person, 'json', indent=4)
{
    "first_name": "john",
    "age": 42
}
...or by defining a custom Serializer class:
>>> class PersonSerializer(Serializer):
...     class Meta:
...         fields = ('first_name', 'age')
...
>>> print PersonSerializer().encode(person, 'json', indent=4)
{
    "first_name": "john",
    "age": 42
}
We can also include additional attributes on the object to be serialized, or exclude existing attributes:
>>> class PersonSerializer(Serializer):
...     class Meta:
...         exclude = ('first_name', 'last_name')
...         include = 'full_name'
...
>>> print PersonSerializer().encode(person, 'json', indent=4)
{
    "full_name": "john doe",
    "age": 42
}
To explicitly define how the object fields should be serialized, we declare those fields on the serializer class:
>>> class PersonSerializer(Serializer):
...     first_name = Field(label='First name')
...     last_name = Field(label='Last name')
...
>>> print PersonSerializer().encode(person, 'json', indent=4)
{
    "First name": "john",
    "Last name": "doe"
}
We can also define new types of field and control how they are serialized:
>>> class ClassNameField(Field):
...     def serialize(self, obj):
...         return obj.__class__.__name__
...
...     def get_field_value(self, obj, field_name):
...         return obj
...
>>> class ObjectSerializer(Serializer):
...     class_name = ClassNameField(label='class')
...     fields = Serializer(source='*')
...
>>> print ObjectSerializer().encode(person, 'json', indent=4)
{
    "class": "Person",
    "fields": {
        "first_name": "john",
        "last_name": "doe",
        "age": 42
    }
}
django-serializers also handles nested serialization of objects (for this example, assume Person also accepts optional partner and siblings arguments):
>>> fred = Person('fred', 'bloggs', 41)
>>> emily = Person('emily', 'doe', 37)
>>> jane = Person('jane', 'doe', 44, partner=fred)
>>> john = Person('john', 'doe', 42, siblings=[jane, emily])
>>> Serializer().serialize(john)
{
    'first_name': 'john',
    'last_name': 'doe',
    'age': 42,
    'siblings': [
        {
            'first_name': 'jane',
            'last_name': 'doe',
            'age': 44,
            'partner': {
                'first_name': 'fred',
                'last_name': 'bloggs',
                'age': 41,
            }
        },
        {
            'first_name': 'emily',
            'last_name': 'doe',
            'age': 37,
        }
    ]
}
And handles flat serialization of objects:
>>> Serializer(depth=0).serialize(john)
{
    'first_name': 'john',
    'last_name': 'doe',
    'age': 42,
    'siblings': [
        'jane doe',
        'emily doe'
    ]
}
Similarly model and queryset serialization is supported, and handles either flat or nested serialization of foreign keys, many to many relationships, and one to one relationships, plus reverse relationships:
>>> class User(models.Model):
...     email = models.EmailField()
...
>>> class Profile(models.Model):
...     user = models.OneToOneField(User, related_name='profile')
...     country_of_birth = models.CharField(max_length=100)
...     date_of_birth = models.DateTimeField()
...
>>> ModelSerializer().serialize(profile)
{
    'id': 1,
    'user': {
        'id': 1,
        'email': 'joe@example.com'
    },
    'country_of_birth': 'UK',
    'date_of_birth': datetime.datetime(day=5, month=4, year=1979)
}
The existing dumpdata format is (mostly) replicated, and gives a good example of how to declare custom serialization styles:
>>> class DumpDataSerializer(ModelSerializer):
...     pk = ModelField()
...     model = ModelNameField()
...     fields = ModelSerializer(source='*', exclude='id', depth=0)
If label is set it determines the name that should be used as the key when serializing the field.
If source is set it determines which attribute of the object to retrieve when serializing the field.
A value of '*' is a special case, which denotes that the entire object should be passed through and serialized by this field.
For example, the following serializer:
class ClassNameField(Field):
    def serialize(self, obj):
        return obj.__class__.__name__

    def get_field_value(self, obj, field_name):
        return obj

class CustomSerializer(Serializer):
    class_name = ClassNameField(label='class')
    fields = Serializer(source='*', depth=0)
Would serialize objects into a structure like this:
{
    "class": "Person",
    "fields": {
        "age": 23,
        "name": "Frank"
        ...
    }
}
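The interaction between label, source and the special '*' value can be sketched without the library. The classes below are simplified stand-ins written for this illustration; get_value and serialize_object are hypothetical names, and the real Field API may differ.

```python
class Field(object):
    """Stand-in field supporting `label` and `source` options."""
    def __init__(self, label=None, source=None):
        self.label = label
        self.source = source

    def get_value(self, obj, field_name):
        """Pick the attribute named by `source`, or the whole object for '*'."""
        if self.source == '*':
            return obj
        return getattr(obj, self.source or field_name)

    def serialize(self, value):
        return value


class ClassNameField(Field):
    def serialize(self, value):
        return value.__class__.__name__


def serialize_object(obj, fields):
    """Build the output dict, keyed by label where one is set."""
    return dict(
        (field.label or name, field.serialize(field.get_value(obj, name)))
        for name, field in fields.items()
    )


class Person(object):
    def __init__(self, name, age):
        self.name = name
        self.age = age


frank = Person('Frank', 23)
output = serialize_object(frank, {
    'name': Field(),
    'years': Field(source='age'),
    'class_name': ClassNameField(label='class', source='*'),
})
```

Here the 'years' key reads from the age attribute via source, while the '*' source hands the whole Person instance to ClassNameField.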
The serialize option provides a simple way to override the default serialization function. serialize should be a function that takes a single argument and returns the serialized output.
For example:
class CustomSerializer(Serializer):
    email = Field(serialize=lambda obj: obj.lower())  # Force email fields to lowercase.
    ...
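One plausible way a field could honour a serialize argument is to let the supplied callable shadow the default method on that instance. This is a sketch of that design under the assumption that the default simply returns the value unchanged; it is not taken from the library source.

```python
class Field(object):
    def __init__(self, serialize=None):
        # A per-instance callable replaces the default behaviour.
        if serialize is not None:
            self.serialize = serialize

    def serialize(self, obj):
        """Default: return the value unchanged."""
        return obj


plain = Field()
lowercase = Field(serialize=lambda obj: obj.lower())
```

Because the callable is stored as an instance attribute, it is called with the value only, matching the "takes a single argument" contract described above.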
Serializer options may be specified in the class definition, on the Meta inner class, or set when instantiating the Serializer object.
For example, using the Meta inner class:
class PersonSerializer(Serializer):
    class Meta:
        fields = ('full_name', 'age')

serializer = PersonSerializer()
And the same, using arguments when instantiating the serializer:
serializer = Serializer(fields=('full_name', 'age'))
The serializer class is a subclass of Field, so also supports the Field API.
The include option is a list of field names that should be included in the output. This could include properties, class attributes, or any other attribute on the object that would not otherwise be serialized.
The exclude option is a list of field names that should not be included in the output.
The fields option is the complete list of field names that should be serialized. If provided, fields will override include and exclude.
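The precedence between these three options can be sketched as a small function. This is an illustrative assumption about how the rules combine, not the library's code; resolve_field_names is a hypothetical name, and the output ordering (defaults first, then included names) is a choice made for this sketch only.

```python
def resolve_field_names(default, fields=None, include=(), exclude=()):
    """Work out which field names to serialize."""
    if fields is not None:
        # An explicit `fields` list overrides include and exclude.
        return list(fields)
    names = [name for name in default if name not in exclude]
    names += [name for name in include if name not in names]
    return names


default = ['first_name', 'last_name', 'age']
trimmed = resolve_field_names(
    default,
    include=('full_name',),
    exclude=('first_name', 'last_name'),
)
explicit = resolve_field_names(
    default,
    fields=('first_name', 'age'),
    exclude=('age',),
)
```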
The depth argument controls how nested objects should be serialized.
The default is None, which means serialization should descend into nested objects.
If depth is set to an integer value, serialization will descend that many levels into nested objects, before starting to serialize nested models with a "flat" value.
For example, setting depth=0 ensures that only the fields of the top level object will be serialized, and any nested objects will simply be serialized as simple string representations of those objects.
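The depth rule can be illustrated with a small recursive function. This is a minimal sketch assuming that the "flat" value is str(obj) and that the default fields are the non-private instance attributes; the function and the Person class here are stand-ins, not the library's implementation.

```python
def serialize(obj, depth=None):
    """Serialize obj to plain dicts and lists, flattening below `depth`."""
    if isinstance(obj, (str, int, float, bool, type(None))):
        return obj
    if isinstance(obj, (list, tuple)):
        return [serialize(item, depth) for item in obj]
    if depth is not None and depth < 0:
        # Depth exhausted: fall back to a flat string representation.
        return str(obj)
    next_depth = None if depth is None else depth - 1
    return dict(
        (name, serialize(value, next_depth))
        for name, value in vars(obj).items()
        if not name.startswith('_')
    )


class Person(object):
    def __init__(self, first_name, last_name, siblings=()):
        self.first_name = first_name
        self.last_name = last_name
        self.siblings = list(siblings)

    def __str__(self):
        return self.first_name + ' ' + self.last_name


jane = Person('jane', 'doe')
john = Person('john', 'doe', siblings=[jane])
flat = serialize(john, depth=0)
nested = serialize(john)
```

With depth=0 the top-level attributes are still expanded, and only objects one level down are reduced to strings.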
The default set of fields on an object are the attributes that will be serialized if no serializer fields are explicitly specified on the class.
When serializer fields are explicitly specified, these will normally be used instead of the default fields.
If include_default_fields is set to True, then both the explicitly specified serializer fields and the object's default fields will be used.
For example, in this case, only the 'full_name' field will be serialized:
class CustomSerializer(Serializer):
    full_name = Serializer(label='Full name')
In this case, both the 'full_name' field, and any instance attributes on the object will be serialized:
class CustomSerializer(Serializer):
    full_name = Serializer(label='Full name')

    class Meta:
        include_default_fields = True
The flat_field option sets the class that should be used for serializing flat fields (i.e. once the specified depth has been reached). Default is Field.
The nested_field option sets the class that should be used for serializing nested fields (i.e. before the specified depth has been reached). Default is None, which indicates that the serializer should use another instance of its own class.
The class that should be used for serializing fields when a recursion occurs. Default is None, which indicates that it should fall back to using a flat field representation.
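Falling back to a flat representation on recursion can be sketched by tracking the objects currently being serialized. This is an assumed approach for illustration, with str(obj) standing in for the flat field; it is not the library's implementation.

```python
def serialize(obj, stack=()):
    """Serialize obj, using a flat value when an object recurs."""
    if isinstance(obj, (str, int, float, bool, type(None))):
        return obj
    if isinstance(obj, (list, tuple)):
        return [serialize(item, stack) for item in obj]
    if any(obj is seen for seen in stack):
        # Already being serialized further up: stop descending.
        return str(obj)
    stack = stack + (obj,)
    return dict(
        (name, serialize(value, stack))
        for name, value in vars(obj).items()
        if not name.startswith('_')
    )


class Person(object):
    def __init__(self, name):
        self.name = name
        self.partner = None

    def __str__(self):
        return self.name


fred = Person('fred')
jane = Person('jane')
fred.partner, jane.partner = jane, fred
data = serialize(fred)
```

Passing the stack as an immutable tuple means each call starts from a clean state, so nothing needs to be reset between serializations.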
The ModelSerializer supports all the options for Serializer, as well as these additional options:
The related_field option sets the class that should be used for serializing related model fields once the maximum depth has been reached, or recursion occurs.
related_field can be applied to OneToOneField, ForeignKey, ManyToManyField, or any of their corresponding reverse managers.
Default is PrimaryKeyRelatedField.
A list of model field types that should be serialized by default. Available options are: 'pk', 'fields', 'many_to_many', 'local_fields', 'local_many_to_many'. The default value is ('pk', 'fields', 'many_to_many').
Note that the DumpDataSerializer uses a slightly different set of fields, in order to correctly deal with its particular requirements.
serialize() returns a native python datatype representing the given object.
If you are writing a custom field, overriding serialize() will let you customise how the output is generated.
get_field_value() returns a native python datatype representing the given field_name attribute on object.
This defaults to getting the attribute from obj using getattr, and calling serialize on the result.
If you are writing a custom Field and need to control exactly which attributes of the object are serialized, you will need to override this method instead of the serialize method.
(For example if you are writing a datetime serializer which combines information from two separate date and time attributes on an object, or perhaps if you are writing a Field serializer which serializes some non-attribute aspect of the object such as its class name.)
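The datetime example mentioned above might look something like the following sketch. The base Field here is a simplified stand-in that follows the described default (getattr, then serialize); CombinedDateTimeField and Event are hypothetical names introduced only for this illustration.

```python
import datetime


class Field(object):
    def serialize(self, obj):
        return obj

    def get_field_value(self, obj, field_name):
        # Default: read one attribute, then serialize it.
        return self.serialize(getattr(obj, field_name))


class CombinedDateTimeField(Field):
    """Hypothetical field merging separate `date` and `time` attributes."""
    def serialize(self, obj):
        return obj.isoformat()

    def get_field_value(self, obj, field_name):
        # Ignore field_name: the value comes from two other attributes.
        combined = datetime.datetime.combine(obj.date, obj.time)
        return self.serialize(combined)


class Event(object):
    def __init__(self, date, time):
        self.date = date
        self.time = time


event = Event(datetime.date(1979, 4, 5), datetime.time(9, 30))
value = CombinedDateTimeField().get_field_value(event, 'starts_at')
```

Overriding serialize alone would not be enough here, because the default get_field_value would first try to read a single starts_at attribute that does not exist on the object.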
If specified, attributes() should return a dictionary that may be used when rendering to xml to determine the attributes on the tag that represents this field.
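As a rough illustration of how a renderer might consume such a dictionary, the sketch below builds XML with the standard library's ElementTree. The Field class, its constructor arguments, the tag names and render_xml are all assumptions made for this example; only the idea of attributes() returning a dict comes from the description above.

```python
import xml.etree.ElementTree as ET


class Field(object):
    """Stand-in field holding a name, a value and a type name."""
    def __init__(self, name, value, type_name):
        self.name = name
        self.value = value
        self.type_name = type_name

    def attributes(self):
        # Attributes to place on the tag that represents this field.
        return {'name': self.name, 'type': self.type_name}


def render_xml(fields):
    """Render each field as a <field> tag carrying its attributes."""
    root = ET.Element('object')
    for field in fields:
        element = ET.SubElement(root, 'field', field.attributes())
        element.text = str(field.value)
    return ET.tostring(root, encoding='unicode')


xml_output = render_xml([Field('email', 'joe@example.com', 'EmailField')])
```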
encode() is the main entry point into serializers.
format should be a string representing the desired encoding. Valid choices are json, yaml and xml. If format is left as None, the object will be serialized into a python object in the desired structure, but will not be rendered into a final output format.
opts may be any additional options specific to the encoding.
Internally serialization is a two-step process. The first step calls the serialize() method, which serializes the object into the desired structure, limited to a set of primitive python datatypes. The second step calls the render() method, which renders that structure into the final output string or bytestream.
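The two-step flow can be sketched as follows. This is a simplified stand-in that only handles JSON via the standard library and uses non-private instance attributes as the structure; the method names follow the description above, but the bodies are assumptions rather than the library's code.

```python
import json


class Serializer(object):
    def serialize(self, obj):
        """Step one: reduce obj to primitive python datatypes."""
        return dict(
            (name, value)
            for name, value in vars(obj).items()
            if not name.startswith('_')
        )

    def render(self, data, format, **opts):
        """Step two: turn the primitive structure into an output string."""
        if format == 'json':
            return json.dumps(data, **opts)
        raise ValueError('unsupported format: %r' % (format,))

    def encode(self, obj, format=None, **opts):
        data = self.serialize(obj)
        if format is None:
            return data  # structure only, no rendering
        return self.render(data, format, **opts)


class Person(object):
    def __init__(self, first_name, age):
        self.first_name = first_name
        self.age = age


person = Person('john', 42)
structure = Serializer().encode(person)
rendered = Serializer().encode(person, 'json', indent=4, sort_keys=True)
```

Keeping the structure step separate is what lets the same declaration be rendered to several output formats.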
Returns a native python object representing the key for the given field name.
By default this will be the serializer's label if it has one specified, or the field_name string otherwise.
Return the default set of field names that should be serialized for an object.
If a serializer has no Serializer classes declared as fields, then this will be the set of field names that will be serialized.
render() performs the final part of the serialization, translating a simple python object into the output format.
The data argument is provided by the return value of the serialize() method.
format and **opts are the arguments as passed through by the encode() method.
- Dumpdata support for json and yaml. xml nearly complete.
- Fix csv for python 2.6
- Fix import error when yaml not installed
- Initial support for CSV.
- First proper release. Properly working model relationships etc…
- Initial release
Copyright © Tom Christie.
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.