Handle Polymorphic Array Serialization With Marshmallow Python

Sep 18th, 2020 - written by Kimserey.

A few months ago we looked into Marshmallow, a Python serialisation and validation framework which can be used to translate Flask request data to SQLAlchemy models and vice versa. In today’s post we will look at how we can serialise an array containing polymorphic data.

Polymorphic Array

Thanks to the flexibility of Python, it is common practice to hold objects of different classes in the same array. Those objects often derive from a shared base class; for example, given a base class Notification, the array could contain instances of NotificationX and NotificationY.

class Notification:
    def __init__(self, id: int):
        self.id = id


class NotifcationUserCreated(Notification):
    def __init__(self, id: int, username: str):
        super().__init__(id)
        self.username = username

    def __repr__(self):
        return "<NotifcationUserCreated id={}, username={}".format(
            self.id, self.username
        )


class NotifcationQuantityUpdated(Notification):
    def __init__(self, id: int, quantity: int):
        super().__init__(id)
        self.quantity = quantity

    def __repr__(self):
        return "<NotifcationQuantityUpdated id={}, quantity={}".format(
            self.id, self.quantity
        )

We can then have an array composed of any notification:

from faker import Faker

fake = Faker()

notifications = [
    NotifcationUserCreated(1, fake.name()),
    NotifcationQuantityUpdated(2, 10),
    NotifcationUserCreated(3, fake.name()),
]

This results in the following notifications:

[<NotifcationUserCreated id=1, username=Lauren Lambert,
 <NotifcationQuantityUpdated id=2, quantity=10,
 <NotifcationUserCreated id=3, username=David Woods]

As we can see, we are able to mix multiple classes into the array.

Marshmallow Schemas

In order to serialize the notifications, we can create their Marshmallow schemas:

from marshmallow import Schema
from marshmallow.fields import Int, Str


class NotifcationUserCreatedSchema(Schema):
    id = Int()
    username = Str()


class NotifcationQuantityUpdatedSchema(Schema):
    id = Int()
    quantity = Int()

But if we try to dump the array using a single schema, we lose the information specific to notifications of the other type:

schema = NotifcationUserCreatedSchema(many=True)
schema.dump(notifications)

This results in:

[{'username': 'Lori Jackson', 'id': 1},
 {'id': 2},
 {'username': 'Samantha Clark', 'id': 3}]

Since the array contains polymorphic data, we need a way to use NotifcationUserCreatedSchema when the object is a user created notification, and use the other schema when the object is of the other type.

Polymorphic Serialization

In order to handle the selection of the right schema, we’ll use a type_map attribute which will map from the notification type to the schema.
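The lookup itself is ordinary dictionary dispatch. As a minimal sketch, independent of Marshmallow, the idea could look like the following. The classes and the serialize_* functions are hypothetical stand-ins for the notification classes and schemas of this post, and each class is assumed to carry a notification_type tag:

```python
from typing import Any, Callable, Dict


class UserCreated:
    # Hypothetical stand-in; the tag is what the lookup keys on.
    notification_type = "user_created"

    def __init__(self, id: int, username: str):
        self.id = id
        self.username = username


class QuantityUpdated:
    notification_type = "quantity_updated"

    def __init__(self, id: int, quantity: int):
        self.id = id
        self.quantity = quantity


def serialize_user_created(n: UserCreated) -> Dict[str, Any]:
    return {"id": n.id, "username": n.username}


def serialize_quantity_updated(n: QuantityUpdated) -> Dict[str, Any]:
    return {"id": n.id, "quantity": n.quantity}


# Maps each tag to the function that knows how to serialize that type.
type_map: Dict[str, Callable[[Any], Dict[str, Any]]] = {
    "user_created": serialize_user_created,
    "quantity_updated": serialize_quantity_updated,
}


def serialize(notification: Any) -> Dict[str, Any]:
    # Raises KeyError if the tag has no registered serializer.
    serializer = type_map[notification.notification_type]
    return serializer(notification)


serialized = [serialize(n) for n in [UserCreated(1, "alice"), QuantityUpdated(2, 10)]]
```

The rest of this post applies the same pattern, with Marshmallow schemas taking the place of the functions.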

We start by adding a notification_type attribute to each class:

class NotifcationUserCreated(Notification):
    notification_type = "user_created"

    def __init__(self, id: int, username: str):
        super().__init__(id)
        self.username = username

    def __repr__(self):
        return "<NotifcationUserCreated id={}, username={}".format(
            self.id, self.username
        )


class NotifcationQuantityUpdated(Notification):
    notification_type = "quantity_updated"

    def __init__(self, id: int, quantity: int):
        super().__init__(id)
        self.quantity = quantity

    def __repr__(self):
        return "<NotifcationQuantityUpdated id={}, quantity={}".format(
            self.id, self.quantity
        )

Then we create a NotificationSchema which holds a type_map mapping from the notification_type to the class schemas NotifcationUserCreatedSchema and NotifcationQuantityUpdatedSchema.

import typing

from marshmallow import Schema, ValidationError


class NotificationSchema(Schema):
    """Notification schema."""

    # Maps each notification_type value to the schema used to dump it.
    type_map = {
        "user_created": NotifcationUserCreatedSchema,
        "quantity_updated": NotifcationQuantityUpdatedSchema,
    }

    def dump(self, obj: typing.Any, *, many: bool = None):
        result = []
        errors = {}
        many = self.many if many is None else bool(many)

        # Single object: delegate straight to the matching schema.
        if not many:
            return self._dump(obj)

        # Collection: dump each item, collecting errors keyed by index.
        for idx, value in enumerate(obj):
            try:
                res = self._dump(value)
                result.append(res)

            except ValidationError as error:
                errors[idx] = error.normalized_messages()
                result.append(error.valid_data)

        if errors:
            raise ValidationError(errors, data=obj, valid_data=result)

        return result

    def _dump(self, obj: typing.Any):
        # Raises AttributeError if the object has no notification_type.
        notification_type = getattr(obj, "notification_type")
        inner_schema = NotificationSchema.type_map.get(notification_type)

        if inner_schema is None:
            raise ValidationError(f"Missing schema for '{notification_type}'")

        # Instantiate the matching schema and let it dump the object.
        return inner_schema().dump(obj)

The NotificationSchema acts as the parent schema, which picks the proper schema to dump each object. We override the original dump function and, within it, use the type map to instantiate the right schema and call dump on that schema.

A special case to handle is many=True: the object is then expected to be an array, which we enumerate while consolidating any validation errors. These should only be missing-schema errors, since Marshmallow runs validation on load, not on dump.

And using this schema we can now serialize the notifications:

schema = NotificationSchema(many=True)
schema.dump(notifications)

This results in:

[{'username': 'Anthony Montgomery', 'id': 1},
 {'id': 2, 'quantity': 10},
 {'username': 'Mr. Andrew Carter', 'id': 3}]

And we can see that we are able to serialize each notification properly!

Conclusion

Today we looked into serializing a polymorphic array. We started by creating an example polymorphic structure with notifications, then saw how to create the associated Marshmallow schemas. Finally, we looked at how to override the Marshmallow schema's dump function to serialize the array properly. I hope you liked this post and I'll see you in the next one!

Designed, built and maintained by Kimserey Lam.