Extended dict-like subclass to support casting and JSON dumping without extras




I need to create an instance t of a dict-like class T that supports
both "casting" to a real dict with dict(**t), without falling back to
dict([(k, v) for k, v in t.items()]), and dumping as JSON using the
standard json library, without extending the normal JSON encoder
(i.e. no function provided for the default parameter).


With t being a normal dict, both work:


import json

def dump(data):
    print(list(data.items()))
    try:
        print('cast:', dict(**data))
    except Exception as e:
        print('ERROR:', e)
    try:
        print('json:', json.dumps(data))
    except Exception as e:
        print('ERROR:', e)

t = dict(a=1, b=2)
dump(t)



printing:


[('a', 1), ('b', 2)]
cast: {'a': 1, 'b': 2}
json: {"a": 1, "b": 2}



However, I want t to be an instance of a class T that adds e.g. a
key default "on the fly" to its items, so inserting it up-front is not possible (actually I want merged keys
from one or more other instances of T to show up; this is a simplification of that real,
much more complex, class).


class T(dict):
    def __getitem__(self, key):
        if key == 'default':
            return 'DEFAULT'
        return dict.__getitem__(self, key)

    def items(self):
        for k in dict.keys(self):
            yield k, self[k]
        yield 'default', self['default']

    def keys(self):
        for k in dict.keys(self):
            yield k
        yield 'default'

t = T(a=1, b=2)
dump(t)



this gives:


[('a', 1), ('b', 2), ('default', 'DEFAULT')]
cast: {'a': 1, 'b': 2}
json: {"a": 1, "b": 2, "default": "DEFAULT"}



and the cast doesn't work properly because there is no real key 'default'
stored in the dict, and I don't know which "magic" method to provide to
make the casting work.



When I build T on the functionality that collections.abc provides, and implement the
required abstract methods in the subclass, casting works:


from collections.abc import MutableMapping

class TIter:
    def __init__(self, t):
        self.keys = list(t.d.keys()) + ['default']
        self.index = 0

    def __next__(self):
        if self.index == len(self.keys):
            raise StopIteration
        res = self.keys[self.index]
        self.index += 1
        return res

class T(MutableMapping):
    def __init__(self, **kw):
        self.d = dict(**kw)

    def __delitem__(self, key):
        if key != 'default':
            del self.d[key]

    def __len__(self):
        return len(self.d) + 1

    def __setitem__(self, key, v):
        if key != 'default':
            self.d[key] = v

    def __getitem__(self, key):
        if key == 'default':
            return 'DEFAULT'
        # return None
        return self.d[key]

    def __iter__(self):
        return TIter(self)

t = T(a=1, b=2)
dump(t)



which gives:


[('a', 1), ('b', 2), ('default', 'DEFAULT')]
cast: {'a': 1, 'b': 2, 'default': 'DEFAULT'}
ERROR: Object of type 'T' is not JSON serializable



The JSON dumping fails because that dumper cannot handle
MutableMapping subclasses; it explicitly tests on the C level using PyDict_Check.



When I tried to make T a subclass of both dict and
MutableMapping, I got the same result as when using only
the dict subclass.



I can of course consider it a bug that the json dumper has not
been updated to assume that (concrete subclasses of)
collections.abc.Mapping are dumpable. But even if it is acknowledged
as a bug and gets fixed in some future version of Python, I don't think
such a fix will be applied to older versions of Python.



Q1: How can I make the T implementation that is a subclass of
dict cast properly?

Q2: If Q1 doesn't have an answer, would it work to make a C level class
that returns the right value for PyDict_Check but doesn't do any of the
actual implementation, and then make T a subclass of that as well as of
MutableMapping? (I don't think adding such an incomplete C level dict will
work, but I haven't tried.) Would this fool json.dumps()?

Q3: Is this a completely wrong approach to getting both to work like the first example?



The actual code, which is much more complex, is part of my ruamel.yaml
library, which has to work on Python 2.7 and Python 3.4+.



As long as I can't solve this, I have to tell people who used to have
functioning JSON dumping (without extra arguments) to use:


def json_default(obj):
    if isinstance(obj, ruamel.yaml.comments.CommentedMap):
        return obj._od
    if isinstance(obj, ruamel.yaml.comments.CommentedSeq):
        return obj._lst
    raise TypeError

print(json.dumps(d, default=json_default))



, or tell them to use a different loader than the default (round-trip) loader, e.g.:


yaml = YAML(typ='safe')
data = yaml.load(stream)



, or implement some .to_json() method on the class T and make users
of ruamel.yaml aware of it (a minimal sketch of such a method follows below),



, or go back to subclassing dict and tell people to do



none of which is really friendly, and all of which would indicate that it is
impossible to make a non-trivial dict-like class that cooperates well with
the standard library.
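For reference, the .to_json() workaround mentioned above could look roughly like the following sketch. It uses a condensed stand-in for the real class (a MutableMapping that adds 'default' on the fly); the method name and behaviour are illustrative only, not the actual ruamel.yaml code:

import json
from collections.abc import MutableMapping

class T(MutableMapping):
    # minimal stand-in: wraps a dict and adds the key 'default' on the fly
    def __init__(self, **kw):
        self.d = dict(**kw)

    def __getitem__(self, key):
        return 'DEFAULT' if key == 'default' else self.d[key]

    def __setitem__(self, key, v):
        self.d[key] = v

    def __delitem__(self, key):
        del self.d[key]

    def __iter__(self):
        return iter(list(self.d) + ['default'])

    def __len__(self):
        return len(self.d) + 1

    def to_json(self):
        # hypothetical helper: materialize the on-the-fly keys into a plain dict
        return dict(self.items())

t = T(a=1, b=2)
print(json.dumps(t.to_json()))  # {"a": 1, "b": 2, "default": "DEFAULT"}

Users would then have to call json.dumps(t.to_json()) explicitly instead of json.dumps(t).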






Three things I want to point out: 1) Your __iter__ or __next__ method is incorrect; trying to iterate over a T instance is an infinite loop. You cannot implement the __next__ method with yield; if you want to use yield, use it in __iter__. 2) All those occurrences of dict.keys(self) would be better written as super().keys(), because then you don't have to hard-code the parent class in four different places. 3) All those for x in y: yield x loops should be written as yield from y.

– Aran-Fey
Sep 13 '18 at 13:06
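For illustration, a sketch of the dict-based T from the question with suggestions 2) and 3) applied (Python 3 only, since argument-less super() and yield from do not exist in Python 2.7):

class T(dict):
    def __getitem__(self, key):
        if key == 'default':
            return 'DEFAULT'
        return super().__getitem__(key)   # 2) no hard-coded parent class

    def keys(self):
        yield from super().keys()         # 3) yield from instead of a loop
        yield 'default'

    def items(self):
        for k in super().keys():
            yield k, self[k]
        yield 'default', self['default']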







@Aran-Fey Regarding 1): that is left over from some try-out; I removed it. For the other subclassing this is done without yield. Are you sure that 2) and 3) work on Python 2.7 as well?

– Anthon
Sep 13 '18 at 14:50






Oops, I didn't realize you had to support Python 2.7. (There's too much text to read!)

– Aran-Fey
Sep 13 '18 at 14:51






Sorry for that. I tried to be complete and still allow cut and pasting for those who want to try things. And that is kind of difficult in this case. Come 2020 I'll clean up my code ;-)

– Anthon
Sep 13 '18 at 14:56






As far as I understand, the issue boils down to how to provide additional keys during expansion of **data in your dump(). Did you try to examine CPython sources to see which hooks (magic functions) are considered during this expansion?

– Kirill Bulygin
Sep 14 '18 at 15:39
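For what it's worth, a quick experiment (mine, not from the question) suggests that for objects that are not dict subclasses, ** expansion falls back to the keys() and __getitem__ hooks:

class M:
    # a minimal non-dict mapping: ** expansion uses keys() and __getitem__
    def keys(self):
        return ['a', 'default']

    def __getitem__(self, key):
        return 'DEFAULT' if key == 'default' else 1

print(dict(**M()))  # {'a': 1, 'default': 'DEFAULT'}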




3 Answers



The real problem here is json.dumps's default encoder's inability to treat a MutableMapping (or ruamel.yaml.comments.CommentedMap in your real-world example) as a dict. So instead of telling people to set the default parameter of json.dumps to your json_default function as you mentioned, you can use functools.partial to make json_default the default value for the default parameter of json.dumps, so that people don't have to do anything differently when they use your package:


from functools import partial
json.dumps = partial(json.dumps, default=json_default)
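With that patch applied, plain json.dumps calls keep working. A minimal usage sketch against the MutableMapping-based T from the question, using a generic fallback in place of the real json_default (the helper name default_to_dict is mine):

import json
from functools import partial
from collections.abc import MutableMapping

def default_to_dict(obj):
    # fall back to a plain dict for any mapping the encoder cannot handle itself
    if isinstance(obj, MutableMapping):
        return dict(obj)
    raise TypeError('%r is not JSON serializable' % obj)

json.dumps = partial(json.dumps, default=default_to_dict)

t = T(a=1, b=2)       # the MutableMapping-based T defined in the question
print(json.dumps(t))  # {"a": 1, "b": 2, "default": "DEFAULT"}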



Or, if you need to allow people to specify their own default parameter or even their own json.JSONEncoder subclass, you can use a wrapper around json.dumps that wraps the default function specified by the default parameter and the default method of the custom encoder specified by the cls parameter, whichever one is specified:


import inspect
import json
from collections.abc import MutableMapping

class override_json_default:
    # keep track of the default methods that have already been wrapped
    # so we don't wrap them again
    _wrapped_defaults = set()

    def __call__(self, func):
        def override_default(default_func):
            def default_wrapper(o):
                o = default_func(o)
                if isinstance(o, MutableMapping):
                    o = dict(o)
                return o
            return default_wrapper

        def override_default_method(default_func):
            def default_wrapper(self, o):
                try:
                    return default_func(self, o)
                except TypeError:
                    if isinstance(o, MutableMapping):
                        return dict(o)
                    raise
            return default_wrapper

        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            default = bound.arguments.get('default')
            if default:
                bound.arguments['default'] = override_default(default)
            encoder = bound.arguments.get('cls')
            if not default and not encoder:
                bound.arguments['cls'] = encoder = json.JSONEncoder
            if encoder:
                default = getattr(encoder, 'default')
                if default not in self._wrapped_defaults:
                    default = override_default_method(default)
                    self._wrapped_defaults.add(default)
                    setattr(encoder, 'default', default)
            return func(*bound.args, **bound.kwargs)

        sig = inspect.signature(func)
        return wrapper

json.dumps = override_json_default()(json.dumps)



so that the following test code, with both a custom default function and a custom encoder that handle datetime objects, as well as a call without a custom default or encoder:


from datetime import datetime

def datetime_encoder(o):
    if isinstance(o, datetime):
        return o.isoformat()
    return o

class DateTimeEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, datetime):
            return o.isoformat()
        return super(DateTimeEncoder, self).default(o)

def dump(data):
    print(list(data.items()))
    try:
        print('cast:', dict(**data))
    except Exception as e:
        print('ERROR:', e)
    try:
        print('json with custom default:', json.dumps(data, default=datetime_encoder))
        print('json with custom encoder:', json.dumps(data, cls=DateTimeEncoder))
        del data['c']
        print('json without datetime:', json.dumps(data))
    except Exception as e:
        print('ERROR:', e)

t = T(a=1, b=2, c=datetime.now())
dump(t)



would all give the proper output:


[('a', 1), ('b', 2), ('c', datetime.datetime(2018, 9, 15, 23, 59, 25, 575642)), ('default', 'DEFAULT')]
cast: {'a': 1, 'b': 2, 'c': datetime.datetime(2018, 9, 15, 23, 59, 25, 575642), 'default': 'DEFAULT'}
json with custom default: {"a": 1, "b": 2, "c": "2018-09-15T23:59:25.575642", "default": "DEFAULT"}
json with custom encoder: {"a": 1, "b": 2, "c": "2018-09-15T23:59:25.575642", "default": "DEFAULT"}
json without datetime: {"a": 1, "b": 2, "default": "DEFAULT"}



As pointed out in the comments, the above code uses inspect.signature, which is not available until Python 3.3, and even then inspect.BoundArguments.apply_defaults is not available until Python 3.5; the funcsigs package, a backport of Python 3.3's inspect.signature, does not have the apply_defaults method either. To make the code as backward-compatible as possible, you can simply copy the code of Python 3.5+'s inspect.BoundArguments.apply_defaults into your module and assign it as an attribute of inspect.BoundArguments, after importing funcsigs as necessary:


import inspect
from collections import OrderedDict

if not hasattr(inspect, 'signature'):
    import funcsigs
    for attr in funcsigs.__all__:
        setattr(inspect, attr, getattr(funcsigs, attr))

if not hasattr(inspect.BoundArguments, 'apply_defaults'):
    def apply_defaults(self):
        arguments = self.arguments
        new_arguments = []
        for name, param in self._signature.parameters.items():
            try:
                new_arguments.append((name, arguments[name]))
            except KeyError:
                if param.default is not funcsigs._empty:
                    val = param.default
                elif param.kind is funcsigs._VAR_POSITIONAL:
                    val = ()
                elif param.kind is funcsigs._VAR_KEYWORD:
                    val = {}
                else:
                    continue
                new_arguments.append((name, val))
        self.arguments = OrderedDict(new_arguments)

    inspect.BoundArguments.apply_defaults = apply_defaults






This is essentially the same as extending the default JSON Encoder. This might work for some, but it uses up the default parameter which is most likely provided by the people using json themselves (e.g. to handle datetime objects).

– Anthon
Sep 14 '18 at 8:13






I've updated my answer with a decorator that wraps the default function and the custom encoder, so that people can use their own if desired. Please try it out and let me know if you still see any issues.

– blhsing
Sep 16 '18 at 7:05






The wrapping of json.dumps in the updated answer looks like an acceptable solution. I made two changes to get this to work properly: default_func needs to be called first (in case it can handle the MutableMapping), and when it raises TypeError, catch that and do if isinstance(o, MutableMapping): return dict(o), else (re-)raise the TypeError. If the user doesn't provide either default, no handler for MutableMapping is installed, so I changed wrapper to install override_default if neither is provided, and had override_default handle default_func is None by raising TypeError.

– Anthon
Sep 16 '18 at 22:49
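A rough sketch of the modified override_default described in this comment (my reconstruction, assuming MutableMapping is imported as in the answer; it is not the exact code used in ruamel.yaml):

def override_default(default_func):
    def default_wrapper(o):
        if default_func is not None:
            try:
                return default_func(o)  # give the user-supplied default the first chance
            except TypeError:
                pass
        if isinstance(o, MutableMapping):
            return dict(o)
        raise TypeError('%r is not JSON serializable' % o)
    return default_wrapper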







Shall I update your code with these fixes? With them, this is what I consider an acceptable answer, and I don't want to post my updated version as a separate answer when you have done most of the hard work. I can also get you a link to the modified code if you want.

– Anthon
Sep 16 '18 at 22:51






I see. I've removed the mention of the issue with MutableMappings in 2.7 from my answer then. I'll leave my apply_defaults code in the answer though, just in case someone is interested in using the apply_defaults method in older code anyway.

– blhsing
Sep 19 '18 at 6:28



The answers to Q1 and Q2 are "You cannot" and "No", respectively.



In short: you cannot add a key on-the-fly within Python and have JSON output
as well (without patching json.dumps or providing a default to it).



The reason for that is that for JSON dumping to work at all, you need to make
your class a subclass of dict (or of some other object implemented at
the C level), so that PyDict_Check() returns non-zero for it
(which means the tp_flags field of the object's type has the
Py_TPFLAGS_DICT_SUBCLASS bit set).
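This can be observed from Python as well; a small sketch (the bit position for Py_TPFLAGS_DICT_SUBCLASS is taken from CPython's object.h and is an implementation detail):

from collections.abc import MutableMapping

Py_TPFLAGS_DICT_SUBCLASS = 1 << 29  # assumed value, from CPython's object.h

class DictT(dict):
    pass

class MapT(MutableMapping):
    pass  # still abstract; never instantiated, we only inspect the type flags

print(bool(DictT.__flags__ & Py_TPFLAGS_DICT_SUBCLASS))  # True
print(bool(MapT.__flags__ & Py_TPFLAGS_DICT_SUBCLASS))   # False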



The cast (dict(**data)) first does this check on the C level as
well (in dictobject.c:dict_merge). But there is a difference in how
things proceed from there. When dumping JSON, the code actually
iterates over the keys/values using the routines provided by the subclass,
if these are available.



The cast, on the contrary, does not check whether any subclassing is
going on, and copies the keys and values directly from the C level
implementation (dict, ruamel.ordereddict, etc.).



When casting something that is not a subclass of dict, the normal
Python class level interface (keys() and __getitem__, which MutableMapping
builds on top of __iter__) is called to get the key/value pairs. This is why
subclassing MutableMapping makes casting work, but it unfortunately breaks JSON dumping.



It will not suffice to create a stripped-down C level class that returns non-zero on
PyDict_Check(), as the casting will iterate on the C level over that class's keys and values.



The only way to implement this transparently is to implement a C level dict-like class that does the
on-the-fly insertion of the key default and its value. It would have to do so by faking a
length that is one bigger than the actual number of entries and by
somehow implementing indexing of ma_keys and ma_values at the C level to provide that
extra item. If possible at all, that is going to be hard, as dict_merge assumes
fixed knowledge about quite a bit of the internals of the source object.



An alternative to fixing json.dumps is to fix dict_merge, but the latter would
negatively affect the speed of a lot of code, so that is less likely to happen (and it
would not be applied retroactively to older versions of Python either).



You can approach the problem in a completely different way. Instead of trying to produce a value for the key 'default' on the fly when it is requested, you can initialize the dict with the key 'default' set to your desired value, and then protect that value by overriding all the methods that can potentially alter the content of the dict, so that the value of the key 'default' is never altered:


class T(dict):
    def __init__(self, **kwargs):
        kwargs['default'] = 'DEFAULT'
        super(T, self).__init__(**kwargs)

    def __setitem__(self, key, value):
        if key != 'default':
            super(T, self).__setitem__(key, value)

    def __delitem__(self, key):
        if key != 'default':
            super(T, self).__delitem__(key)

    def clear(self):
        super(T, self).clear()
        self.__init__()

    def pop(self, key, **kwargs):
        if key == 'default':
            return self[key]
        return super(T, self).pop(key, **kwargs)

    def popitem(self):
        key, value = super(T, self).popitem()
        if key == 'default':
            key2, value2 = super(T, self).popitem()
            super(T, self).__setitem__(key, value)
            return key2, value2
        return key, value

    def update(self, other, **kwargs):
        if kwargs:
            if 'default' in kwargs:
                del kwargs['default']
        elif 'default' in other:
            del other['default']
        super(T, self).update(other, **kwargs)
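A quick check with the dump() helper from the question (my own run; the output shown is what this sketch should produce on Python 3.6+, where dicts keep insertion order):

t = T(a=1, b=2)
dump(t)

# [('a', 1), ('b', 2), ('default', 'DEFAULT')]
# cast: {'a': 1, 'b': 2, 'default': 'DEFAULT'}
# json: {"a": 1, "b": 2, "default": "DEFAULT"}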






This is not adding the key "on the fly" as I indicated is necessary. That might do for this simplification of the problem, but not for the real thing.

– Anthon
Sep 13 '18 at 14:44






Can you update your tester so that I can get a better understanding of what your expected "on the fly" behavior is?

– blhsing
Sep 13 '18 at 14:47







I am not sure what you mean by "my tester". On-the-fly means that at the moment something starts iterating over the keys, or starts copying the items etc., extra values from somewhere else are added. You propose to add them to the dict instance itself beforehand; that is unacceptable (because in the real situation I need the dict without those additions as well).

– Anthon
Sep 13 '18 at 14:53






What I mean is that you could write a test case with a scenario that shows how a non-on-the-fly solution wouldn't work, one where only an on-the-fly solution would work. As you now mentioned that you need the dict without those additions as well, you should have a test case that shows how you use the dict without the additions. At the moment I still don't see under what condition you're going to use the dict without the additions, since you want to be able to cast T to dict with the additions too. You can't expect others to know what your "real situation" is without an example.

– blhsing
Sep 13 '18 at 14:58









Thanks for your effort, but this is not going to work at all. The line kwargs['default'] = 'DEFAULT' cannot be in the __init__ function; it assumes you can get all the values up-front. I cannot afford to keep a copy of the "other" keys in this dict (there can be thousands); this has to be done on the fly. The keys that have to be put in on the fly are not fixed; they get determined when the instance's items are iterated over. And of course the values for those keys are not guaranteed to be available before the start of the iteration either (otherwise it would not be on-the-fly).

– Anthon
Sep 13 '18 at 17:33


