Extended dict-like subclass to support casting and JSON dumping without extras
I need to create an instance t of a dict-like class T that supports both "casting" to a real dict with dict(**t), without reverting to doing dict([(k, v) for k, v in t.items()]), as well as dumping as JSON using the standard json library, without extending the normal JSON encoder (i.e. no function provided for the default parameter).
With t being a normal dict, both work:
import json

def dump(data):
    print(list(data.items()))
    try:
        print('cast:', dict(**data))
    except Exception as e:
        print('ERROR:', e)
    try:
        print('json:', json.dumps(data))
    except Exception as e:
        print('ERROR:', e)

t = dict(a=1, b=2)
dump(t)
printing:
[('a', 1), ('b', 2)]
cast: {'a': 1, 'b': 2}
json: {"a": 1, "b": 2}
However I want t to be an instance of the class T that adds e.g. a key default "on the fly" to its items, so no inserting up-front is possible (actually I want merged keys from one or more instances of T to show up; this is a simplification of that real, much more complex, class).
class T(dict):
    def __getitem__(self, key):
        if key == 'default':
            return 'DEFAULT'
        return dict.__getitem__(self, key)

    def items(self):
        for k in dict.keys(self):
            yield k, self[k]
        yield 'default', self['default']

    def keys(self):
        for k in dict.keys(self):
            yield k
        yield 'default'

t = T(a=1, b=2)
dump(t)
this gives:
[('a', 1), ('b', 2), ('default', 'DEFAULT')]
cast: {'a': 1, 'b': 2}
json: {"a": 1, "b": 2, "default": "DEFAULT"}
and the cast doesn't work properly because there is no key 'default',
and I don't know which "magic" function to provide to make casting
work.
When I build T upon the functionality that collections.abc implements, and provide the required abstract methods in the subclass, casting works:
from collections.abc import MutableMapping

class TIter:
    def __init__(self, t):
        self.keys = list(t.d.keys()) + ['default']
        self.index = 0

    def __next__(self):
        if self.index == len(self.keys):
            raise StopIteration
        res = self.keys[self.index]
        self.index += 1
        return res

class T(MutableMapping):
    def __init__(self, **kw):
        self.d = dict(**kw)

    def __delitem__(self, key):
        if key != 'default':
            del self.d[key]

    def __len__(self):
        return len(self.d) + 1

    def __setitem__(self, key, v):
        if key != 'default':
            self.d[key] = v

    def __getitem__(self, key):
        if key == 'default':
            return 'DEFAULT'
        # return None
        return self.d[key]

    def __iter__(self):
        return TIter(self)

t = T(a=1, b=2)
dump(t)
which gives:
[('a', 1), ('b', 2), ('default', 'DEFAULT')]
cast: {'a': 1, 'b': 2, 'default': 'DEFAULT'}
ERROR: Object of type 'T' is not JSON serializable
The JSON dumping fails because that dumper cannot handle MutableMapping subclasses; it explicitly tests on the C level using PyDict_Check.
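The failure can be reproduced in isolation. The following is a minimal sketch, assuming Python 3; the class name Wrapped is hypothetical and stands in for any collections.abc mapping that is not a dict subclass. Since the encoder only accepts real dict instances, such an object falls through to the default hook, which raises TypeError when none is supplied.

```python
import json
from collections.abc import Mapping


class Wrapped(Mapping):
    """Hypothetical mapping that is dict-like but not a dict subclass."""

    def __init__(self, **kw):
        self._d = dict(**kw)

    def __getitem__(self, key):
        return self._d[key]

    def __iter__(self):
        return iter(self._d)

    def __len__(self):
        return len(self._d)


w = Wrapped(a=1, b=2)
print(isinstance(w, dict))  # the kind of check the encoder relies on

try:
    json.dumps(w)
    dumped = True
except TypeError as e:
    dumped = False
    print('ERROR:', e)

# An explicit conversion works, because Mapping supplies keys() on top
# of __iter__ and __getitem__
as_json = json.dumps(dict(w))
print(as_json)
```

The explicit dict(w) is exactly the extra step the question wants to spare users, so this only illustrates the problem and does not solve it.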
When I tried to make T a subclass of both dict and MutableMapping, I did get the same result as when using only the dict subclass.
I can of course consider it a bug that the json dumper has not been updated to assume that (concrete subclasses of) collections.abc.Mapping are dumpable. But even if it is acknowledged as a bug and gets fixed in some future version of Python, I don't think such a fix will be applied to older versions of Python.
Q1: How can I make the T implementation that is a subclass of dict cast properly?
Q2: If Q1 doesn't have an answer, would it work if I make a C level class that returns the right value for PyDict_Check but doesn't do any of the actual implementation, and then make T a subclass of that as well as MutableMapping? (I don't think adding such an incomplete C level dict will work, but I haven't tried.) And would this fool json.dumps()?
Q3: Is this a completely wrong approach to get both to work like the first example?
The actual code, which is much more complex, is a part of my ruamel.yaml library, which has to work on Python 2.7 and Python 3.4+.
As long as I can't solve this, I have to tell people that used to have functioning JSON dumpers (without extra arguments) to use:
def json_default(obj):
    if isinstance(obj, ruamel.yaml.comments.CommentedMap):
        return obj._od
    if isinstance(obj, ruamel.yaml.comments.CommentedSeq):
        return obj._lst
    raise TypeError
print(json.dumps(d, default=json_default))
, tell them to use a different loader than the default (round-trip) loader, e.g.:
yaml = YAML(typ='safe')
data = yaml.load(stream)
, implement some .to_json() method on the class T and make users of ruamel.yaml aware of that, or go back to subclassing dict and have to tell people to do
dict([(k, v) for k, v in t.items()])
none of which is really friendly and would indicate it is impossible to make a dict-like class that is non-trivial and cooperates well with the standard library.
@Aran-Fey Regarding 1) that is left over from some try-out. I removed that. For the other subclassing this is done without yield. Are you sure that 2 and 3 work on Python 2.7 as well?
– Anthon
Sep 13 '18 at 14:50
Oops, I didn't realize you had to support python 2.7. (There's too much text to read!)
– Aran-Fey
Sep 13 '18 at 14:51
Sorry for that. I tried to be complete and still allow cut and pasting for those who want to try things. And that is kind of difficult in this case. Come 2020 I'll clean up my code ;-)
– Anthon
Sep 13 '18 at 14:56
As far as I understand, the issue boils down to how to provide additional keys during expansion of **data in your dump(). Did you try to examine CPython sources to see which hooks (magic functions) are considered during this expansion?
– Kirill Bulygin
Sep 14 '18 at 15:39
3 Answers
Since the real problem here is really the inability of json.dumps's default encoder to consider MutableMapping (or ruamel.yaml.comments.CommentedMap in your real-world example) as a dict, instead of telling people to set the default parameter of json.dumps to your json_default function like you mentioned, you can use functools.partial to make json_default a default value for the default parameter of json.dumps so that people don't have to do anything differently when they use your package:
from functools import partial
json.dumps = partial(json.dumps, default=json_default)
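As a rough illustration of how the pre-bound hook behaves, here is a small sketch, assuming Python 3. The Wrapped class, the strict function and this simplified json_default are hypothetical stand-ins, and the bound function is kept under a local name rather than replacing json.dumps. It also shows the limitation of this approach: a default= passed by the caller replaces the pre-bound one.

```python
import json
from collections.abc import Mapping
from functools import partial


def json_default(obj):
    # Hypothetical stand-in for the json_default hook from the question
    if isinstance(obj, Mapping):
        return dict(obj)
    raise TypeError('cannot serialize %s' % type(obj).__name__)


class Wrapped(Mapping):
    """Hypothetical mapping that is not a dict subclass."""

    def __init__(self, **kw):
        self._d = dict(**kw)

    def __getitem__(self, key):
        return self._d[key]

    def __iter__(self):
        return iter(self._d)

    def __len__(self):
        return len(self._d)


# Bind the hook once; callers then use dumps() without extra arguments
dumps = partial(json.dumps, default=json_default)
result = dumps(Wrapped(a=1))
print(result)


def strict(obj):
    # Hypothetical user hook that knows nothing about mappings
    raise TypeError('unsupported')


# Keywords given at call time override those stored in the partial,
# so the mapping is no longer handled
try:
    dumps(Wrapped(a=1), default=strict)
    overridden = False
except TypeError:
    overridden = True
print(overridden)
```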
Or if you need to allow people to specify their own default parameter or even their own json.JSONEncoder subclass, you can use a wrapper around json.dumps so that it wraps the default function specified by the default parameter and the default method of the custom encoder specified by the cls parameter, whichever one is specified:
import inspect

class override_json_default:
    # keep track of the default methods that have already been wrapped
    # so we don't wrap them again
    _wrapped_defaults = set()

    def __call__(self, func):
        def override_default(default_func):
            def default_wrapper(o):
                o = default_func(o)
                if isinstance(o, MutableMapping):
                    o = dict(o)
                return o
            return default_wrapper

        def override_default_method(default_func):
            def default_wrapper(self, o):
                try:
                    return default_func(self, o)
                except TypeError:
                    if isinstance(o, MutableMapping):
                        return dict(o)
                    raise
            return default_wrapper

        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            default = bound.arguments.get('default')
            if default:
                bound.arguments['default'] = override_default(default)
            encoder = bound.arguments.get('cls')
            if not default and not encoder:
                bound.arguments['cls'] = encoder = json.JSONEncoder
            if encoder:
                default = getattr(encoder, 'default')
                if default not in self._wrapped_defaults:
                    default = override_default_method(default)
                    self._wrapped_defaults.add(default)
                setattr(encoder, 'default', default)
            return func(*bound.args, **bound.kwargs)

        sig = inspect.signature(func)
        return wrapper

json.dumps = override_json_default()(json.dumps)
so that the following test code with both a custom default function and a custom encoder that handle datetime objects, as well as one without a custom default or encoder:
from datetime import datetime

def datetime_encoder(o):
    if isinstance(o, datetime):
        return o.isoformat()
    return o

class DateTimeEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, datetime):
            return o.isoformat()
        return super(DateTimeEncoder, self).default(o)

def dump(data):
    print(list(data.items()))
    try:
        print('cast:', dict(**data))
    except Exception as e:
        print('ERROR:', e)
    try:
        print('json with custom default:', json.dumps(data, default=datetime_encoder))
        print('json wtih custom encoder:', json.dumps(data, cls=DateTimeEncoder))
        del data['c']
        print('json without datetime:', json.dumps(data))
    except Exception as e:
        print('ERROR:', e)

t = T(a=1, b=2, c=datetime.now())
dump(t)
would all give the proper output:
[('a', 1), ('b', 2), ('c', datetime.datetime(2018, 9, 15, 23, 59, 25, 575642)), ('default', 'DEFAULT')]
cast: {'a': 1, 'b': 2, 'c': datetime.datetime(2018, 9, 15, 23, 59, 25, 575642), 'default': 'DEFAULT'}
json with custom default: {"a": 1, "b": 2, "c": "2018-09-15T23:59:25.575642", "default": "DEFAULT"}
json wtih custom encoder: {"a": 1, "b": 2, "c": "2018-09-15T23:59:25.575642", "default": "DEFAULT"}
json without datetime: {"a": 1, "b": 2, "default": "DEFAULT"}
As pointed out in the comments, the above code uses inspect.signature, which is not available until Python 3.3, and even then, inspect.BoundArguments.apply_defaults is not available until Python 3.5, and the funcsigs package, a backport of Python 3.3's inspect.signature, does not have the apply_defaults method either. To make the code as backward-compatible as possible, you can simply copy and paste the code of Python 3.5+'s inspect.BoundArguments.apply_defaults to your module and assign it as an attribute of inspect.BoundArguments after importing funcsigs as necessary:
from collections import OrderedDict

if not hasattr(inspect, 'signature'):
    import funcsigs
    for attr in funcsigs.__all__:
        setattr(inspect, attr, getattr(funcsigs, attr))

if not hasattr(inspect.BoundArguments, 'apply_defaults'):
    def apply_defaults(self):
        arguments = self.arguments
        new_arguments = []
        for name, param in self._signature.parameters.items():
            try:
                new_arguments.append((name, arguments[name]))
            except KeyError:
                if param.default is not funcsigs._empty:
                    val = param.default
                elif param.kind is funcsigs._VAR_POSITIONAL:
                    val = ()
                elif param.kind is funcsigs._VAR_KEYWORD:
                    val = {}
                else:
                    continue
                new_arguments.append((name, val))
        self.arguments = OrderedDict(new_arguments)

    inspect.BoundArguments.apply_defaults = apply_defaults
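To see what apply_defaults contributes to the wrapper, the following sketch may help. It assumes Python 3.5 or later, where the method is built in, and uses a hypothetical function with a json.dumps-like signature rather than json.dumps itself. Before the call, the bound arguments contain only what the caller passed; afterwards, every parameter is present, which is what lets the wrapper look up 'default' and 'cls' unconditionally.

```python
import inspect


def dumps(obj, skipkeys=False, default=None, **kw):
    """Hypothetical stand-in with a json.dumps-like signature."""
    return obj


sig = inspect.signature(dumps)
bound = sig.bind({'a': 1}, default=str)

before = list(bound.arguments)  # only the arguments that were passed
bound.apply_defaults()
after = list(bound.arguments)   # every parameter, in signature order

print(before)
print(after)
print(bound.arguments['skipkeys'], bound.arguments['kw'])
```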
This is essentially the same as extending the default JSON Encoder. This might work for some, but it uses up the default parameter, which is most likely provided by the people using json themselves (e.g. to handle datetime objects).
– Anthon
Sep 14 '18 at 8:13
I've updated my answer with a decorator that decorates the default function and custom encoder so that people can use their own if desired. Please try it out and let me know if you still see any issues.
– blhsing
Sep 16 '18 at 7:05
The wrapping of json.dumps in the updated answer looks like an acceptable solution. I made two changes to get this to work properly: default_func needs to be called first (in case it can handle MutableMapping), and when it raises TypeError, catch that and do if isinstance(o, MutableMapping): return dict(o) and else (re-)raise the TypeError. If the user doesn't provide either default, no handler for MutableMapping is installed, so I changed wrapper to install override_default if neither is provided and have override_default handle a default_func is None by raising TypeError.
– Anthon
Sep 16 '18 at 22:49
Shall I update your code with these fixes? With them this is what I consider an acceptable answer, and I don't want to post my updated version as an answer, where you have done most of the hard work. I can also get you a link to the modified code if you want.
– Anthon
Sep 16 '18 at 22:51
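The two changes described in these comments could look roughly like the sketch below. This is a reconstruction from the comment text, not the modified code itself, and it assumes Python 3. The Store class and datetime_default are hypothetical helpers for the demonstration; unlike the answer's datetime_encoder, this user hook raises TypeError for objects it does not know.

```python
import json
from collections.abc import MutableMapping
from datetime import datetime


def override_default(default_func=None):
    def default_wrapper(o):
        try:
            if default_func is None:
                # no user hook: behave as if it had rejected the object
                raise TypeError('no default function provided')
            return default_func(o)  # the user's hook gets the first try
        except TypeError:
            if isinstance(o, MutableMapping):
                return dict(o)
            raise
    return default_wrapper


class Store(MutableMapping):
    """Hypothetical minimal MutableMapping for the demonstration."""

    def __init__(self, **kw):
        self._d = dict(**kw)

    def __getitem__(self, key):
        return self._d[key]

    def __setitem__(self, key, value):
        self._d[key] = value

    def __delitem__(self, key):
        del self._d[key]

    def __iter__(self):
        return iter(self._d)

    def __len__(self):
        return len(self._d)


def datetime_default(o):
    if isinstance(o, datetime):
        return o.isoformat()
    raise TypeError('not a datetime')


s = Store(a=1, when=datetime(2018, 9, 16))
with_user = json.dumps(s, default=override_default(datetime_default))
without_user = json.dumps(Store(a=1), default=override_default())
print(with_user)
print(without_user)
```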
I see. I've removed the mention of the issue with MutableMappings in 2.7 from my answer then. I'll leave my apply_defaults code in the answer though, just in case someone is interested in using the apply_defaults method in older code anyway.
– blhsing
Sep 19 '18 at 6:28
The answers to Q1 and Q2 are: "You cannot" resp. "No".
In short: you cannot add a key on-the-fly within Python and have JSON output as well (without patching json.dumps or providing a default to it).
The reason for that is that for JSON to work at all, you need to make your class a subclass of dict (or some other object implemented at the C level) so that its call of PyDict_Check() returns non-zero (which means the tp_flags field of the object's type has the Py_TPFLAGS_DICT_SUBCLASS bit set).
The cast (dict(**data)) first does this check on the C level as well (in dictobject.c:dict_merge). But there is a difference in how things proceed from there. When dumping JSON the code actually iterates over the key/values using routines provided by the subclass if these are available.
On the contrary, the cast doesn't look if there is any subclassing going on and copies the values from the C level implementation (dict, ruamel.ordereddict, etc.).
When casting something that is not a subclass of dict, then the normal Python class level interface (__iter__) is called to get the key/value pairs. This is why subclassing MutableMapping makes casting work, but unfortunately it breaks JSON dumping.
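The asymmetry for the dict subclass can be observed directly. The sketch below reuses the shape of the question's dict-based T and compares the cast with an explicit pass through the overridden items(). It assumes CPython 3; the JSON line is printed without a stated result, since the exact encoder path is an implementation detail that may differ between versions.

```python
import json


class T(dict):
    """Same shape as the dict subclass from the question."""

    def __getitem__(self, key):
        if key == 'default':
            return 'DEFAULT'
        return dict.__getitem__(self, key)

    def keys(self):
        for k in dict.keys(self):
            yield k
        yield 'default'

    def items(self):
        for k in dict.keys(self):
            yield k, self[k]
        yield 'default', self['default']


t = T(a=1, b=2)

cast = dict(**t)             # copied from the underlying C-level storage
via_items = dict(t.items())  # goes through the overridden items()

print(cast)
print(via_items)
print(json.dumps(t))         # the encoder asks the subclass for its items
```

Only the second dict contains the 'default' key: the cast never consults the Python-level overrides.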
It will not suffice to create a stripped down C level class that returns non-zero on PyDict_Check(), as the casting will iterate on the C level over that class' keys and values.
The only way to implement this transparently is by implementing a C level dict-like class that does the on-the-fly insertion of the key default and its value. It has to do so by faking a length that is one bigger than the actual number of entries and somehow implement indexing at the C level of ma_keys and ma_values to have that extra item. If possible at all, that is going to be hard, as dict_merge assumes fixed knowledge about quite a bit of the internals of the source object.
An alternative to fixing json.dumps is to fix dict_merge, but the latter would affect a lot of code negatively in speed, so that is less likely to happen (and also would not be done retroactively on older versions of Python either).
You can approach the problem in a completely different way. Instead of trying to produce a value when the key 'default' is requested on the fly, you can initialize the dict with the key 'default' set to your desired value, and then protect the value of the 'default' key by overriding all the methods that can potentially alter the content of the dict so that the value of the key 'default' is never altered:
class T(dict):
    def __init__(self, **kwargs):
        kwargs['default'] = 'DEFAULT'
        super(T, self).__init__(**kwargs)

    def __setitem__(self, key, value):
        if key != 'default':
            super(T, self).__setitem__(key, value)

    def __delitem__(self, key):
        if key != 'default':
            super(T, self).__delitem__(key)

    def clear(self):
        super(T, self).clear()
        self.__init__()

    # dict.pop() takes its fallback value positionally, hence *args
    def pop(self, key, *args):
        if key == 'default':
            return self[key]
        return super(T, self).pop(key, *args)

    def popitem(self):
        key, value = super(T, self).popitem()
        if key == 'default':
            key2, value2 = super(T, self).popitem()
            super(T, self).__setitem__(key, value)
            return key2, value2
        return key, value

    def update(self, other, **kwargs):
        if kwargs:
            if 'default' in kwargs:
                del kwargs['default']
        elif 'default' in other:
            del other['default']
        super(T, self).update(other, **kwargs)
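Because the key is stored in the real dict, both the cast and the JSON dump see it without any hook. The reduced sketch below illustrates that, assuming Python 3; the class name Pinned is hypothetical and guards only __init__ and __setitem__ for brevity.

```python
import json


class Pinned(dict):
    """Hypothetical reduced version: only two of the guards are kept."""

    def __init__(self, **kwargs):
        kwargs['default'] = 'DEFAULT'
        super().__init__(**kwargs)

    def __setitem__(self, key, value):
        if key != 'default':
            super().__setitem__(key, value)


p = Pinned(a=1)
p['default'] = 'changed'  # silently ignored by the guard
p['b'] = 2

cast = dict(**p)
dumped = json.dumps(p, sort_keys=True)
print(cast)
print(dumped)
```

The trade-off is the one raised in the comments that follow: the value has to be known and stored up-front, so this does not cover keys that only come into existence during iteration.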
This is not adding the key "on the fly" as I indicated is necessary. That might do for this simplification of the problem, but not for the real thing.
– Anthon
Sep 13 '18 at 14:44
Can you update your tester so that I can get a better understanding of what your expected "on the fly" behavior is?
– blhsing
Sep 13 '18 at 14:47
I am not sure what you mean with my tester. On-the-fly means that at the moment something starts iterating over the keys, or start copying the items etc, that extra values from somewhere else are added. You propose to add them to the dict instance itself beforehand, that is unacceptable (because in the real situation I need the dict without those additions as well).
– Anthon
Sep 13 '18 at 14:53
What I mean is that you could write a test case with a scenario that shows how a non-on-the-fly solution wouldn't work, one that only an on-the-fly solution would work. Like you now mentioned, that you need the dict without those additions as well, so you should have a test case that shows how you use the dict without the additions. At the moment I still don't see under what condition you're going to use the dict without the additions since you want to be able to cast T to dict with the additions too. You can't expect others to know what your "real situation" is without an example.
– blhsing
Sep 13 '18 at 14:58
Thanks for your effort, but this is not going to work at all. The line kwargs['default'] = 'DEFAULT' cannot be in the __init__ function, as it assumes you can get all the values up-front. I cannot afford to have a copy of the "other" keys in this dict; there can be thousands, so this has to be done on the fly. The keys that have to be put in on the fly are not fixed; they get determined when the instance items get iterated over. And of course the values for those keys are not guaranteed to be available before the start of the iteration either (otherwise it would not be on-the-fly).
– Anthon
Sep 13 '18 at 17:33
3 things I want to point out: 1) Your __iter__ or __next__ method is incorrect; trying to iterate over a T instance is an infinite loop. You cannot implement the __next__ method with yield. If you want to use yield, use it in __iter__. 2) All those occurrences of dict.keys(self) would be better written as super().keys() because then you don't have to hard-code the parent class in 4 different places. 3) All those for x in y: yield x should be written as yield from y.
– Aran-Fey
Sep 13 '18 at 13:06
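Points 1 and 3 of this comment can be sketched as follows, assuming Python 3.3 or later (yield from is not available on Python 2.7, where an explicit for loop would be needed). The class is a read-only reduction of the question's collections.abc based T. Its __iter__ is a generator function, so no separate iterator class with __next__ is required.

```python
from collections.abc import Mapping


class T(Mapping):
    def __init__(self, **kw):
        self.d = dict(**kw)

    def __getitem__(self, key):
        if key == 'default':
            return 'DEFAULT'
        return self.d[key]

    def __len__(self):
        return len(self.d) + 1

    def __iter__(self):
        # A generator function: each call returns a fresh iterator
        yield from self.d
        yield 'default'


t = T(a=1, b=2)
keys = list(t)
cast = dict(**t)
print(keys)
print(cast)
```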