Python: Difficulties in Runtime Type-checking
Static typing is relatively unpopular in the Python ecosystem, even though the standard library has included the typing module since Python 3.5, which was released on September 13, 2015 (more than 8 years ago!).1
Adoption is slow in large part because using Python types with non-trivial code can be extremely hard. Consider the problem of ingesting external data, such as
- SQL query results or
- JSON/YAML from APIs or files.
The typical solution is to type the data as typing.Any, which
- spreads like a virus through all downstream uses of the data and
- lacks any IDE support like autocomplete.
data: typing.Any = res.json() # {"user_id": "df25a386-fc0d-4d5d-9d9d-47df52b3f19f"}
uid = data["user_id"] # note: this will not autocomplete
# type of `uid` is `typing.Any`
next_user = uid + 1
# Python exception:
# ‼️ TypeError: can only concatenate str (not "int") to str
A better alternative is to define the expected data type, then use typing.cast:
class UserPayload(TypedDict):
    user_id: str
user_data = typing.cast(UserPayload, res.json())
uid = user_data["user_id"] # note: `user_id` autocompletes!
# type of `uid` is `str`
next_user = uid + 1
# IDE warning from pyright:
# ⚠️ Operator "+" not supported for types "str" and "Literal[1]"
This is still merely a hack since casting simply silences the type error without doing anything to ensure the data has the expected type.
x: str = typing.cast(str, 123) # obviously broken but no error
x.lower()
# ‼️ AttributeError: 'int' object has no attribute 'lower'
The best possible solution is to somehow do a type-check at runtime.
The requirements are:
- use the standard typing type annotations,
- verify that data is compatible with the expected type,
- not change the data itself.
This eliminates existing libraries like Pydantic and marshmallow, which require using custom definitions for data models/schemas. Both these libraries also focus on more general issues of data validation (e.g. conditions like x > 10), serialization and parsing, etc. We only care about types.
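To make the three requirements concrete, here is a minimal sketch of a check that satisfies them in the simplest possible case. The helper name check_flat_typed_dict is hypothetical, and the sketch assumes a flat TypedDict whose fields are plain classes such as str or int; nested or generic field types are deliberately out of scope.

```python
from typing import Any, TypedDict, get_type_hints


class UserPayload(TypedDict):
    user_id: str


def check_flat_typed_dict(value: Any, expected: type) -> Any:
    """Hypothetical helper: verify `value` against a flat TypedDict.

    Assumes every field annotation is a plain class (str, int, ...).
    """
    if not isinstance(value, dict):
        raise TypeError(f"expected a dict, got {type(value).__name__}")
    for key, field_type in get_type_hints(expected).items():
        if key not in value:
            raise TypeError(f"missing key {key!r}")
        if not isinstance(value[key], field_type):
            raise TypeError(
                f"{key!r} should be {field_type.__name__}, "
                f"got {type(value[key]).__name__}"
            )
    return value  # the very same object, unchanged


payload = {"user_id": "df25a386-fc0d-4d5d-9d9d-47df52b3f19f"}
checked = check_flat_typed_dict(payload, UserPayload)
```

The annotations are the standard ones, the data is verified rather than trusted, and the returned object is the input itself. Everything beyond this trivial case is where the libraries discussed next come in.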
Possible solutions:
- Typeguard is probably the best known one
  - Rewrites the instrumented function body to add type checks
  - It is relatively inactive and looking for a new maintainer
- Beartype is probably the most complete one
  - Wraps functions in a proxy function that does the type checks
  - Does not expose a check_type function, only a decorator
  - Sidenote: the documentation is very strange and the code is unusual
Evaluating Our Options
Let's test these runtime type checkers. Consider the example of an API that returns "any valid JSON" as metadata in its response:
// curl https://example.com/user/12345
{
"id": 12345,
"name": "John Doe",
// this is arbitrary JSON
"metadata": {
"number": 111,
"note": "hello world",
"todo": ["buy milk", "write book"]
// ...
}
}
We try to define the Python type for user["metadata"] and get NameErrors:
JsonValue = (
# supported primitive values
int | float | str | bool | None |
List[JsonValue] | # ‼️ NameError: name 'JsonValue' is not defined
Dict[str, JsonValue] # ‼️ NameError: name 'JsonValue' is not defined
)
To fix them we use string annotations:2
JsonValue = (
# supported primitive values
int | float | str | bool | None |
List["JsonValue"] |
Dict[str, "JsonValue"]
)
Now we can try to use JsonValue with our type checkers.
When we test in a context where JsonValue is not available by its original name, Typeguard breaks:
from json_type import JsonValue as JsonVal
# Note that "JsonValue" is not in scope since we renamed it:
# >>> print(JsonValue)
# NameError: name 'JsonValue' is not defined
x = typeguard.check_type([set()], JsonVal)
# TypeHintWarning: Cannot resolve forward reference 'JsonValue'
# The type check does not prevent a downstream error:
print(json.dumps(x))
# ‼️‼️‼️
# TypeError: Object of type set is not JSON serializable
# ‼️‼️‼️
Beartype breaks in a similar way.
So do the standard library utilities.
Here is a rough outline of what happens:
1. When we used string annotations, we introduced a new type, typing.ForwardRef:
   >>> JsonValue
   typing.Union[
       int, float, str, bool, NoneType,
       typing.List[ForwardRef('JsonValue')],
       typing.Dict[str, ForwardRef('JsonValue')]
   ]
2. The type checker starts to type check our value [set()] by trying to match it with JsonValue, i.e. typing.Union[int, float, ...].
3. Since [set()] is a list, it matches the typing.List[...] type, and the type checker recurses on the elements.
4. The first element is set() and we must match it with ForwardRef('JsonValue'), which is the type of the elements specified in our typing.List.
5. The ForwardRef says to look for a type named JsonValue, so we look for JsonValue in the current scope and don't find it. Remember that we imported it renamed as JsonVal.
6. At this point Beartype and the standard library throw an error, and Typeguard silently ignores the type and prints a warning.
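The outline above can be reproduced with a toy checker built only on the standard library. This is a rough sketch, not how Typeguard or Beartype are implemented: the function name matches and its namespace parameter are hypothetical, and the forward reference is resolved with a plain eval of its name, which is a simplification of what ForwardRef evaluation does.

```python
import types
import typing
from typing import Dict, ForwardRef, List, Union

JsonValue = Union[
    int, float, str, bool, None, List["JsonValue"], Dict[str, "JsonValue"]
]


def matches(value, hint, namespace):
    """Hypothetical toy checker; `namespace` is used to resolve ForwardRefs."""
    if isinstance(hint, ForwardRef):
        # Look the name up in the given namespace (a copy, so eval does not
        # modify the caller's dict). Raises NameError if the name is missing.
        hint = eval(hint.__forward_arg__, dict(namespace))
    origin = typing.get_origin(hint)
    args = typing.get_args(hint)
    # `X | Y` unions have a different origin than typing.Union on some versions
    if origin is Union or origin is getattr(types, "UnionType", Union):
        return any(matches(value, arg, namespace) for arg in args)
    if origin is list:
        return isinstance(value, list) and all(
            matches(item, args[0], namespace) for item in value
        )
    if origin is dict:
        return isinstance(value, dict) and all(
            matches(k, args[0], namespace) and matches(v, args[1], namespace)
            for k, v in value.items()
        )
    return isinstance(value, hint)


# With the name in scope, the check works as intended:
scope_with_name = {"JsonValue": JsonValue}
ok = matches({"todo": ["buy milk"]}, JsonValue, scope_with_name)
bad = matches([set()], JsonValue, scope_with_name)

# With the alias imported under a different name, resolution fails:
try:
    matches([set()], JsonValue, {"JsonVal": JsonValue})
    failure = None
except NameError as exc:
    failure = exc
```

The failure happens at step 5: the checker recursed into the list, reached the ForwardRef, and had nothing to resolve it against.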
To fix this, we need to find a way to reliably resolve typing.ForwardRef to its target, regardless of what is available in scope. This is our ultimate goal.
Investigating
Examining the ForwardRef value:
>>> list_type = typing.get_args(json_type.JsonValue)[-2]; list_type
typing.List[ForwardRef('JsonValue')]
>>> fr = typing.get_args(list_type)[0]; fr
ForwardRef('JsonValue')
>>> pprint({k: getattr(fr, k) for k in dir(fr)})
{'__forward_arg__': 'JsonValue',
'__forward_code__': <code object <module> at 0x10a1ebab0, file "<string>", line 1>,
'__forward_evaluated__': False,
'__forward_is_argument__': True,
'__forward_is_class__': False,
'__forward_module__': None,
'__forward_value__': None,
'_evaluate': <bound method ForwardRef._evaluate of ForwardRef('JsonValue')>,
# unimportant attributes omitted
...}
Of particular interest is the _evaluate method, as its name suggests that it performs the resolution we are after. Unfortunately, it is not as simple as calling it:
>>> help(fr._evaluate)
Help on method _evaluate in module typing:
_evaluate(globalns, localns, recursive_guard) method of typing.ForwardRef instance
>>> fr._evaluate(globals(), locals(), set())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File ".../lib/python3.12/typing.py", line 900, in _evaluate
eval(self.__forward_code__, globalns, localns),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<string>", line 1, in <module>
NameError: name 'JsonValue' is not defined
We need to somehow come up with the globals() and locals() that have the referred-to type available. Ideally, these are the same namespaces as at the point of definition of the type.
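As a small illustration of why the defining namespace is the one we want, the sketch below builds a stand-in for json_type.py in memory and resolves the reference against that module's globals. The use of types.ModuleType plus exec to fake the module is an assumption made only to keep the example self-contained, and the lookup uses a plain eval of __forward_arg__ rather than the private _evaluate method.

```python
import types
import typing

# In-memory stand-in for the json_type.py file (illustrative only)
module_source = """
from typing import Dict, List, Union

JsonValue = Union[
    int, float, str, bool, None, List["JsonValue"], Dict[str, "JsonValue"]
]
"""
json_type = types.ModuleType("json_type")
exec(module_source, json_type.__dict__)

# Equivalent of `from json_type import JsonValue as JsonVal`
JsonVal = json_type.JsonValue

list_type = typing.get_args(JsonVal)[-2]  # typing.List[ForwardRef('JsonValue')]
fr = typing.get_args(list_type)[0]        # ForwardRef('JsonValue')

# Resolving against the defining module's globals succeeds, even though the
# name JsonValue is not bound in the current scope.
resolved = eval(fr.__forward_arg__, vars(json_type))
```

Here we knew which module defined the alias. The ForwardRef itself does not record that, which is the gap the rest of this post tries to close.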
Monkey-patching ForwardRef
Since we get instances of the ForwardRef class, its __init__ method must be called at some point:
og = ForwardRef.__init__
def tracer(self, *args, **kwargs):
    print(args, kwargs)
    return og(self, *args, **kwargs)
ForwardRef.__init__ = tracer
JsonValue: TypeAlias = (
int | float | str | bool | None | List["JsonValue"] | Dict[str, "JsonValue"]
)
# ('JsonValue',) {'module': None, 'is_class': False}
# ('JsonValue',) {'module': None, 'is_class': False}
Now, as all the best Python hacks go, we break out sys._getframe():3
def tracer(self, *args, **kwargs):
    global frame
    frame = sys._getframe().f_back  # get caller frame
    return og(self, *args, **kwargs)
>>> frame.f_code.co_filename
'.../lib/python3.12/typing.py'
>>> frame.f_back.f_back.f_back.f_back.f_back.f_code.co_filename
'.../python_resolving_forwardrefs/json_type.py'
>>> top_fr = frame.f_back.f_back.f_back.f_back.f_back
>>> pprint(top_fr.f_globals)
{'JsonValue': typing.Union[int, float, str, bool, NoneType, typing.List[ForwardRef('JsonValue')], typing.Dict[str, ForwardRef('JsonValue')]],
...}
>>> fr._evaluate(top_fr.f_globals, top_fr.f_locals, set())
typing.Union[int, float, str, bool, NoneType, typing.List[ForwardRef('JsonValue')], typing.Dict[str, ForwardRef('JsonValue')]]
Getting from here to a "production-ready" hack is relatively straightforward:
import sys
from types import FrameType
from typing import ForwardRef
# to avoid polluting the class more than necessary,
# store all new state out-of-band
forward_frames: dict[int, FrameType] = {}
real_init = ForwardRef.__init__
def init(self, *args, **kwargs):
    # start at the frame that called ForwardRef(...), inside the typing machinery
    cur = sys._getframe().f_back
    assert cur is not None
    typing_filename = cur.f_code.co_filename
    # walk up until we leave that file: this is the frame defining the alias
    while cur is not None and cur.f_code.co_filename == typing_filename:
        cur = cur.f_back
    if cur is not None:
        forward_frames[id(self)] = cur
    real_init(self, *args, **kwargs)
ForwardRef.__init__ = init
Usage examples and a toy type-checker using the above code are available in the code repository.
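For a quick impression of how the recorded frames might be used, here is a hedged, self-contained sketch. It repeats the patch so that it runs on its own; the resolve helper and the Tree alias are hypothetical names, the lookup is a plain eval of __forward_arg__ rather than the private _evaluate, and the patch is undone right after the alias is defined. It relies on CPython's frame behavior and may not work on other runtimes.

```python
import sys
import typing
from types import FrameType
from typing import Dict, ForwardRef, List, Union

forward_frames: Dict[int, FrameType] = {}
real_init = ForwardRef.__init__


def init(self, *args, **kwargs):
    # Same idea as above: remember the first frame outside the typing machinery
    cur = sys._getframe().f_back
    assert cur is not None
    typing_filename = cur.f_code.co_filename
    while cur is not None and cur.f_code.co_filename == typing_filename:
        cur = cur.f_back
    if cur is not None:
        forward_frames[id(self)] = cur
    real_init(self, *args, **kwargs)


def resolve(ref: ForwardRef):
    """Hypothetical helper: look the name up where the alias was defined."""
    frame = forward_frames[id(ref)]
    return eval(ref.__forward_arg__, frame.f_globals, frame.f_locals)


ForwardRef.__init__ = init
try:
    Tree = Union[int, List["Tree"]]  # illustrative recursive alias
finally:
    ForwardRef.__init__ = real_init  # undo the patch

ref = typing.get_args(typing.get_args(Tree)[1])[0]  # ForwardRef('Tree')
resolved = resolve(ref)
```

Keying the dictionary by id(self) keeps ForwardRef itself untouched, at the cost of holding on to the recorded frames for as long as the dictionary lives.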
Epilog
Python 3.12 avoided this issue in the new type JsonValue = ... (PEP 695) aliases. TypeAlias is being deprecated. The new syntax does not create typing.ForwardRef at all; instead, the fully resolved target is stored in typing.TypeAliasType.__value__.
This is great!
Unfortunately, neither beartype nor typeguard supports it yet. Mypy also lacks support. Pyright commendably introduced support in May 2023.
Rant (Deprecated in Python 3.12)
I'm leaving the following rant as-is for the following reasons:
- Most projects are not going to be able to switch to Python 3.12 for a long while (until all their dependencies update)
- The overall message regarding future typing PEPs still stands.
- It lists some reasons why PEP 695 is so good.
- There are similar issues in other parts of the standard typing module.
- Other typing PEPs need to get the message. Example: PEP 563 i.e. from __future__ import annotations would basically turn all annotations into strings that are impossible to evaluate at runtime.
- We will likely have to deal with leftover ForwardRefs for a few years.
- Most tools do not support the new syntax yet.
- Most people are not going to see the deprecation warning in the TypeAlias docs.
  - There should probably be a deprecation message on ForwardRef and a runtime warning.
- I've learned about PEP 695 only after finishing the post.
It simply should not be this hard to deal with Python's typing.
A simple task, reading external data, carries a gotcha in the standard library. Everything works as expected unless a very specific set of circumstances arises. Maybe these circumstances are rare, but they are entirely plausible. The motivation for this hack came from production code, not an academic exercise.
The specific conditions that trigger the problem are hard to explain and inconsistent between type checker implementations:
- Static type checkers (e.g. Mypy, Pyright) "just work"
- typeguard.check_type requires all ForwardRefs to be resolvable at call site
- beartype and typeguard.typechecked decorators require the ForwardRefs to be resolvable at function definition site
- New Python features change the story entirely, making it worse
Is it any wonder that adoption has been slow? When even modern projects dedicated to type-checking get it wrong, the temptation is to just typing.cast away all problems.
To turn this around, future PEPs should:
- strive for parity between runtime and static type-checking,
- aim for consistency above all,
- not accept arbitrary limitations on standard library types and functions,
- aim to provide standard ways to interact with typing, ideally a reference type-checker implementation that is usable at runtime.
Appendix
from __future__ import annotations (PEP 563)
Deferred annotation evaluation does nothing for us because the alias is defined in regular Python code and not an annotation:
from __future__ import annotations
class List1:
    child: List1 | None # PEP 563 makes this reference OK
List2 = List2 | None # regular Python variable, NOT an annotation
# ‼️ NameError: name 'List2' is not defined.
Python 3.9 Generic Standard Collections
Using the standard collections as generic types (PEP 585) does not produce a ForwardRef. It leaves the value as a str:
>>> typing.get_args(list["Test"])
('Test',)
>>> type(typing.get_args(list["Test"])[0])
<class 'str'>
A quote from the standard library docs supports this:
Note: PEP 585 generic types such as list["SomeClass"] will not be implicitly transformed into list[ForwardRef("SomeClass")] and thus will not automatically resolve to list[SomeClass].
Built-in classes like list and str cannot be patched.4
This means that this hack is deprecated since 3.9 with no alternative. It might be possible to devise an even bigger hack, but for now I suggest simply using typing.List etc. when required.
Breaking Beartype
- Beartype will sometimes throw BeartypeCallHintForwardRefException: Forward reference 'fn.JsonValue' referent ... not class. when using JsonValue, so we have to use a different type.
- Need to use a function because we only get a decorator.
# test_type.py
class TestType: ...
ArrayOfTest: TypeAlias = List["TestType"]
# fn.py
from test_type import ArrayOfTest, TestType as TT
@beartype
def test(x: ArrayOfTest): ...
test([TT()])
# ‼️‼️‼️
# beartype.roar.BeartypeCallHintForwardRefException:
# Forward reference "fn.TestType" unimportable
# ‼️‼️‼️
typing.get_type_hints
The standard library documentation explicitly disclaims support for our usage of ForwardRef in the utility function get_type_hints:
Note: get_type_hints() does not work with imported type aliases that include forward references. Enabling postponed evaluation of annotations (PEP 563) may remove the need for most forward references.
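The limitation in that note is easy to reproduce. In the sketch below the defining module is faked in memory with types.ModuleType and exec, and the names HiddenNode, NodeList and process are hypothetical; the second call, which passes the defining module's namespace explicitly, only works because we happen to know where the alias came from.

```python
import types
import typing

# In-memory stand-in for a module that defines an alias with a forward reference
module_source = """
from typing import List

class HiddenNode: ...

NodeList = List["HiddenNode"]
"""
defining_module = types.ModuleType("defining_module")
exec(module_source, defining_module.__dict__)

# Equivalent of `from defining_module import NodeList` (HiddenNode is NOT imported)
NodeList = defining_module.NodeList


def process(nodes: NodeList) -> None: ...


try:
    typing.get_type_hints(process)
    outcome = "resolved"
except NameError as exc:
    outcome = f"NameError: {exc}"

# Supplying the defining module's globals by hand makes the lookup succeed
hints = typing.get_type_hints(process, globalns=vars(defining_module))
```

The first call fails because get_type_hints evaluates the reference against the namespace of process, where HiddenNode was never imported.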
Python 3.10 Union Inconsistency
The new way of writing unions introduced in PEP 604 does not match typing.Union and gives inconsistent results depending on the type arguments:
>>> int | list[str]
int | list[str]
>>> type(int | list[str])
<class 'types.UnionType'>
# not the same as `typing.Union`?
>>> int | typing.List[str]
typing.Union[int, typing.List[str]]
# why is this `typing.Union` now?
>>> # typing.Union is consistent:
>>> typing.Union[int, list[str]]
typing.Union[int, list[str]]
>>> typing.Union[int, typing.List[str]]
typing.Union[int, typing.List[str]]
Reproducibility
Code repository available on GitHub
All experiments were done with
Python 3.12.0 | packaged by Anaconda, Inc. | (main, Oct 2 2023, 12:29:27) [Clang 14.0.6 ] on darwin
beartype==0.16.2
typeguard==4.1.5
Footnotes
2. This has been deprecated in Python 3.12, the new syntax is type JsonValue = ... (PEP 695). More on this in the epilog.
3. And break compatibility with some Python runtimes! sys._getframe is only predictable in the default CPython implementation.
4. Not strictly true, you can abuse C FFI to modify the method tables: https://pypi.org/project/forbiddenfruit/