Hephaestian Sea

Python: Difficulties in Runtime Type-checking

Static typing is relatively unpopular in the Python ecosystem, even though the standard library has included the typing module since 3.5, which was released on September 13, 2015 (more than 8 years ago!).1

Adoption is slow in large part because using Python types with non-trivial code can be extremely hard. Consider the problem of ingesting external data, such as the JSON body of an HTTP response.


The typical solution is to type the data as typing.Any, which

  1. spreads like a virus through all downstream uses of the data and
  2. lacks any IDE support like autocomplete.

data: typing.Any = res.json() # {"user_id": "df25a386-fc0d-4d5d-9d9d-47df52b3f19f"}
uid = data["user_id"] # note: this will not autocomplete

# type of `uid` is `typing.Any`

next_user = uid + 1
# Python exception:
# ‼️ TypeError: can only concatenate str (not "int") to str

A better alternative is to define the expected data type, then use typing.cast:

class UserPayload(TypedDict):
    user_id: str

user_data = typing.cast(UserPayload, res.json())
uid = user_data["user_id"] # note: `user_id` autocompletes!

# type of `uid` is `str`

next_user = uid + 1
# IDE warning from pyright:
# ⚠️ Operator "+" not supported for types "str" and "Literal[1]"

This is still merely a hack since casting simply silences the type error without doing anything to ensure the data has the expected type.

x: str = typing.cast(str, 123) # obviously broken but no error

x.lower()
# ‼️ AttributeError: 'int' object has no attribute 'lower'

The best possible solution is to somehow do a type-check at runtime.

The requirements are:

  1. use the standard typing type annotations,
  2. verify that data is compatible with the expected type,
  3. not change the data itself.

This eliminates existing libraries like Pydantic and marshmallow which require using custom definitions for data models/schemas. Both these libraries also focus on more general issues of data validation (e.g. conditions like x > 10), serialization and parsing, etc. We only care about types.
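To make the three requirements concrete, here is a minimal sketch of a check for a flat TypedDict such as UserPayload. The helper name check_flat_typed_dict is hypothetical, and the sketch assumes every key is required and annotated with a plain class like str or int. It is an illustration of the goal, not a general type checker.

```python
from typing import Any, TypedDict, get_type_hints


class UserPayload(TypedDict):
    user_id: str


def check_flat_typed_dict(data: Any, schema: type) -> bool:
    """Return True if `data` is a dict whose keys and value types match `schema`.

    Assumption: all keys are required and annotated with plain classes.
    The data is only inspected, never copied or converted.
    """
    if not isinstance(data, dict):
        return False
    hints = get_type_hints(schema)  # standard annotations, e.g. {"user_id": str}
    if set(data) != set(hints):
        return False
    return all(isinstance(data[key], expected) for key, expected in hints.items())


good = {"user_id": "df25a386-fc0d-4d5d-9d9d-47df52b3f19f"}
bad = {"user_id": 12345}
print(check_flat_typed_dict(good, UserPayload))
print(check_flat_typed_dict(bad, UserPayload))
```

Unlike typing.cast, the second call reports the mismatch before any downstream code touches the value.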

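Possible solutions are the runtime type checkers Typeguard and Beartype, and the standard library's own utilities such as typing.get_type_hints.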

Evaluating Our Options

Let's test these runtime type checkers. Consider the example of an API that returns "any valid JSON" as metadata in its response:

// curl https://example.com/user/12345
{
  "id": 12345,
  "name": "John Doe",

  // this is arbitrary JSON
  "metadata": {
    "number": 111,
    "note": "hello world",
    "todo": ["buy milk", "write book"]
    // ...
  }
}

We try to define the Python type for user["metadata"] and get NameErrors:

JsonValue = (
  # supported primitive values
  int | float | str | bool | None |

  List[JsonValue] | # ‼️ NameError: name 'JsonValue' is not defined
  Dict[str, JsonValue] # ‼️ NameError: name 'JsonValue' is not defined
)

To fix them we use string annotations:2

JsonValue = (
  # supported primitive values
  int | float | str | bool | None |

  List["JsonValue"] |
  Dict[str, "JsonValue"]
)

Now, we can try to use JsonValue with our type checkers.


When we test in a context where JsonValue is not available by its original name, Typeguard breaks:

import json
import typeguard

from json_type import JsonValue as JsonVal

# Note that "JsonValue" is not in scope since we renamed it:
# >>> print(JsonValue)
# NameError: name 'JsonValue' is not defined

x = typeguard.check_type([set()], JsonVal)
# TypeHintWarning: Cannot resolve forward reference 'JsonValue'

# The type check does not prevent a downstream error:
print(json.dumps(x))
# ‼️‼️‼️
# TypeError: Object of type set is not JSON serializable
# ‼️‼️‼️

Beartype breaks in a similar way.
So do the standard library utilities.


Here is a rough outline of what happens:

  1. When we used string annotations, we introduced a new type, typing.ForwardRef:

    >>> JsonValue
    typing.Union[
      int, float, str, bool, NoneType,
      typing.List[ForwardRef('JsonValue')],
      typing.Dict[str, ForwardRef('JsonValue')]
    ]
    
  2. The type checker starts to type check our value [set()] by trying to match it with JsonValue i.e. typing.Union[int, float, ...].

  3. Since [set()] is a list, it matches the typing.List[...] type, and the type checker recurses on the elements.

  4. The first element is set() and we must match it with ForwardRef('JsonValue') which is the type of the elements specified in our typing.List.

  5. The ForwardRef says to look for a type named JsonValue, so we look for JsonValue in the current scope and don't find it. Remember that we imported it renamed as JsonVal.

  6. At this point Beartype and the standard library raise an error, while Typeguard only emits a warning and skips the check, so the invalid value passes.
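The steps above can be sketched with a toy checker. This is not how Typeguard or Beartype are implemented; matches and its namespace parameter are hypothetical names, and the alias is bound to JsonVal only, mimicking the renamed import.

```python
import typing
from typing import Any, Dict, ForwardRef, List, Union

# Same shape as the alias in the post, but the name "JsonValue" itself
# is never defined in this scope (only "JsonVal" is).
JsonVal = Union[
    int, float, str, bool, None,
    List["JsonValue"],
    Dict[str, "JsonValue"],
]


def matches(value: Any, hint: Any, namespace: Dict[str, Any]) -> bool:
    """Toy structural check; forward references are looked up in `namespace`."""
    if isinstance(hint, ForwardRef):
        # Step 5: the ForwardRef only carries a name, so we need a scope.
        name = hint.__forward_arg__
        if name not in namespace:
            raise NameError(f"cannot resolve forward reference {name!r}")
        return matches(value, namespace[name], namespace)

    origin = typing.get_origin(hint)
    args = typing.get_args(hint)
    if origin is Union:
        # Step 2: try each member of the union in turn.
        return any(matches(value, arg, namespace) for arg in args)
    if origin is list:
        # Steps 3 and 4: recurse into the elements.
        return isinstance(value, list) and all(
            matches(item, args[0], namespace) for item in value
        )
    if origin is dict:
        return isinstance(value, dict) and all(
            matches(k, args[0], namespace) and matches(v, args[1], namespace)
            for k, v in value.items()
        )
    return isinstance(value, hint)


# A plain int matches before any ForwardRef is reached, so nothing goes wrong.
print(matches(1, JsonVal, {}))

# A list forces the lookup, which fails without the defining scope.
try:
    matches([set()], JsonVal, {})
except NameError as exc:
    print(exc)

# With the right namespace the invalid value is rejected as expected.
print(matches([set()], JsonVal, {"JsonValue": JsonVal}))
```

The first call shows why the problem can stay hidden: values that match a primitive member of the union never touch the forward reference.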

To fix this, we need to find a way to reliably resolve typing.ForwardRef to its target, regardless of what is available in scope. This is our ultimate goal.

Investigating


Examining the ForwardRef value:

>>> list_type = typing.get_args(json_type.JsonValue)[-2]; list_type
typing.List[ForwardRef('JsonValue')]

>>> fr = typing.get_args(list_type)[0]; fr
ForwardRef('JsonValue')

>>> pprint({k: getattr(fr, k) for k in dir(fr)})
{'__forward_arg__': 'JsonValue',
 '__forward_code__': <code object <module> at 0x10a1ebab0, file "<string>", line 1>,
 '__forward_evaluated__': False,
 '__forward_is_argument__': True,
 '__forward_is_class__': False,
 '__forward_module__': None,
 '__forward_value__': None,
 '_evaluate': <bound method ForwardRef._evaluate of ForwardRef('JsonValue')>,

 # unimportant attributes omitted
 ...}

Of particular interest is the _evaluate method, whose name suggests that it resolves the reference to its target. Unfortunately, it is not as simple as calling it:

>>> help(fr._evaluate)
Help on method _evaluate in module typing:

_evaluate(globalns, localns, recursive_guard) method of typing.ForwardRef instance
>>> fr._evaluate(globals(), locals(), set())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../lib/python3.12/typing.py", line 900, in _evaluate
    eval(self.__forward_code__, globalns, localns),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 1, in <module>
NameError: name 'JsonValue' is not defined

We need to somehow come up with the globals() and locals() that have the referred-to type available. Ideally, these are the same namespaces as at the point of definition of the type.
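As a small sketch of that idea, the snippet below evaluates the name stored in the ForwardRef against two different namespaces. It uses a plain eval on __forward_arg__ rather than the private _evaluate method, whose signature differs between Python versions. The json_type module is recreated in memory with types.ModuleType purely for illustration; in the real setup it is a file on disk.

```python
import types
import typing

# Stand-in for json_type.py, executed inside its own module namespace.
source = """
from typing import Dict, List, Union

JsonValue = Union[
    int, float, str, bool, None,
    List["JsonValue"],
    Dict[str, "JsonValue"],
]
"""
json_type = types.ModuleType("json_type")
exec(source, vars(json_type))

# Equivalent of `from json_type import JsonValue as JsonVal`.
JsonVal = json_type.JsonValue

list_type = typing.get_args(JsonVal)[-2]
fr = typing.get_args(list_type)[0]

# Evaluating against the current scope fails, as in the traceback above.
try:
    eval(fr.__forward_arg__, globals())
    error = None
except NameError as exc:
    error = exc
print(error)

# Evaluating against the defining module's namespace succeeds.
resolved = eval(fr.__forward_arg__, vars(json_type))
print(resolved is JsonVal)
```

The open question, addressed next, is how to find that defining namespace when all we hold is the ForwardRef instance.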

Monkey-patching ForwardRef

Since we get instances of the ForwardRef class, its __init__ method must be called at some point:

from typing import Dict, ForwardRef, List, TypeAlias

og = ForwardRef.__init__
def tracer(self, *args, **kwargs):
    print(args, kwargs)
    return og(self, *args, **kwargs)
ForwardRef.__init__ = tracer

JsonValue: TypeAlias = (
    int | float | str | bool | None | List["JsonValue"] | Dict[str, "JsonValue"]
)

# ('JsonValue',) {'module': None, 'is_class': False}
# ('JsonValue',) {'module': None, 'is_class': False}

Now, as all the best Python hacks go, we break out sys._getframe():3

def tracer(self, *args, **kwargs):
    global frame
    frame = sys._getframe().f_back  # get caller frame
    return og(self, *args, **kwargs)

>>> frame.f_code.co_filename
'.../lib/python3.12/typing.py'
>>> frame.f_back.f_back.f_back.f_back.f_back.f_code.co_filename
'.../python_resolving_forwardrefs/json_type.py'
>>> top_fr = frame.f_back.f_back.f_back.f_back.f_back
>>> pprint(top_fr.f_globals)
{'JsonValue': typing.Union[int, float, str, bool, NoneType, typing.List[ForwardRef('JsonValue')], typing.Dict[str, ForwardRef('JsonValue')]],
 ...}
>>> fr._evaluate(top_fr.f_globals, top_fr.f_locals, set())
typing.Union[int, float, str, bool, NoneType, typing.List[ForwardRef('JsonValue')], typing.Dict[str, ForwardRef('JsonValue')]]

Getting from here to a "production-ready" hack is relatively straightforward:

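import sys
from types import FrameType
from typing import ForwardRef

# to avoid polluting the class more than necessary,
# store all new state out-of-band, keyed by id() of each ForwardRef
forward_frames: dict[int, FrameType] = {}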

real_init = ForwardRef.__init__

def init(self, *args, **kwargs):
    cur = sys._getframe().f_back
    assert cur is not None

    typing_filename = cur.f_code.co_filename
    while cur is not None and cur.f_code.co_filename == typing_filename:
        cur = cur.f_back

    if cur is not None:
        forward_frames[id(self)] = cur
    real_init(self, *args, **kwargs)

ForwardRef.__init__ = init

Usage examples and a toy type-checker using the above code are available in the code repository.
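As a rough idea of how the recorded frames might be consumed, here is a self-contained sketch. It repeats the patch so that it runs on its own; resolve_forward_ref is a hypothetical helper and is not taken from the repository. The sketch assumes CPython and that the patch is installed before the alias is defined, since typing caches subscripted generics.

```python
import sys
from types import FrameType
from typing import Any, ForwardRef, get_args

forward_frames: dict[int, FrameType] = {}
real_init = ForwardRef.__init__


def init(self, *args, **kwargs):
    # Walk out of typing.py to the frame that wrote the string annotation.
    cur = sys._getframe().f_back
    assert cur is not None
    typing_filename = cur.f_code.co_filename
    while cur is not None and cur.f_code.co_filename == typing_filename:
        cur = cur.f_back
    if cur is not None:
        forward_frames[id(self)] = cur
    real_init(self, *args, **kwargs)


ForwardRef.__init__ = init


def resolve_forward_ref(ref: ForwardRef) -> Any:
    """Hypothetical helper: evaluate `ref` in the namespaces where it was created."""
    frame = forward_frames[id(ref)]
    return eval(ref.__forward_arg__, frame.f_globals, frame.f_locals)


# Stand-in for json_type.py: the alias is defined in a separate namespace.
module_ns: dict[str, Any] = {}
exec(
    "from typing import Dict, List, Union\n"
    "JsonValue = Union[int, float, str, bool, None,\n"
    "                  List['JsonValue'], Dict[str, 'JsonValue']]\n",
    module_ns,
)
JsonVal = module_ns["JsonValue"]  # "JsonValue" itself is not in our scope

fr = get_args(get_args(JsonVal)[-2])[0]
print(resolve_forward_ref(fr) is JsonVal)
```

Keying by id() is only safe while the ForwardRef stays alive, which holds here because the alias keeps a reference to it. Stored frames also keep their namespaces alive, which is a trade-off worth knowing about.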

Epilog

Python 3.12 avoids this issue with the new type JsonValue = ... alias syntax (PEP 695), and TypeAlias is deprecated in favor of it. The new syntax does not create typing.ForwardRef at all; instead, the target is evaluated lazily and the fully resolved value is available through the __value__ attribute of the resulting typing.TypeAliasType.

This is great!

Unfortunately, neither beartype nor typeguard supports it yet.
Mypy also lacks support.
Pyright commendably introduced support in May 2023.

Rant (Deprecated in Python 3.12)

I'm leaving the following rant as-is for the following reasons:


It simply should not be this hard to deal with Python's typing.

A simple task, reading external data, carries a gotcha in the standard library. Everything works as expected unless a very specific set of circumstances arises. Maybe these circumstances are rare, but they are entirely plausible. The motivation for this hack came from production code, not an academic exercise.

The specific conditions that trigger the problem are hard to explain and inconsistent between type checker implementations.

Is it any wonder that adoption has been slow? When even modern projects dedicated to type-checking get it wrong, the temptation is to just typing.cast away all problems.


To turn this around, future PEPs should:

  1. strive for parity between runtime and static type-checking,
  2. aim for consistency above all,
  3. not accept arbitrary limitations on standard library types and functions,
  4. aim to provide standard ways to interact with typing, ideally a reference type-checker implementation that is usable at runtime.

Appendix

from __future__ import annotations (PEP 563)

Deferred annotation evaluation does nothing for us because the alias is defined in regular Python code and not an annotation:

from __future__ import annotations

class List1:
  child: List1 | None # PEP 563 makes this reference OK

List2 = List2 | None # regular Python variable, NOT an annotation
# ‼️ NameError: name 'List2' is not defined.

Python 3.9 Generic Standard Collections

Using the standard collections as generic types (PEP 585) does not produce a ForwardRef. It leaves the value as a str:

>>> typing.get_args(list["Test"])
('Test',)
>>> type(typing.get_args(list["Test"])[0])
<class 'str'>

A quote from the standard library docs supports this:

Note: PEP 585 generic types such as list["SomeClass"] will not be implicitly transformed into list[ForwardRef("SomeClass")] and thus will not automatically resolve to list[SomeClass].

Built-in classes like list and str cannot be patched.4 This means that this hack is deprecated since 3.9 with no alternative. It might be possible to devise an even bigger hack, but for now I suggest simply using typing.List etc. when required.
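The difference between the two spellings can be checked directly. This small sketch only uses typing.get_args and assumes Python 3.9 or newer so that list["Test"] is valid.

```python
import typing

# PEP 585 generic: the string argument is stored untouched.
builtin_arg = typing.get_args(list["Test"])[0]

# typing.List: the string is wrapped in a ForwardRef, which our patch can see.
typing_arg = typing.get_args(typing.List["Test"])[0]

print(type(builtin_arg).__name__)
print(type(typing_arg).__name__)
```

Only the second form goes through ForwardRef.__init__, which is why the hack depends on typing.List and typing.Dict.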

Breaking Beartype

  1. Beartype will sometimes throw BeartypeCallHintForwardRefException: Forward reference 'fn.JsonValue' referent ... not class. when using JsonValue so we have to use a different type.
  2. Need to use a function because we only get a decorator.

# test_type.py
class TestType: ...
ArrayOfTest: TypeAlias = List["TestType"]

# fn.py
from test_type import ArrayOfTest, TestType as TT

@beartype
def test(x: ArrayOfTest): ...

test([TT()])
# ‼️‼️‼️
# beartype.roar.BeartypeCallHintForwardRefException:
# Forward reference "fn.TestType" unimportable
# ‼️‼️‼️

typing.get_type_hints

The standard library documentation explicitly disclaims support for our usage of ForwardRef in the utility function get_type_hints:

Note: get_type_hints() does not work with imported type aliases that include forward references. Enabling postponed evaluation of annotations (PEP 563) may remove the need for most forward references.
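A minimal sketch of that limitation follows. The test_type module is simulated with exec in a separate namespace, and handler is a hypothetical function that uses the imported alias. The last call shows that the lookup works once the defining namespace is passed explicitly through the documented globalns parameter, which is exactly the information a caller usually does not have.

```python
import typing

# Stand-in for test_type.py, executed in its own namespace.
test_type_ns: dict = {}
exec(
    "import typing\n"
    "class TestType: ...\n"
    "ArrayOfTest = typing.List['TestType']\n",
    test_type_ns,
)

# Equivalent of `from test_type import ArrayOfTest, TestType as TT`.
ArrayOfTest = test_type_ns["ArrayOfTest"]
TT = test_type_ns["TestType"]


def handler(x: ArrayOfTest) -> None:  # hypothetical function using the alias
    ...


# The forward reference is evaluated against handler's own globals,
# where the name "TestType" does not exist.
try:
    typing.get_type_hints(handler)
    failure = None
except NameError as exc:
    failure = exc
print(failure)

# Supplying the defining namespace by hand makes the lookup succeed.
hints = typing.get_type_hints(handler, globalns=test_type_ns)
print(hints["x"])
```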

Python 3.10 Union Inconsistency

The new way of writing unions introduced in PEP 604 does not match typing.Union and gives inconsistent results depending on the type arguments:

>>> int | list[str]
int | list[str]

>>> type(int | list[str])
<class 'types.UnionType'>
# not the same as `typing.Union`?

>>> int | typing.List[str]
typing.Union[int, typing.List[str]]
# why is this `typing.Union` now?

>>> # typing.Union is consistent:
>>> typing.Union[int, list[str]]
typing.Union[int, list[str]]
>>> typing.Union[int, typing.List[str]]
typing.Union[int, typing.List[str]]

Reproducibility

Code repository available on GitHub

All experiments were done with

Footnotes

  1. https://www.python.org/downloads/release/python-350/

  2. This has been deprecated in Python 3.12, the new syntax is type JsonValue = ... (PEP 695). More on this in the epilog.

  3. And break compatibility with some Python runtimes! sys._getframe is only predictable in the default CPython implementation.

  4. Not strictly true, you can abuse C FFI to modify the method tables: https://pypi.org/project/forbiddenfruit/
