PEP: 681
Title: Data Class Transforms
Author: Erik De Bonte <erikd at microsoft.com>,
        Eric Traut <erictr at microsoft.com>
Sponsor: Jelle Zijlstra <jelle.zijlstra at gmail.com>
Discussions-To: https://mail.python.org/archives/list/typing-sig@python.org/thread/EAALIHA3XEDFDNG2NRXTI3ERFPAD65Z4/
Status: Accepted
Type: Standards Track
Content-Type: text/x-rst
Created: 02-Dec-2021
Python-Version: 3.11
Post-History: `24-Apr-2021 <https://mail.python.org/archives/list/typing-sig@python.org/thread/TXL5LEHYX5ZJAZPZ7YHZU7MVFXMVUVWL/>`__,
              `13-Dec-2021 <https://mail.python.org/archives/list/typing-sig@python.org/thread/EAALIHA3XEDFDNG2NRXTI3ERFPAD65Z4/>`__,
              `22-Feb-2022 <https://mail.python.org/archives/list/typing-sig@python.org/thread/BW6CB6URC4BCN54QSG2STINU2M7V4TQQ/>`__
Resolution: https://mail.python.org/archives/list/python-dev@python.org/message/R4A2IYLGFHKFDYJPSDA5NFJ6N7KRPJ6D/


Abstract
========

:pep:`557` introduced the dataclass to the Python stdlib. Several popular
libraries have behaviors that are similar to dataclasses, but these
behaviors cannot be described using standard type annotations. Such
projects include attrs, pydantic, and object relational mapper (ORM)
packages such as SQLAlchemy and Django.

Most type checkers, linters and language servers have full support for
dataclasses. This proposal aims to generalize this functionality and
provide a way for third-party libraries to indicate that certain
decorator functions, classes, and metaclasses provide behaviors
similar to dataclasses.

These behaviors include:

* Synthesizing an ``__init__`` method based on declared
  data fields.
* Optionally synthesizing ``__eq__``, ``__ne__``, ``__lt__``,
  ``__le__``, ``__gt__`` and ``__ge__`` methods.
* Supporting "frozen" classes, a way to enforce immutability during
  static type checking.
* Supporting "field specifiers", which describe attributes of
  individual fields that a static type checker must be aware of,
  such as whether a default value is provided for the field.

The full behavior of the stdlib dataclass is described in the `Python
documentation <#dataclass-docs_>`_.

This proposal does not affect CPython directly except for the addition
of a ``dataclass_transform`` decorator in ``typing.py``.


Motivation
==========

There is no existing, standard way for libraries with dataclass-like
semantics to declare their behavior to type checkers. To work around
this limitation, Mypy custom plugins have been developed for many of
these libraries, but these plugins don't work with other type
checkers, linters or language servers. They are also costly to
maintain for library authors, and they require that Python developers
know about the existence of these plugins and download and configure
them within their environment.


Rationale
=========

The intent of this proposal is not to support every feature of every
library with dataclass-like semantics, but rather to make it possible
to use the most common features of these libraries in a way that is
compatible with static type checking. If a user values these libraries
and also values static type checking, they may need to avoid using
certain features or make small adjustments to the way they use them.
That's already true for the Mypy custom plugins, which
don't support every feature of every dataclass-like library.

As new features are added to dataclasses in the future, we intend, when
appropriate, to add support for those features on
``dataclass_transform`` as well. Keeping these two feature sets in
sync will make it easier for dataclass users to understand and use
``dataclass_transform`` and will simplify the maintenance of dataclass
support in type checkers.

Additionally, we will consider adding ``dataclass_transform`` support
in the future for features that have been adopted by multiple
third-party libraries but are not supported by dataclasses.


Specification
=============

The ``dataclass_transform`` decorator
-------------------------------------

This specification introduces a new decorator function in
the ``typing`` module named ``dataclass_transform``. This decorator
can be applied to either a function that is itself a decorator,
a class, or a metaclass. The presence of
``dataclass_transform`` tells a static type checker that the decorated
function, class, or metaclass performs runtime "magic" that transforms
a class, endowing it with dataclass-like behaviors.

If ``dataclass_transform`` is applied to a function, using the decorated
function as a decorator is assumed to apply dataclass-like semantics.
If the function has overloads, the ``dataclass_transform`` decorator can
be applied to the implementation of the function or any one, but not more
than one, of the overloads. When applied to an overload, the
``dataclass_transform`` decorator still impacts all usage of the
function.

If ``dataclass_transform`` is applied to a class, dataclass-like
semantics will be assumed for any class that derives from the
decorated class or uses the decorated class as a metaclass.

Examples of each approach are shown in the following sections. Each
example creates a ``CustomerModel`` class with dataclass-like semantics.
The implementation of the decorated objects is omitted for brevity,
but we assume that they modify classes in the following ways:

* They synthesize an ``__init__`` method using data fields declared
  within the class and its parent classes.
* They synthesize ``__eq__`` and ``__ne__`` methods.

Type checkers supporting this PEP will recognize that the
``CustomerModel`` class can be instantiated using the synthesized
``__init__`` method:

.. code-block:: python

  # Using positional arguments
  c1 = CustomerModel(327, "John Smith")

  # Using keyword arguments
  c2 = CustomerModel(id=327, name="John Smith")

  # These calls will generate runtime errors and should be flagged as
  # errors by a static type checker.
  c3 = CustomerModel()
  c4 = CustomerModel(327, first_name="John")
  c5 = CustomerModel(327, "John Smith", 0)

Decorator function example
''''''''''''''''''''''''''

.. code-block:: python

  _T = TypeVar("_T")
  
  # The ``create_model`` decorator is defined by a library.
  # This could be in a type stub or inline.
  @typing.dataclass_transform()
  def create_model(cls: Type[_T]) -> Type[_T]:
      cls.__init__ = ...
      cls.__eq__ = ...
      cls.__ne__ = ...
      return cls
  
  # The ``create_model`` decorator can now be used to create new model
  # classes, like this:
  @create_model
  class CustomerModel:
      id: int
      name: str

Class example
'''''''''''''

.. code-block:: python

  # The ``ModelBase`` class is defined by a library. This could be in
  # a type stub or inline.
  @typing.dataclass_transform()
  class ModelBase: ...

  # The ``ModelBase`` class can now be used to create new model
  # subclasses, like this:
  class CustomerModel(ModelBase):
      id: int
      name: str

Metaclass example
'''''''''''''''''

.. code-block:: python

  # The ``ModelMeta`` metaclass and ``ModelBase`` class are defined by
  # a library. This could be in a type stub or inline.
  @typing.dataclass_transform()
  class ModelMeta(type): ...
  
  class ModelBase(metaclass=ModelMeta): ...
  
  # The ``ModelBase`` class can now be used to create new model
  # subclasses, like this:
  class CustomerModel(ModelBase):
      id: int
      name: str

Decorator function and class/metaclass parameters
-------------------------------------------------

A decorator function, class, or metaclass that provides dataclass-like
functionality may accept parameters that modify certain behaviors.
This specification defines the following parameters that static type
checkers must honor if they are used by a dataclass transform. Each of
these parameters accepts a bool argument, and it must be possible for
the bool value (``True`` or ``False``) to be statically evaluated.

* ``eq``.  ``order``, ``frozen``, ``init`` and ``unsafe_hash`` are parameters
  supported in the stdlib dataclass, with meanings defined in 
  :pep:`PEP 557 <557#id7>`.
* ``kw_only``, ``match_args`` and ``slots`` are parameters supported
  in the stdlib dataclass, first introduced in Python 3.10.

``dataclass_transform`` parameters
----------------------------------

Parameters to ``dataclass_transform`` allow for some basic
customization of default behaviors:

.. code-block:: python

  _T = TypeVar("_T")
  
  def dataclass_transform(
      *,
      eq_default: bool = True,
      order_default: bool = False,
      kw_only_default: bool = False,
      field_specifiers: tuple[type | Callable[..., Any], ...] = (),
      **kwargs: Any,
  ) -> Callable[[_T], _T]: ...

* ``eq_default`` indicates whether the ``eq`` parameter is assumed to
  be True or False if it is omitted by the caller. If not specified,
  ``eq_default`` will default to True (the default assumption for
  dataclass).
* ``order_default`` indicates whether the ``order`` parameter is
  assumed to be True or False if it is omitted by the caller. If not
  specified, ``order_default`` will default to False (the default
  assumption for dataclass).
* ``kw_only_default`` indicates whether the ``kw_only`` parameter is
  assumed to be True or False if it is omitted by the caller. If not
  specified, ``kw_only_default`` will default to False (the default
  assumption for dataclass).
* ``field_specifiers`` specifies a static list of supported classes
  that describe fields. Some libraries also supply functions to
  allocate instances of field specifiers, and those functions may
  also be specified in this tuple. If not specified,
  ``field_specifiers`` will default to an empty tuple (no field
  specifiers supported). The standard dataclass behavior supports
  only one type of field specifier called ``Field`` plus a helper
  function (``field``) that instantiates this class, so if we were
  describing the stdlib dataclass behavior, we would provide the
  tuple argument ``(dataclasses.Field, dataclasses.field)``.
* ``kwargs`` allows arbitrary additional keyword args to be passed to
  ``dataclass_transform``. This gives type checkers the freedom to
  support experimental parameters without needing to wait for changes
  in ``typing.py``. Type checkers should report errors for any
  unrecognized parameters.

In the future, we may add additional parameters to
``dataclass_transform`` as needed to support common behaviors in user
code. These additions will be made after reaching consensus on
typing-sig rather than via additional PEPs.

The following sections provide additional examples showing how these
parameters are used.

Decorator function example
''''''''''''''''''''''''''

.. code-block:: python

  # Indicate that the ``create_model`` function assumes keyword-only
  # parameters for the synthesized ``__init__`` method unless it is
  # invoked with ``kw_only=False``. It always synthesizes order-related
  # methods and provides no way to override this behavior.
  @typing.dataclass_transform(kw_only_default=True, order_default=True)
  def create_model(
      *,
      frozen: bool = False,
      kw_only: bool = True,
  ) -> Callable[[Type[_T]], Type[_T]]: ...
  
  # Example of how this decorator would be used by code that imports
  # from this library:
  @create_model(frozen=True, kw_only=False)
  class CustomerModel:
      id: int
      name: str

Class example
'''''''''''''

.. code-block:: python

  # Indicate that classes that derive from this class default to
  # synthesizing comparison methods.
  @typing.dataclass_transform(eq_default=True, order_default=True)
  class ModelBase:
      def __init_subclass__(
          cls,
          *,
          init: bool = True,
          frozen: bool = False,
          eq: bool = True,
          order: bool = True,
      ):
          ...
  
  # Example of how this class would be used by code that imports
  # from this library:
  class CustomerModel(
      ModelBase,
      init=False,
      frozen=True,
      eq=False,
      order=False,
  ):
      id: int
      name: str

Metaclass example
'''''''''''''''''

.. code-block:: python

  # Indicate that classes that use this metaclass default to
  # synthesizing comparison methods.
  @typing.dataclass_transform(eq_default=True, order_default=True)
  class ModelMeta(type):
      def __new__(
          cls,
          name,
          bases,
          namespace,
          *,
          init: bool = True,
          frozen: bool = False,
          eq: bool = True,
          order: bool = True,
      ):
          ...
  
  class ModelBase(metaclass=ModelMeta):
      ...
  
  # Example of how this class would be used by code that imports
  # from this library:
  class CustomerModel(
      ModelBase,
      init=False,
      frozen=True,
      eq=False,
      order=False,
  ):
      id: int
      name: str


Field specifiers
-----------------

Most libraries that support dataclass-like semantics provide one or
more "field specifier" types that allow a class definition to provide
additional metadata about each field in the class. This metadata can
describe, for example, default values, or indicate whether the field
should be included in the synthesized ``__init__`` method.

Field specifiers can be omitted in cases where additional metadata is
not required:

.. code-block:: python

  @dataclass
  class Employee:
      # Field with no specifier
      name: str
  
      # Field that uses field specifier class instance
      age: Optional[int] = field(default=None, init=False)
  
      # Field with type annotation and simple initializer to
      # describe default value
      is_paid_hourly: bool = True
  
      # Not a field (but rather a class variable) because type
      # annotation is not provided.
      office_number = "unassigned"


Field specifier parameters
'''''''''''''''''''''''''''

Libraries that support dataclass-like semantics and support field
specifier classes typically use common parameter names to construct
these field specifiers. This specification formalizes the names and
meanings of the parameters that must be understood for static type
checkers. These standardized parameters must be keyword-only.

These parameters are a superset of those supported by
``dataclasses.field``, excluding those that do not have an impact on
type checking such as ``compare`` and ``hash``.

Field specifier classes are allowed to use other
parameters in their constructors, and those parameters can be
positional and may use other names.

* ``init`` is an optional bool parameter that indicates whether the
  field should be included in the synthesized ``__init__`` method. If
  unspecified, ``init`` defaults to True. Field specifier functions
  can use overloads that implicitly specify the value of ``init``
  using a literal bool value type
  (``Literal[False]`` or ``Literal[True]``).
* ``default`` is an optional parameter that provides the default value
  for the field.
* ``default_factory`` is an optional parameter that provides a runtime
  callback that returns the default value for the field. If neither
  ``default`` nor ``default_factory`` are specified, the field is
  assumed to have no default value and must be provided a value when
  the class is instantiated.
* ``factory`` is an alias for ``default_factory``. Stdlib dataclasses
  use the name ``default_factory``, but attrs uses the name ``factory``
  in many scenarios, so this alias is necessary for supporting attrs.
* ``kw_only`` is an optional bool parameter that indicates whether the
  field should be marked as keyword-only. If true, the field will be
  keyword-only. If false, it will not be keyword-only. If unspecified,
  the value of the ``kw_only`` parameter on the object decorated with
  ``dataclass_transform`` will be used, or if that is unspecified, the
  value of ``kw_only_default`` on ``dataclass_transform`` will be used.
* ``alias`` is an optional str parameter that provides an alternative
  name for the field. This alternative name is used in the synthesized
  ``__init__`` method.

It is an error to specify more than one of ``default``,
``default_factory`` and ``factory``.

This example demonstrates the above:

.. code-block:: python

  # Library code (within type stub or inline)
  # In this library, passing a resolver means that init must be False,
  # and the overload with Literal[False] enforces that.
  @overload
  def model_field(
          *,
          default: Optional[Any] = ...,
          resolver: Callable[[], Any],
          init: Literal[False] = False,
      ) -> Any: ...
  
  @overload
  def model_field(
          *,
          default: Optional[Any] = ...,
          resolver: None = None,
          init: bool = True,
      ) -> Any: ...
  
  @typing.dataclass_transform(
      kw_only_default=True,
      field_specifiers=(model_field, ))
  def create_model(
      *,
      init: bool = True,
  ) -> Callable[[Type[_T]], Type[_T]]: ...
  
  # Code that imports this library:
  @create_model(init=False)
  class CustomerModel:
      id: int = model_field(resolver=lambda : 0)
      name: str


Runtime behavior
----------------

At runtime, the ``dataclass_transform`` decorator's only effect is to
set an attribute named ``__dataclass_transform__`` on the decorated
function or class to support introspection. The value of the attribute
should be a dict mapping the names of the ``dataclass_transform``
parameters to their values.

For example:

.. code-block:: python

  {
    "eq_default": True,
    "order_default": False,
    "kw_only_default": False,
    "field_specifiers": (),
    "kwargs": {}
  }


Dataclass semantics
-------------------

Except where stated otherwise in this PEP, classes impacted by
``dataclass_transform``, either by inheriting from a class that is
decorated with ``dataclass_transform`` or by being decorated with
a function decorated with ``dataclass_transform``, are assumed to
behave like stdlib ``dataclass``.

This includes, but is not limited to, the following semantics:

* Frozen dataclasses cannot inherit from non-frozen dataclasses. A
  class that has been decorated with ``dataclass_transform`` is
  considered neither frozen nor non-frozen, thus allowing frozen
  classes to inherit from it. Similarly, a class that directly
  specifies a metaclass that is decorated with ``dataclass_transform``
  is considered neither frozen nor non-frozen.

  Consider these class examples:
   
  .. code-block:: python

    # ModelBase is not considered either "frozen" or "non-frozen"
    # because it is decorated with ``dataclass_transform``
    @typing.dataclass_transform()
    class ModelBase(): ...

    # Vehicle is considered non-frozen because it does not specify
    # "frozen=True".
    class Vehicle(ModelBase):
        name: str

    # Car is a frozen class that derives from Vehicle, which is a
    # non-frozen class. This is an error.
    class Car(Vehicle, frozen=True):
        wheel_count: int

  And these similar metaclass examples:
   
  .. code-block:: python

    @typing.dataclass_transform()
    class ModelMeta(type): ...

    # ModelBase is not considered either "frozen" or "non-frozen"
    # because it directly specifies ModelMeta as its metaclass.
    class ModelBase(metaclass=ModelMeta): ...

    # Vehicle is considered non-frozen because it does not specify
    # "frozen=True".
    class Vehicle(ModelBase):
        name: str

    # Car is a frozen class that derives from Vehicle, which is a
    # non-frozen class. This is an error.
    class Car(Vehicle, frozen=True):
        wheel_count: int

* Field ordering and inheritance is assumed to follow the rules
  specified in :pep:`557 <557#inheritance>`. This includes the effects of
  overrides (redefining a field in a child class that has already been
  defined in a parent class).

* :pep:`PEP 557 indicates <557#post-init-parameters>` that
  all fields without default values must appear before
  fields with default values. Although not explicitly
  stated in PEP 557, this rule is ignored when ``init=False``, and
  this specification likewise ignores this requirement in that
  situation. Likewise, there is no need to enforce this ordering when
  keyword-only parameters are used for ``__init__``, so the rule is
  not enforced if ``kw_only`` semantics are in effect.

* As with dataclass, method synthesis is skipped if it would
  overwrite a method that is explicitly declared within the class.
  For example, if a class declares an ``__init__`` method explicitly,
  an ``__init__`` method will not be synthesized for that class.

* KW_ONLY sentinel values are supported as described in `the Python
  docs <#kw-only-docs_>`_ and `bpo-43532 <#kw-only-issue_>`_.

* ClassVar attributes are not considered dataclass fields and are
  `ignored by dataclass mechanisms <#class-var_>`_.


Undefined behavior
------------------

If multiple ``dataclass_transform`` decorators are found, either on a
single function (including its overloads), a single class, or within a
class hierarchy, the resulting behavior is undefined. Library authors
should avoid these scenarios.


Reference Implementation
========================

`Pyright <#pyright_>`_ contains the reference implementation of type
checker support for ``dataclass_transform``. Pyright's
``dataClasses.ts`` `source file <#pyright-impl_>`_ would be a good
starting point for understanding the implementation.

The `attrs <#attrs-usage_>`_ and `pydantic <#pydantic-usage_>`_
libraries are using ``dataclass_transform`` and serve as real-world
examples of its usage.


Rejected Ideas
==============

``auto_attribs`` parameter
--------------------------

The attrs library supports an ``auto_attribs`` parameter that
indicates whether class members decorated with :pep:`526` variable
annotations but with no assignment should be treated as data fields.

We considered supporting ``auto_attribs`` and a corresponding
``auto_attribs_default`` parameter, but decided against this because it
is specific to attrs.

Django does not support declaring fields using type annotations only,
so Django users who leverage ``dataclass_transform`` should be aware
that they should always supply assigned values.

``cmp`` parameter
-----------------

The attrs library supports a bool parameter ``cmp`` that is equivalent
to setting both ``eq`` and ``order`` to True. We chose not to support
a ``cmp`` parameter, since it only applies to attrs. Users can emulate
the ``cmp`` behaviour by using the ``eq`` and ``order`` parameter names
instead.

Automatic field name aliasing
-----------------------------

The attrs library performs `automatic aliasing <#attrs-aliasing_>`_ of
field names that start with a single underscore, stripping the
underscore from the name of the corresponding ``__init__`` parameter.

This proposal omits that behavior since it is specific to attrs. Users
can manually alias these fields using the ``alias`` parameter.

Alternate field ordering algorithms
-----------------------------------

The attrs library currently supports two approaches to ordering the
fields within a class:

* Dataclass order: The same ordering used by dataclasses. This is the
  default behavior of the older APIs (e.g. ``attr.s``).
* Method Resolution Order (MRO): This is the default behavior of the
  newer APIs (e.g. define, mutable, frozen). Older APIs (e.g. ``attr.s``)
  can opt into this behavior by specifying ``collect_by_mro=True``.

The resulting field orderings can differ in certain diamond-shaped
multiple inheritance scenarios.

For simplicity, this proposal does not support any field ordering
other than that used by dataclasses.

Fields redeclared in subclasses
-------------------------------

The attrs library differs from stdlib dataclasses in how it
handles inherited fields that are redeclared in subclasses. The
dataclass specification preserves the original order, but attrs
defines a new order based on subclasses.

For simplicity, we chose to only support the dataclass behavior.
Users of attrs who rely on the attrs-specific ordering will not see
the expected order of parameters in the synthesized ``__init__``
method.

Django primary and foreign keys
-------------------------------

Django applies `additional logic for primary and foreign keys
<#django-ids_>`_. For example, it automatically adds an ``id`` field
(and ``__init__`` parameter) if there is no field designated as a
primary key.

As this is not broadly applicable to dataclass libraries, this
additional logic is not accommodated with this proposal, so
users of Django would need to explicitly declare the ``id`` field.

Class-wide default values
-------------------------

SQLAlchemy requested that we expose a way to specify that the default
value of all fields in the transformed class is ``None``. It is typical
that all SQLAlchemy fields are optional, and ``None`` indicates that
the field is not set.

We chose not to support this feature, since it is specific to
SQLAlchemy. Users can manually set ``default=None`` on these fields
instead.

Descriptor-typed field support
------------------------------

We considered adding a boolean parameter on ``dataclass_transform``
to enable better support for fields with descriptor types, which is
common in SQLAlchemy. When enabled, the type of each parameter on the
synthesized ``__init__`` method corresponding to a descriptor-typed
field would be the type of the value parameter to the descriptor's
``__set__`` method rather than the descriptor type itself. Similarly,
when setting the field, the ``__set__`` value type would be expected.
And when getting the value of the field, its type would be expected to
match the return type of ``__get__``.

This idea was based on the belief that ``dataclass`` did not properly
support descriptor-typed fields. In fact it does, but type checkers
(at least mypy and pyright) did not reflect the runtime behavior which
led to our misunderstanding. For more details, see the
`Pyright bug <#pyright-descriptor-bug_>`__.

``converter`` field specifier parameter
----------------------------------------

The attrs library supports a ``converter`` field specifier parameter,
which is a ``Callable`` that is called by the generated
``__init__`` method to convert the supplied value to some other
desired value. This is tricky to support since the parameter type in
the synthesized ``__init__`` method needs to accept uncovered values,
but the resulting field is typed according to the output of the
converter.

Some aspects of this issue are detailed in a
`Pyright discussion <#converters_>`_.

There may be no good way to support this because there's not enough
information to derive the type of the input parameter. One possible
solution would be to add support for a ``converter`` field specifier
parameter but then use the ``Any`` type for the corresponding
parameter in the ``__init__`` method.


References
==========
.. _#dataclass-docs: https://docs.python.org/3.11/library/dataclasses.html
.. _#pyright: https://github.com/Microsoft/pyright
.. _#pyright-impl: https://github.com/microsoft/pyright/blob/main/packages/pyright-internal/src/analyzer/dataClasses.ts
.. _#attrs-usage: https://github.com/python-attrs/attrs/pull/796
.. _#pydantic-usage: https://github.com/samuelcolvin/pydantic/pull/2721
.. _#attrs-aliasing: https://www.attrs.org/en/stable/init.html#private-attributes
.. _#django-ids: https://docs.djangoproject.com/en/4.0/topics/db/models/#automatic-primary-key-fields
.. _#converters: https://github.com/microsoft/pyright/discussions/1782?sort=old#discussioncomment-653909
.. _#kw-only-docs: https://docs.python.org/3/library/dataclasses.html#dataclasses.KW_ONLY
.. _#kw-only-issue: https://bugs.python.org/issue43532
.. _#class-var: https://docs.python.org/3/library/dataclasses.html#class-variables
.. _#pyright-descriptor-bug: https://github.com/microsoft/pyright/issues/3245


Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.