PEP: 520
Title: Preserving Class Attribute Definition Order
Version: $Revision$
Last-Modified: $Date$
Author: Eric Snow <ericsnowcurrently@gmail.com>
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 07-Jun-2016
Python-Version: 3.6
Post-History: 07-Jun-2016, 11-Jun-2016, 20-Jun-2016, 24-Jun-2016
Resolution: https://mail.python.org/pipermail/python-dev/2016-June/145442.html

.. note::
   Since compact dict has landed in 3.6, ``__definition_order__``
   has been removed.  ``cls.__dict__`` now mostly accomplishes the same
   thing instead.

Abstract
========

The class definition syntax is ordered by its very nature. Class
attributes defined there are thus ordered.  Aside from helping with
readability, that ordering is sometimes significant.  If it were
automatically available outside the class definition then the
attribute order could be used without the need for extra boilerplate
(such as metaclasses or manually enumerating the attribute order).
Given that this information already exists, access to the definition
order of attributes is a reasonable expectation.  However, currently
Python does not preserve the attribute order from the class
definition.

This PEP changes that by preserving the order in which attributes
are introduced in the class definition body.  That order will now be
preserved in the ``__definition_order__`` attribute of the class.
This allows introspection of the original definition order, e.g. by
class decorators.

Additionally, this PEP requires that the default class definition
namespace be ordered (e.g. ``OrderedDict``).  The long-lived class
namespace (``__dict__``) will remain a ``dict``.


Motivation
==========

The attribute order from a class definition may be useful to tools
that rely on name order.  However, without the automatic availability
of the definition order, those tools must impose extra requirements on
users.  For example, use of such a tool may require that your class use
a particular metaclass.  Such requirements are often enough to
discourage use of the tool.

Some tools that could make use of this PEP include:

* documentation generators
* testing frameworks
* CLI frameworks
* web frameworks
* config generators
* data serializers
* enum factories (my original motivation)


Background
==========

When a class is defined using a ``class`` statement, the class body
is executed within a namespace.  Currently that namespace defaults to
``dict``.  If the metaclass defines ``__prepare__()`` then the result
of calling it is used for the class definition namespace.

After the execution completes, the definition namespace is
copied into a new ``dict``.  Then the original definition namespace is
discarded.  The new copy is stored away as the class's namespace and
is exposed as ``__dict__`` through a read-only proxy.
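
The copy step can be observed directly.  The following is a minimal
sketch, not part of the proposal; the ``Meta.seen`` attribute is a
hypothetical name used only to keep a reference to the definition
namespace::

   import types

   class Meta(type):
       seen = None  # hypothetical: holds the definition namespace

       @classmethod
       def __prepare__(mcls, name, bases):
           # The mapping returned here is used while the body executes.
           return {}

       def __new__(mcls, name, bases, namespace):
           Meta.seen = namespace
           # type.__new__() copies the namespace into a fresh dict.
           return super().__new__(mcls, name, bases, namespace)

   class Spam(metaclass=Meta):
       ham = None

   # Changing the original namespace afterwards does not affect the class.
   Meta.seen['late'] = 1
   assert not hasattr(Spam, 'late')

   # The class namespace is exposed only through a read-only proxy.
   assert isinstance(Spam.__dict__, types.MappingProxyType)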

The class attribute definition order is represented by the insertion
order of names in the *definition* namespace.  Thus, we can have
access to the definition order by switching the definition namespace
to an ordered mapping, such as ``collections.OrderedDict``.  This is
feasible using a metaclass and ``__prepare__``, as described above.
In fact, this is by far the most common use case for
``__prepare__``.

At that point, the only missing thing for later access to the
definition order is storing it on the class before the definition
namespace is thrown away.  Again, this may be done using a metaclass.
However, this means that the definition order is preserved only for
classes that use such a metaclass.  There are two practical problems
with that:

First, it requires the use of a metaclass.  Metaclasses introduce an
extra level of complexity to code and in some cases (e.g. conflicts)
are a problem.  So reducing the need for them is worth doing when the
opportunity presents itself.  :pep:`422` and :pep:`487` discuss this at
length.  We have such an opportunity by using an ordered mapping (e.g.
``OrderedDict`` for CPython at least) for the default class definition
namespace, virtually eliminating the need for ``__prepare__()``.

Second, only classes that opt in to using the ``OrderedDict``-based
metaclass will have access to the definition order. This is problematic
for cases where universal access to the definition order is important.
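
For reference, the metaclass-based approach described above might look
roughly like the following sketch.  ``OrderedMeta`` and the ``_order``
attribute are hypothetical names chosen for illustration; they are not
part of this proposal::

   from collections import OrderedDict

   class OrderedMeta(type):
       @classmethod
       def __prepare__(mcls, name, bases):
           # Use an ordered mapping as the definition namespace.
           return OrderedDict()

       def __new__(mcls, name, bases, namespace):
           cls = super().__new__(mcls, name, bases, namespace)
           # Save the insertion order before the namespace is discarded.
           cls._order = tuple(namespace)
           return cls

   class Spam(metaclass=OrderedMeta):
       ham = None
       eggs = 5

   # Names injected by the compiler are recorded too, so skip dunders.
   public = [n for n in Spam._order if not n.startswith('__')]

Only classes that use ``OrderedMeta`` (directly or through
inheritance) get ``_order``, which is the second problem noted above.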


Specification
=============

Part 1:

* all classes have a ``__definition_order__`` attribute
* ``__definition_order__`` is a ``tuple`` of identifiers (or ``None``)
* ``__definition_order__`` is always set:

  1. during execution of the class body, the insertion order of names
     into the class *definition* namespace is stored in a tuple
  2. if ``__definition_order__`` is defined in the class body then it
     must be a ``tuple`` of identifiers or ``None``; any other value
     will result in ``TypeError``
  3. classes that do not have a class definition (e.g. builtins) have
     their ``__definition_order__`` set to ``None``
  4. classes for which ``__prepare__()`` returned something other than
     ``OrderedDict`` (or a subclass) have their ``__definition_order__``
     set to ``None`` (except where #2 applies)

Not changing:

* ``dir()`` will not depend on ``__definition_order__``
* descriptors and custom ``__getattribute__`` methods are unconstrained
  regarding ``__definition_order__``

Part 2:

* the default class *definition* namespace is now an ordered mapping
  (e.g. ``OrderedDict``)
* ``cls.__dict__`` does not change, remaining a read-only proxy around
  ``dict``

Note that Python implementations which have an ordered ``dict`` won't
need to change anything.

The following code demonstrates roughly equivalent semantics for both
parts 1 and 2::

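   from collections import OrderedDict

   class Meta(type):
       @classmethod
       def __prepare__(cls, *args, **kwargs):
           # Part 2: the definition namespace is an ordered mapping.
           return OrderedDict()

   class Spam(metaclass=Meta):
       ham = None
       eggs = 5
       # Part 1: record the names defined so far, in insertion order.
       __definition_order__ = tuple(locals())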

Why a tuple?
------------

Use of a tuple reflects the fact that we are exposing the order in
which attributes on the class were *defined*.  Since the definition
is already complete by the time ``__definition_order__`` is set, the
content and order of the value won't be changing.  Thus we use a type
that communicates that state of immutability.

Why not a read-only attribute?
------------------------------

There are some valid arguments for making ``__definition_order__``
a read-only attribute (like ``cls.__dict__`` is).  Most notably, a
read-only attribute conveys the nature of the attribute as "complete",
which is exactly correct for ``__definition_order__``.  Since it
represents the state of a particular one-time event (execution of
the class definition body), allowing the value to be replaced would
reduce confidence that the attribute corresponds to the original class
body.  Furthermore, often an immutable-by-default approach helps to
make data easier to reason about.

However, in this case there still isn't a *strong* reason to counter
the well-worn precedent found in Python.  Per Guido::

    I don't see why it needs to be a read-only attribute. There are
    very few of those -- in general we let users play around with
    things unless we have a hard reason to restrict assignment (e.g.
    the interpreter's internal state could be compromised). I don't
    see such a hard reason here.

Also, note that a writeable ``__definition_order__`` allows dynamically
created classes (e.g. by Cython) to still have ``__definition_order__``
properly set.  That could certainly be handled through specific
class-creation tools, such as ``type()`` or the C-API, without the need
to lose the semantics of a read-only attribute.  However, with a
writeable attribute it's a moot point.


Why not "__attribute_order__"?
------------------------------

``__definition_order__`` is centered on the class definition
body.  The use cases for dealing with the class namespace (``__dict__``)
post-definition are a separate matter.  ``__attribute_order__`` would
suggest a feature covering the class namespace as a whole, and
``__definition_order__`` would be a significantly misleading name for
such a broader feature.

Why not ignore "dunder" names?
------------------------------

Names starting and ending with "__" are reserved for use by the
interpreter.  In practice they should not be relevant to the users of
``__definition_order__``.  Instead, for nearly everyone they would only
be clutter, causing the same extra work (filtering out the dunder
names) for the majority.  In cases where a dunder name is significant,
the class definition *could* manually set ``__definition_order__``,
making the common case simpler.

However, leaving dunder names out of ``__definition_order__`` means
that their place in the definition order would be unrecoverably lost.
Dropping dunder names by default may inadvertently cause problems for
classes that use dunder names unconventionally.  In this case it's
better to play it safe and preserve *all* the names from the class
definition.  This isn't a big problem since it is easy to filter out
dunder names::

   (name for name in cls.__definition_order__
         if not (name.startswith('__') and name.endswith('__')))

In fact, in some application contexts there may be other criteria on
which similar filtering would be applied, such as ignoring any name
starting with "_", leaving out all methods, or including only
descriptors.  Ultimately dunder names aren't a special enough case to
be treated exceptionally.

Note that a couple of dunder names (``__module__`` and ``__qualname__``)
are injected by default by the compiler.  So they will be included even
though they are not strictly part of the class definition body.
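
As a sketch of such application-specific filtering, the helper below
(``public_names`` is a hypothetical name) works on any iterable of
names.  Because ``__definition_order__`` was ultimately removed (see
the note at the top), the usage reads the names from ``vars(cls)``,
which assumes an interpreter where ``dict`` preserves insertion
order (guaranteed from Python 3.7)::

   def public_names(names):
       """Return the names that do not start with an underscore."""
       return tuple(name for name in names if not name.startswith('_'))

   class Spam:
       ham = None
       _cache = {}
       eggs = 5

   # vars(Spam) also holds dunder entries, which are dropped here.
   assert public_names(vars(Spam)) == ('ham', 'eggs')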

Why None instead of an empty tuple?
-----------------------------------

A key objective of adding ``__definition_order__`` is to preserve
information in class definitions which was lost prior to this PEP.
One consequence is that ``__definition_order__`` implies an original
class definition.  Using ``None`` allows us to clearly distinguish
classes that do not have a definition order.  An empty tuple clearly
indicates a class that came from a definition statement but did not
define any attributes there.

Why None instead of not setting the attribute?
----------------------------------------------

The absence of an attribute requires more complex handling than ``None``
does for consumers of ``__definition_order__``.

Why constrain manually set values?
----------------------------------

If ``__definition_order__`` is manually set in the class body then it
will be used.  We require it to be a tuple of identifiers (or ``None``)
so that consumers of ``__definition_order__`` may have a consistent
expectation for the value.  That helps maximize the feature's
usefulness.

We could also allow an arbitrary iterable for a manually set
``__definition_order__`` and convert it into a tuple.  However, not
all iterables imply a definition order (e.g. ``set``).  So we opt in
favor of requiring a tuple.
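
The constraint can be sketched as a small check.  This is an
illustration of the rule only, not the CPython implementation, and
``check_definition_order`` is a hypothetical name::

   def check_definition_order(value):
       """Return value if it is None or a tuple of identifiers."""
       if value is None:
           return value
       if not isinstance(value, tuple):
           raise TypeError('__definition_order__ must be a tuple or None')
       for name in value:
           if not (isinstance(name, str) and name.isidentifier()):
               raise TypeError('expected an identifier, got %r' % (name,))
       return value

   check_definition_order(('ham', 'eggs'))   # accepted
   check_definition_order(None)              # accepted
   # check_definition_order({'ham', 'eggs'}) would raise TypeError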

Why not hide __definition_order__ on non-type objects?
------------------------------------------------------

Python doesn't make much effort to hide class-specific attributes
during lookup on instances of classes.  While it may make sense
to consider ``__definition_order__`` a class-only attribute, hidden
during lookup on objects, setting precedent in that regard is
beyond the goals of this PEP.

What about __slots__?
---------------------

``__slots__`` will be added to ``__definition_order__`` like any
other name in the class definition body.  The actual slot names
will not be added to ``__definition_order__`` since they aren't
set as names in the definition namespace.
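
The distinction can be seen by recording the names present in the
definition namespace.  In this sketch ``Recorder`` and ``recorded``
are hypothetical names::

   class Recorder(type):
       def __new__(mcls, name, bases, namespace):
           cls = super().__new__(mcls, name, bases, namespace)
           # Names bound while the class body executed.
           cls.recorded = tuple(namespace)
           return cls

   class Point(metaclass=Recorder):
       __slots__ = ('x', 'y')

       def norm(self):
           return (self.x ** 2 + self.y ** 2) ** 0.5

   assert '__slots__' in Point.recorded
   assert 'norm' in Point.recorded
   # The slot names are never bound in the class body...
   assert 'x' not in Point.recorded
   # ...they only appear later, as descriptors on the finished class.
   assert 'x' in vars(Point)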

Why is __definition_order__ even necessary?
-------------------------------------------

Since the definition order is not preserved in ``__dict__``, it is
lost once class definition execution completes.  Classes *could*
explicitly set the attribute as the last thing in the body.  However,
then independent decorators could only make use of classes that had done
so.  Instead, ``__definition_order__`` preserves this one bit of info
from the class body so that it is universally available.


Support for C-API Types
=======================

Arguably, most C-defined Python types (e.g. built-in, extension modules)
have a roughly equivalent concept of a definition order.  So conceivably
``__definition_order__`` could be set for such types automatically.  This
PEP does not introduce any such support, though it does not prohibit
it either.  Since ``__definition_order__`` can be set at any
time through normal attribute assignment, it does not need any special
treatment in the C-API.

The specific cases:

* builtin types
* ``PyType_Ready``
* ``PyType_FromSpec``


Compatibility
=============

This PEP does not break backward compatibility, except in the case that
someone relies *strictly* on ``dict`` as the class definition namespace.
This shouldn't be a problem since ``issubclass(OrderedDict, dict)`` is
true.


Changes
=======

In addition to the class syntax, the following expose the new behavior:

* ``builtins.__build_class__``
* ``types.prepare_class``
* ``types.new_class``

Also, the 3-argument form of ``builtins.type()`` will allow inclusion
of ``__definition_order__`` in the namespace that gets passed in.  It
will be subject to the same constraints as when ``__definition_order__``
is explicitly defined in the class body.
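
For example, ``types.new_class()`` populates the namespace through a
callback, and the order of insertion there is the definition order.
Since ``__definition_order__`` was ultimately removed (see the note
at the top), this sketch reads the order back from ``vars()``, which
assumes an interpreter where ``dict`` preserves insertion order
(guaranteed from Python 3.7)::

   import types

   def body(ns):
       # Plays the role of the class body.
       ns['ham'] = None
       ns['eggs'] = 5

   Spam = types.new_class('Spam', (), {}, body)

   names = [n for n in vars(Spam) if not n.startswith('__')]
   assert names == ['ham', 'eggs']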


Other Python Implementations
============================

Pending feedback, the impact on Python implementations is expected to
be minimal.  All conforming implementations are expected to set
``__definition_order__`` as described in this PEP.


Implementation
==============

The implementation is found in the
`tracker <https://github.com/python/cpython/issues/68442>`__.

Alternatives
============

An Order-preserving cls.__dict__
--------------------------------

Instead of storing the definition order in ``__definition_order__``,
the now-ordered definition namespace could be copied into a new
``OrderedDict``.  This would then be used as the mapping proxied as
``__dict__``.  Doing so would mostly provide the same semantics.

However, using ``OrderedDict`` for ``__dict__`` would obscure the
relationship with the definition namespace, making it less useful.

Additionally, (in the case of ``OrderedDict`` specifically) doing
this would require significant changes to the semantics of the
concrete ``dict`` C-API.

There has been some discussion about moving to a compact dict
implementation which would (mostly) preserve insertion order.  However
the lack of an explicit ``__definition_order__`` would still remain
as a pain point.

A "namespace" Keyword Arg for Class Definition
----------------------------------------------

:pep:`PEP 422 <422#order-preserving-classes>`
introduced a new "namespace" keyword arg to class definitions
that effectively replaces the need for ``__prepare__()``.
However, the proposal was withdrawn in favor of the simpler :pep:`487`.

A stdlib Metaclass that Implements __prepare__() with OrderedDict
-----------------------------------------------------------------

This has all the same problems as writing your own metaclass.  The
only advantage is that you don't have to actually write this
metaclass.  So it doesn't offer any benefit in the context of this
PEP.

Set __definition_order__ at Compile-time
----------------------------------------

Each class's ``__qualname__`` is determined at compile-time.
This same concept could be applied to ``__definition_order__``.
The result of composing ``__definition_order__`` at compile-time
would be nearly the same as doing so at run-time.

Comparative implementation difficulty aside, the key difference
would be that at compile-time it would not be practical to
preserve definition order for attributes that are set dynamically
in the class body (e.g. ``locals()[name] = value``).  However,
they should still be reflected in the definition order.  One
possible resolution would be to require class authors to manually
set ``__definition_order__`` if they define any class attributes
dynamically.
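
A class body such as the following shows the difference.  It is a
sketch that relies on CPython's behavior of ``locals()`` returning
the definition namespace inside a class body, and it reads the
order from ``vars()``, assuming an insertion-ordered ``dict``
(Python 3.7 or later)::

   class Spam:
       ham = None
       # These names do not appear literally in the source, so a
       # compile-time scan of the body could not discover them.
       for _name in ('eggs', 'bacon'):
           locals()[_name] = 0
       del _name

   names = [n for n in vars(Spam) if not n.startswith('_')]
   assert names == ['ham', 'eggs', 'bacon']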

Ultimately, the choice between using ``OrderedDict`` at run-time and
discovering the order at compile-time is almost entirely an
implementation detail.


References
==========

* `Original discussion
  <https://mail.python.org/pipermail/python-ideas/2013-February/019690.html>`__

* `Follow-up 1
  <https://mail.python.org/pipermail/python-dev/2013-June/127103.html>`__

* `Follow-up 2
  <https://mail.python.org/pipermail/python-dev/2015-May/140137.html>`__

* `Nick Coghlan's concerns about mutability
  <https://mail.python.org/pipermail/python-dev/2016-June/144883.html>`__

Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: