PEP: 539
Title: A New C-API for Thread-Local Storage in CPython
Version: $Revision$
Last-Modified: $Date$
Author: Erik M. Bray, Masayuki Yamamoto
BDFL-Delegate: Nick Coghlan
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 20-Dec-2016
Python-Version: 3.7
Post-History: 16-Dec-2016, 31-Aug-2017, 08-Sep-2017
Resolution: https://mail.python.org/pipermail/python-dev/2017-September/149358.html

Abstract
========

The proposal is to add a new Thread Local Storage (TLS) API to CPython which
would supersede use of the existing TLS API within the CPython interpreter,
while deprecating the existing API.  The new API is named the "Thread
Specific Storage (TSS) API" (see `Rationale for Proposed Solution`_ for the
origin of the name).

Because the existing TLS API is only used internally (it is not mentioned in
the documentation, and the header that defines it, ``pythread.h``, is not
included in ``Python.h`` either directly or indirectly), this proposal
probably only affects CPython, but might also affect other interpreter
implementations (PyPy?) that implement parts of the CPython API.

This is motivated primarily by the fact that the old API uses ``int`` to
represent TLS keys across all platforms, which is neither POSIX-compliant,
nor portable in any practical sense [1]_.

.. note::

    Throughout this document the acronym "TLS" refers to Thread Local
    Storage and should not be confused with "Transportation Layer Security"
    protocols.


Specification
=============

The current API for TLS used inside the CPython interpreter consists of 6
functions::

    PyAPI_FUNC(int) PyThread_create_key(void)
    PyAPI_FUNC(void) PyThread_delete_key(int key)
    PyAPI_FUNC(int) PyThread_set_key_value(int key, void *value)
    PyAPI_FUNC(void *) PyThread_get_key_value(int key)
    PyAPI_FUNC(void) PyThread_delete_key_value(int key)
    PyAPI_FUNC(void) PyThread_ReInitTLS(void)

These would be superseded by a new set of analogous functions::

    PyAPI_FUNC(int) PyThread_tss_create(Py_tss_t *key)
    PyAPI_FUNC(void) PyThread_tss_delete(Py_tss_t *key)
    PyAPI_FUNC(int) PyThread_tss_set(Py_tss_t *key, void *value)
    PyAPI_FUNC(void *) PyThread_tss_get(Py_tss_t *key)

The specification also adds a few new features:

* A new type ``Py_tss_t``--an opaque type the definition of which may
  depend on the underlying TLS implementation.  It is defined::

      typedef struct {
          int _is_initialized;
          NATIVE_TSS_KEY_T _key;
      } Py_tss_t;

  where ``NATIVE_TSS_KEY_T`` is a macro whose value depends on the
  underlying native TLS implementation (e.g. ``pthread_key_t``).

* An initializer for ``Py_tss_t`` variables, ``Py_tss_NEEDS_INIT``.

* Three new functions::

      PyAPI_FUNC(Py_tss_t *) PyThread_tss_alloc(void)
      PyAPI_FUNC(void) PyThread_tss_free(Py_tss_t *key)
      PyAPI_FUNC(int) PyThread_tss_is_created(Py_tss_t *key)

  The first two are needed for dynamic (de-)allocation of a ``Py_tss_t``,
  particularly in extension modules built with ``Py_LIMITED_API``, where
  static allocation of this type is not possible due to its implementation
  being opaque at build time.  A value returned by ``PyThread_tss_alloc`` is
  in the same state as a value initialized with ``Py_tss_NEEDS_INIT``, or
  ``NULL`` in the case of dynamic allocation failure.  The behavior of
  ``PyThread_tss_free`` involves calling ``PyThread_tss_delete``
  preventively, or is a no-op if the value pointed to by the ``key``
  argument is ``NULL``.  ``PyThread_tss_is_created`` returns non-zero if the
  given ``Py_tss_t`` has been initialized (i.e. by ``PyThread_tss_create``).

The new TSS API does not provide functions which correspond to
``PyThread_delete_key_value`` and ``PyThread_ReInitTLS``, because these
functions were needed only for CPython's now defunct built-in TLS
implementation; that is the existing behavior of these functions is treated
as follows: ``PyThread_delete_key_value(key)`` is equivalent to
``PyThread_set_key_value(key, NULL)``, and ``PyThread_ReInitTLS()`` is a
no-op [8]_.

The new ``PyThread_tss_`` functions are almost exactly analogous to their
original counterparts with a few minor differences:  Whereas
``PyThread_create_key`` takes no arguments and returns a TLS key as an
``int``, ``PyThread_tss_create`` takes a ``Py_tss_t*`` as an argument and
returns an ``int`` status code. The behavior of ``PyThread_tss_create`` is
undefined if the value pointed to by the ``key`` argument is not initialized
by ``Py_tss_NEEDS_INIT``. The returned status code is zero on success
and non-zero on failure.  The meanings of non-zero status codes are not
otherwise defined by this specification.

Similarly the other ``PyThread_tss_`` functions are passed a ``Py_tss_t*``
whereas previously the key was passed by value.  This change is necessary, as
being an opaque type, the ``Py_tss_t`` type could hypothetically be almost
any size.  This is especially necessary for extension modules built with
``Py_LIMITED_API``, where the size of the type is not known.  Except for
``PyThread_tss_free``, the behaviors of ``PyThread_tss_`` are undefined if the
value pointed to by the ``key`` argument is ``NULL``.

Moreover, because of the use of ``Py_tss_t`` instead of ``int``, there are
behaviors in the new API which differ from the existing API with regard to
key creation and deletion.  ``PyThread_tss_create`` can be called repeatedly
on the same key--calling it on an already initialized key is a no-op and
immediately returns success. Similarly for calling ``PyThread_tss_delete``
with an uninitialized key.

The behavior of ``PyThread_tss_delete`` is defined to change the key's
initialization state to "uninitialized"--this allows, for example,
statically allocated keys to be reset to a sensible state when restarting
the CPython interpreter without terminating the process (e.g. embedding
Python in an application) [12]_.

The old ``PyThread_*_key*`` functions will be marked as deprecated in the
documentation, but will not generate runtime deprecation warnings.

Additionally, on platforms where ``sizeof(pthread_key_t) != sizeof(int)``,
``PyThread_create_key`` will return immediately with a failure status, and
the other TLS functions will all be no-ops on such platforms.

Comparison of API Specification
-------------------------------

=================  =============================  =============================
API                Thread Local Storage (TLS)     Thread Specific Storage (TSS)
=================  =============================  =============================
Version            Existing                       New
Key Type           ``int``                        ``Py_tss_t`` (opaque type)
Handle Native Key  cast to ``int``                conceal into internal field
Function Argument  ``int``                        ``Py_tss_t *``
Features           - create key                   - create key
                   - delete key                   - delete key
                   - set value                    - set value
                   - get value                    - get value
                   - delete value                 - (set ``NULL`` instead) [8]_
                   - reinitialize keys (after     - (unnecessary) [8]_
                     fork)
                                                  - dynamically (de-)allocate
                                                    key
                                                  - check key's initialization
                                                    state
Key Initializer    (``-1`` as key creation        ``Py_tss_NEEDS_INIT``
                   failure)
Requirement        native threads                 native threads
                   (since CPython 3.7 [9]_)
Restriction        No support for platforms       Unable to statically allocate
                   where native TLS key is        keys when ``Py_LIMITED_API``
                   defined in a way that cannot   is defined.
                   be safely cast to ``int``.
=================  =============================  =============================

Example
-------

With the proposed changes, a TSS key is initialized like::

    static Py_tss_t tss_key = Py_tss_NEEDS_INIT;
    if (PyThread_tss_create(&tss_key)) {
        /* ... handle key creation failure ... */
    }

The initialization state of the key can then be checked like::

    assert(PyThread_tss_is_created(&tss_key));

The rest of the API is used analogously to the old API::

    int the_value = 1;
    if (PyThread_tss_get(&tss_key) == NULL) {
        PyThread_tss_set(&tss_key, (void *)&the_value);
        assert(PyThread_tss_get(&tss_key) != NULL);
    }
    /* ... once done with the key ... */
    PyThread_tss_delete(&tss_key);
    assert(!PyThread_tss_is_created(&tss_key));

When ``Py_LIMITED_API`` is defined, a TSS key must be dynamically allocated::

    static Py_tss_t *ptr_key = PyThread_tss_alloc();
    if (ptr_key == NULL) {
        /* ... handle key allocation failure ... */
    }
    assert(!PyThread_tss_is_created(ptr_key));
    /* ... once done with the key ... */
    PyThread_tss_free(ptr_key);
    ptr_key = NULL;


Platform Support Changes
========================

A new "Native Thread Implementation" section will be added to :pep:`11` that
states:

* As of CPython 3.7, all platforms are required to provide a native thread
  implementation (such as pthreads or Windows) to implement the TSS
  API.  Any TSS API problems that occur in an implementation without native
  threads will be closed as "won't fix".


Motivation
==========

The primary problem at issue here is the type of the keys (``int``) used for
TLS values, as defined by the original PyThread TLS API.

The original TLS API was added to Python by GvR back in 1997, and at the
time the key used to represent a TLS value was an ``int``, and so it has
been to the time of writing.  This used CPython's own TLS implementation
which long remained unused, largely unchanged, in Python/thread.c.  Support
for implementation of the API on top of native thread implementations
(pthreads and Windows) was added much later, and the built-in implementation
has been deemed no longer necessary and has since been removed [9]_.

The problem with the choice of ``int`` to represent a TLS key, is that while
it was fine for CPython's own TLS implementation, and happens to be
compatible with Windows (which uses ``DWORD`` for the analogous data), it is
not compatible with the POSIX standard for the pthreads API, which defines
``pthread_key_t`` as an opaque type not further defined by the standard (as
with ``Py_tss_t`` described above) [14]_.  This leaves it up to the underlying
implementation how a ``pthread_key_t`` value is used to look up
thread-specific data.

This has not generally been a problem for Python's API, as it just happens
that on Linux ``pthread_key_t`` is defined as an ``unsigned int``, and so is
fully compatible with Python's TLS API--``pthread_key_t``'s created by
``pthread_create_key`` can be freely cast to ``int`` and back (well, not
exactly, even this has some limitations as pointed out by issue #22206).

However, as issue #25658 points out, there are at least some platforms
(namely Cygwin, CloudABI, but likely others as well) which have otherwise
modern and POSIX-compliant pthreads implementations, but are not compatible
with Python's API because their ``pthread_key_t`` is defined in a way that
cannot be safely cast to ``int``.  In fact, the possibility of running into
this problem was raised by MvL at the time pthreads TLS was added [2]_.

It could be argued that :pep:`11` makes specific requirements for supporting a
new, not otherwise officially-support platform (such as CloudABI), and that
the status of Cygwin support is currently dubious.  However, this creates a
very high barrier to supporting platforms that are otherwise Linux- and/or
POSIX-compatible and where CPython might otherwise "just work" except for
this one hurdle.  CPython itself imposes this implementation barrier by way
of an API that is not compatible with POSIX (and in fact makes invalid
assumptions about pthreads).


Rationale for Proposed Solution
===============================

The use of an opaque type (``Py_tss_t``) to key TLS values allows the API to
be compatible with all present (POSIX and Windows) and future (C11?) native
TLS implementations supported by CPython, as it allows the definition of
``Py_tss_t`` to depend on the underlying implementation.

Since the existing TLS API has been available in *the limited API* [13]_ for
some platforms (e.g. Linux), CPython makes an effort to provide the new TSS
API at that level likewise.  Note, however, that the ``Py_tss_t`` definition
becomes to be an opaque struct when ``Py_LIMITED_API`` is defined, because
exposing ``NATIVE_TSS_KEY_T`` as part of the limited API would prevent us
from switching native thread implementation without rebuilding extension
modules.

A new API must be introduced, rather than changing the function signatures of
the current API, in order to maintain backwards compatibility.  The new API
also more clearly groups together these related functions under a single name
prefix, ``PyThread_tss_``.  The "tss" in the name stands for "thread-specific
storage", and was influenced by the naming and design of the "tss" API that is
part of the C11 threads API [15]_.  However, this is in no way meant to imply
compatibility with or support for the C11 threads API, or signal any future
intention of supporting C11--it's just the influence for the naming and design.

The inclusion of the special initializer ``Py_tss_NEEDS_INIT`` is required
by the fact that not all native TLS implementations define a sentinel value
for uninitialized TLS keys.  For example, on Windows a TLS key is
represented by a ``DWORD`` (``unsigned int``) and its value must be treated
as opaque [3]_.  So there is no unsigned integer value that can be safely
used to represent an uninitialized TLS key on Windows.  Likewise, POSIX
does not specify a sentinel for an uninitialized ``pthread_key_t``, instead
relying on the ``pthread_once`` interface to ensure that a given TLS key is
initialized only once per-process.  Therefore, the ``Py_tss_t`` type
contains an explicit ``._is_initialized`` that can indicate the key's
initialization state independent of the underlying implementation.

Changing ``PyThread_create_key`` to immediately return a failure status on
systems using pthreads where ``sizeof(int) != sizeof(pthread_key_t)`` is
intended as a sanity check:  Currently, ``PyThread_create_key`` may report
initial success on such systems, but attempts to use the returned key are
likely to fail.  Although in practice this failure occurs earlier in the
interpreter initialization, it's better to fail immediately at the source of
problem (``PyThread_create_key``) rather than sometime later when use of an
invalid key is attempted.  In other words, this indicates clearly that the
old API is not supported on platforms where it cannot be used reliably, and
that no effort will be made to add such support.


Rejected Ideas
==============

* Do nothing: The status quo is fine because it works on Linux, and platforms
  wishing to be supported by CPython should follow the requirements of
  :pep:`11`.  As explained above, while this would be a fair argument if
  CPython were being to asked to make changes to support particular quirks
  or features of a specific platform, in this case it is a quirk of CPython
  that prevents it from being used to its full potential on otherwise
  POSIX-compliant platforms.  The fact that the current implementation
  happens to work on Linux is a happy accident, and there's no guarantee
  that this will never change.

* Affected platforms should just configure Python ``--without-threads``:
  this is no longer an option as the ``--without-threads`` option has
  been removed for Python 3.7 [16]_.

* Affected platforms should use CPython's built-in TLS implementation
  instead of a native TLS implementation: This is a more acceptable
  alternative to the previous idea, and in fact there had been a patch to do
  just that [4]_.  However, the built-in implementation being "slower and
  clunkier" in general than native implementations still needlessly hobbles
  performance on affected platforms.  At least one other module
  (``tracemalloc``) is also broken if Python is built without a native TLS
  implementation.  This idea also cannot be adopted because the built-in
  implementation has since been removed.

* Keep the existing API, but work around the issue by providing a mapping from
  ``pthread_key_t`` values to ``int`` values.  A couple attempts were made at
  this ([5]_, [6]_), but this injects needless complexity and overhead
  into performance-critical code on platforms that are not currently affected
  by this issue (such as Linux).  Even if use of this workaround were made
  conditional on platform compatibility, it introduces platform-specific code
  to maintain, and still has the problem of the previous rejected ideas of
  needlessly hobbling performance on affected platforms.


Implementation
==============

An initial version of a patch [7]_ is available on the bug tracker for this
issue.  Since the migration to GitHub, its development has continued in the
``pep539-tss-api`` feature branch [10]_ in Masayuki Yamamoto's fork of the
CPython repository on GitHub. A work-in-progress PR is available at [11]_.

This reference implementation covers not only the new API implementation
features, but also the client code updates needed to replace the existing
TLS API with the new TSS API.


Copyright
=========

This document has been placed in the public domain.


References and Footnotes
========================

.. [1] http://bugs.python.org/issue25658
.. [2] https://bugs.python.org/msg116292
.. [3] https://msdn.microsoft.com/en-us/library/windows/desktop/ms686801(v=vs.85).aspx
.. [4] http://bugs.python.org/file45548/configure-pthread_key_t.patch
.. [5] http://bugs.python.org/file44269/issue25658-1.patch
.. [6] http://bugs.python.org/file44303/key-constant-time.diff
.. [7] http://bugs.python.org/file46379/pythread-tss-3.patch
.. [8] https://bugs.python.org/msg298342
.. [9] http://bugs.python.org/issue30832
.. [10] https://github.com/python/cpython/compare/master...ma8ma:pep539-tss-api
.. [11] https://github.com/python/cpython/pull/1362
.. [12] https://docs.python.org/3/c-api/init.html#c.Py_FinalizeEx
.. [13] It is also called as "stable ABI" (:pep:`384`)
.. [14] http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_key_create.html
.. [15] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf#page=404
.. [16] https://bugs.python.org/issue31370