PEP: 3123
Title: Making PyObject_HEAD conform to standard C
Version: $Revision$
Last-Modified: $Date$
Author: Martin von Löwis <martin@v.loewis.de>
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 27-Apr-2007
Python-Version: 3.0
Post-History:

Abstract
========

Python currently relies on undefined C behavior, with its usage of
``PyObject_HEAD``.  This PEP proposes to change that into standard C.

Rationale
=========

Standard C defines that an object must be accessed only through a
pointer of its type, and that all other accesses are undefined
behavior, with a few exceptions.  In particular, the following code
has undefined behavior::

  struct FooObject{
    PyObject_HEAD
    int data;
  };

  PyObject *foo(struct FooObject*f){
    return (PyObject*)f;
  }

  int bar(){
    struct FooObject *f = malloc(sizeof(struct FooObject));
    PyObject *o = foo(f);
    f->ob_refcnt = 0;    /* store through the FooObject view */
    o->ob_refcnt = 1;    /* store through the PyObject view */
    return f->ob_refcnt; /* the compiler may assume this is still 0 */
  }

The problem here is that the storage is accessed both as if it were
struct ``PyObject`` and as struct ``FooObject``.

Historically, compilers did not have any problems with this code.
However, modern compilers use that clause as an optimization
opportunity, finding that ``f->ob_refcnt`` and ``o->ob_refcnt`` cannot
possibly refer to the same memory, and that therefore the function
should return 0, without having to fetch the value of ``ob_refcnt`` at
all in the return statement.  For GCC, Python now uses
``-fno-strict-aliasing`` to work around that problem; with other
compilers, it may simply be subject to undefined behavior.  Even with
GCC, using ``-fno-strict-aliasing`` may pessimize the generated code
unnecessarily.

Specification
=============

Standard C has one specific exception to its aliasing rules precisely
designed to support the case of Python: a value of a struct type may
also be accessed through a pointer to the first field.  E.g. if a
struct starts with an ``int``, the ``struct *`` may also be cast to an
``int *``, which allows int values to be written into the first field.

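
The first-field exception described above can be illustrated with a
minimal sketch.  The names ``Base``, ``Derived`` and ``roundtrip`` are
hypothetical stand-ins chosen for this illustration; they are not part
of the Python headers.  Because ``Base`` is the first member of
``Derived``, converting a ``Derived *`` to a ``Base *`` yields a
pointer to that member, so a store through one view has to be visible
through the other:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for PyObject: only a reference count. */
typedef struct {
    long refcnt;
} Base;

/* Hypothetical stand-in for an extension type.  Base is the FIRST
   member, so a Derived* may be converted to a Base*. */
typedef struct {
    Base base;
    int data;
} Derived;

/* Stores through the Base view, then reads through the Derived view.
   Returns the value read, or -1 if allocation fails. */
long roundtrip(void)
{
    Derived *d = malloc(sizeof *d);
    if (d == NULL)
        return -1;

    d->base.refcnt = 0;

    /* A pointer to a struct, suitably converted, points to its
       initial member. */
    Base *b = (Base *)d;
    assert((void *)b == (void *)&d->base);

    b->refcnt = 1;               /* same long object as d->base.refcnt */
    long seen = d->base.refcnt;  /* must observe the store above */

    free(d);
    return seen;
}
```

Under these assumptions a conforming compiler may not assume that the
two accesses are unrelated, so ``roundtrip()`` is expected to yield 1
rather than 0, in contrast to the ``bar()`` example in the Rationale.
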
For Python, ``PyObject_HEAD`` and ``PyObject_VAR_HEAD`` will be changed
to not list all fields anymore, but list a single field of type
``PyObject``/``PyVarObject``::

  typedef struct _object {
    _PyObject_HEAD_EXTRA
    Py_ssize_t ob_refcnt;
    struct _typeobject *ob_type;
  } PyObject;

  typedef struct {
    PyObject ob_base;
    Py_ssize_t ob_size;
  } PyVarObject;

  #define PyObject_HEAD        PyObject ob_base;
  #define PyObject_VAR_HEAD    PyVarObject ob_base;

Types defined as fixed-size structures will then include ``PyObject``
as their first field; variable-sized objects will include
``PyVarObject``.  E.g.::

  typedef struct {
    PyObject ob_base;
    PyObject *start, *stop, *step;
  } PySliceObject;

  typedef struct {
    PyVarObject ob_base;
    PyObject **ob_item;
    Py_ssize_t allocated;
  } PyListObject;

The above definitions of ``PyObject_HEAD`` are normative, so extension
authors MAY either use the macro, or put the ``ob_base`` field
explicitly into their structs.

As a convention, the base field SHOULD be called ``ob_base``.  However,
all accesses to ``ob_refcnt`` and ``ob_type`` MUST cast the object
pointer to ``PyObject*`` (unless the pointer is already known to have
that type), and SHOULD use the respective accessor macros.  To simplify
access to ``ob_type``, ``ob_refcnt``, and ``ob_size``, macros::

  #define Py_TYPE(o)    (((PyObject*)(o))->ob_type)
  #define Py_REFCNT(o)  (((PyObject*)(o))->ob_refcnt)
  #define Py_SIZE(o)    (((PyVarObject*)(o))->ob_size)

are added.  E.g. the code blocks ::

  #define PyList_CheckExact(op) ((op)->ob_type == &PyList_Type)

  return func->ob_type->tp_name;

need to be changed to::

  #define PyList_CheckExact(op) (Py_TYPE(op) == &PyList_Type)

  return Py_TYPE(func)->tp_name;

For initialization of type objects, the current sequence ::

  PyObject_HEAD_INIT(NULL)
  0, /* ob_size */

becomes incorrect, and must be replaced with ::

  PyVarObject_HEAD_INIT(NULL, 0)

Compatibility with Python 2.6
=============================

To support modules that compile with both Python 2.6 and Python 3.0,
the ``Py_*`` macros are added to Python 2.6.
The macros ``Py_INCREF`` and ``Py_DECREF`` will be changed to cast
their argument to ``PyObject *``, so that module authors can also
explicitly declare the ``ob_base`` field in modules designed for
Python 2.6.

Copyright
=========

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: