PEP: 437
Title: A DSL for specifying signatures, annotations and argument converters
Version: $Revision$
Last-Modified: $Date$
Author: Stefan Krah <skrah@bytereef.org>
Status: Rejected
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Mar-2013
Python-Version: 3.4
Post-History:
Resolution: https://mail.python.org/pipermail/python-dev/2013-May/126117.html

Abstract
========

The Python C-API currently has no mechanism for specifying and auto-generating
function signatures, annotations or custom argument converters.

There are several possible approaches to the problem. Cython uses *cdef*
definitions in *.pyx* files to generate the required information. However,
CPython's C-API functions often require additional initialization and
cleanup snippets that would be hard to specify in a *cdef*.

:pep:`436` proposes a domain-specific language (DSL) enclosed in C comments
that largely resembles a per-parameter configuration file. A preprocessor
reads the comment and emits an argument parsing function, docstrings and
a header for the function that utilizes the results of the parsing step.

The latter function is subsequently referred to as the *implementation
function*.


Rejection Notice
================

This PEP was rejected by Guido van Rossum at PyCon US 2013. However, several
of the specific issues raised by this PEP were taken into account when
designing the `second iteration of the PEP 436 DSL`_.


Rationale
=========

Opinions differ regarding the suitability of the :pep:`436` DSL in the context
of a C file. This PEP proposes an alternative DSL. The specific issues with
:pep:`436` that spurred the counter proposal will be explained in the final
section of this PEP.


Scope
=====

The PEP focuses exclusively on the DSL. Topics like the output locations of
docstrings or the generated code are outside the scope of this PEP.

It is, however, vital that the DSL is suitable for generating custom argument
parsers, a feature that is already implemented in Cython.  Therefore, one of
the goals of this PEP is to keep the DSL close to existing solutions, thus
facilitating a possible inclusion of the relevant parts of Cython into the
CPython source tree.


DSL overview
============

Type safety and annotations
---------------------------

A conversion from a Python to a C value is fully defined by the type of
the converter function.  The PyArg_Parse* family of functions accepts
custom converters in addition to the well-known default converters "i",
"f", etc.

This PEP views the default converters as abstract functions, regardless
of how they are actually implemented.


Include/converters.h
--------------------

Converter functions must be forward-declared. All converter functions
shall be entered into the file Include/converters.h. The file is read
by the preprocessor prior to translating .c files. This is an excerpt::

    /*[converter]
    ##### Default converters #####
    "s":  str                                -> const char *res;
    "s*": [str, bytes, bytearray, rw_buffer] -> Py_buffer &res;
    [...]
    "es#": str -> (const char *res_encoding, char **res, Py_ssize_t *res_length);
    [...]
    ##### Custom converters #####
    path_converter:           [str, bytes, int]  -> path_t &res;
    OS_STAT_DIR_FD_CONVERTER: [int, None]        -> int res;
    [converter_end]*/


Converters are specified by their name, Python input type(s) and C output
type(s).  Default converters must have quoted names, custom converters must
have regular names.  A Python type is given by its name. If a function accepts
multiple Python types, the set is written in list form.

Since the default converters may have multiple implicit return values,
the C output type(s) are written according to the following convention:

The main return value must be named *res*. This is a placeholder for
the actual variable name given later in the DSL. Additional implicit
return values must be prefixed by *res_*.

By default the variables are passed by value to the implementation function.
If the address should be passed instead, *res* must be prefixed with an
ampersand.
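
As an illustration of the convention, the following is a minimal Python
sketch that reads one declaration line from the excerpt above.  It is not
part of the reference implementation: the names ``parse_converter``,
``Converter`` and ``Output`` and both regular expressions are assumptions
made for this example, and the line format is inferred from the excerpt only.

```python
import re
from typing import NamedTuple


class Output(NamedTuple):
    c_type: str        # C type text, e.g. "const char *"
    name: str          # "res" or "res_<suffix>"
    by_address: bool   # True when the name was written as "&res"


class Converter(NamedTuple):
    name: str
    is_default: bool   # quoted names denote default converters
    py_types: tuple
    outputs: tuple


# Hypothetical line format, modelled on the converters.h excerpt.
_LINE = re.compile(r'^\s*("?)([^":\s]+)\1\s*:\s*(.+?)\s*->\s*(.+?)\s*;\s*$')
_OUT = re.compile(r'^(.*?)(&?)(res(?:_\w+)?)$')


def parse_output(text):
    match = _OUT.match(text.strip())
    if match is None:
        raise ValueError(f"output must be named res or res_*: {text!r}")
    c_type, ampersand, name = match.groups()
    return Output(c_type.strip(), name, ampersand == "&")


def parse_converter(line):
    match = _LINE.match(line)
    if match is None:
        raise ValueError(f"not a converter declaration: {line!r}")
    quote, name, py, out = match.groups()
    # A single Python type is written bare, several types in list form.
    py_types = tuple(t.strip() for t in py.strip("[]").split(","))
    # Several C outputs are written as a parenthesized, comma-separated list.
    parts = out[1:-1].split(",") if out.startswith("(") else [out]
    outputs = tuple(parse_output(part) for part in parts)
    if [o.name for o in outputs].count("res") != 1:
        raise ValueError("exactly one return value must be named res")
    return Converter(name, quote == '"', py_types, outputs)
```

For the ``"s*"`` line of the excerpt, this sketch reports a default converter
with a single output of type ``Py_buffer`` that is passed by address.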


Additional declarations may be placed into .c files. Duplicate declarations
are allowed as long as the function types are identical.

Authors are encouraged to declare custom converter types a second time, right
above the converter function definition. The preprocessor will then catch
any mismatch between the declarations.
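
The duplicate-declaration rule could be checked along the following lines.
This is a sketch under stated assumptions, not the behaviour of the reference
implementation: ``ConverterRegistry`` is a hypothetical name, the Python input
types are compared as an unordered set, and C output types are compared after
whitespace normalization.

```python
class ConverterRegistry:
    """Remembers converter types and rejects conflicting redeclarations."""

    def __init__(self):
        self._types = {}

    def declare(self, name, py_types, c_output):
        # Assumption: order of the Python types and spacing in the C
        # output are irrelevant when comparing two declarations.
        signature = (frozenset(py_types), " ".join(c_output.split()))
        known = self._types.setdefault(name, signature)
        if known != signature:
            raise TypeError(f"conflicting declarations for converter {name!r}")
```

Declaring ``path_converter`` once in the header and once more above its
definition passes silently as long as both declarations agree.  A second
declaration with a different output type raises ``TypeError``.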


In order to keep the converter complexity manageable, PY_SSIZE_T_CLEAN will
be deprecated and Py_ssize_t will be assumed for all length arguments.


TBD: Make a list of fantasy types like *rw_buffer*.


Function specifications
-----------------------

Keyword arguments
^^^^^^^^^^^^^^^^^

This example contains the definition of os.stat. The individual sections will
be explained in detail. Grammatically, the whole define block consists of a
function specification and an output section. The function specification in
turn consists of a declaration section, an optional C-declaration section and
an optional cleanup code section.  Sections within the function specification
are separated in yacc style by '%%'::

    /*[define posix_stat]
    def os.stat(path: path_converter, *, dir_fd: OS_STAT_DIR_FD_CONVERTER = None,
                follow_symlinks: "p" = True) -> os.stat_result: pass
    %%
    path_t path = PATH_T_INITIALIZE("stat", 0, 1);
    int dir_fd = DEFAULT_DIR_FD;
    int follow_symlinks = 1;
    %%
    path_cleanup(&path);
    [define_end]*/

    <literal C output>

    /*[define_output_end]*/
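
The section layout can be sketched in a few lines of Python.  The function
``split_define_block`` and its regular expressions are hypothetical and only
mirror the example above.  The sketch also assumes that sections are
positional, so that a cleanup section requires a (possibly empty)
C-declaration section before it.

```python
import re

# Hypothetical recognizer for one define block, modelled on the os.stat example.
_DEFINE = re.compile(
    r"/\*\[define(?:[ \t]+(\w+))?\][ \t]*\n(.*?)\n[ \t]*\[define_end\]\*/",
    re.DOTALL,
)
# A line that contains nothing but the yacc-style separator.
_SEPARATOR = re.compile(r"^[ \t]*%%[ \t]*$", re.MULTILINE)


def split_define_block(source):
    match = _DEFINE.search(source)
    if match is None:
        raise ValueError("no define block found")
    c_name, body = match.groups()
    sections = [part.strip() for part in _SEPARATOR.split(body)]
    if len(sections) > 3:
        raise ValueError("a function specification has at most three sections")
    # Assumption: missing optional sections are treated as empty.
    sections += [""] * (3 - len(sections))
    return {
        "c_name": c_name,                # None when no C name was given
        "declaration": sections[0],
        "c_declarations": sections[1],
        "cleanup": sections[2],
    }
```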


Define block
~~~~~~~~~~~~

The function specification block starts with a ``/*[define`` token, followed
by an optional C function name, followed by a right bracket. If the C function
name is not given, it is generated from the declaration name. In the example,
omitting the name *posix_stat* would result in a C function name of *os_stat*.
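
A possible naming rule, shown as a Python sketch.  The text only states the
result for *os.stat*; replacing every dot with an underscore is an assumption
of this example, and ``c_function_name`` is a hypothetical helper.

```python
def c_function_name(declaration_name, explicit_name=None):
    """Return the C function name for a define block."""
    if explicit_name:
        # The name given after "/*[define" always wins.
        return explicit_name
    # Assumption: the generated name is the dotted path with "_" for ".".
    return declaration_name.replace(".", "_")
```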


Declaration
~~~~~~~~~~~

The required declaration is (almost) a valid Python function definition. The
'def' keyword and the function body are redundant, but the author of this PEP
finds the definition more readable if they are present.

The function name may be a path instead of a plain identifier. Each argument
is annotated with the name of the converter function that will be applied to it.

Default values are given in the usual Python manner and may be any valid
Python expression.

The return value may be any Python expression. Usually it will be the name
of an object, but alternative return values could be specified in list form.
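
Because the declaration is almost valid Python, a prototype can lean on
Python's own parser.  The sketch below is not how the reference
implementation works: it assumes Python 3.9 or later (for ``ast.unparse``),
replaces the dotted function name with a plain identifier before parsing, and
handles only declarations that Python itself accepts.  ``parse_declaration``
and its helpers are hypothetical names.

```python
import ast
import re

_NAME = re.compile(r"def\s+([A-Za-z_][\w.]*)\s*\(")


def _converter(node):
    if node is None:
        raise SyntaxError("every parameter needs a converter annotation")
    if isinstance(node, ast.Constant) and isinstance(node.value, str):
        return node.value          # quoted default converter such as "p"
    return ast.unparse(node)       # custom converter such as path_converter


def _source(node):
    return None if node is None else ast.unparse(node)


def parse_declaration(text):
    text = text.strip()
    match = _NAME.match(text)
    if match is None:
        raise SyntaxError("expected 'def <name>('")
    path = match.group(1)
    # Python rejects a dotted function name, so substitute an identifier.
    plain = text[:match.start(1)] + path.replace(".", "_") + text[match.end(1):]
    func = ast.parse(plain).body[0]
    if func.returns is None:
        raise SyntaxError("a return annotation is required")
    args = func.args
    positional = args.posonlyargs + args.args
    # Defaults of positional parameters align with the end of the list.
    padding = [None] * (len(positional) - len(args.defaults))
    params = [
        (a.arg, _converter(a.annotation), _source(d), False)
        for a, d in zip(positional, padding + args.defaults)
    ]
    params += [
        (a.arg, _converter(a.annotation), _source(d), True)
        for a, d in zip(args.kwonlyargs, args.kw_defaults)
    ]
    # Each entry: (name, converter, default source or None, keyword_only)
    return path, params, ast.unparse(func.returns)
```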


C-declarations
~~~~~~~~~~~~~~

This optional section contains C variable declarations. Since the converter
functions have been declared beforehand, the preprocessor can type-check
the declarations.


Cleanup
~~~~~~~

The optional cleanup section contains literal C code that will be inserted
unmodified after the implementation function.


Output
~~~~~~

The output section contains the code emitted by the preprocessor.


Positional-only arguments
^^^^^^^^^^^^^^^^^^^^^^^^^

Functions that do not take keyword arguments are indicated by the presence
of the *slash* special parameter::

    /*[define stat_float_times]
    def os.stat_float_times(/, newval: "i") -> os.stat_result: pass
    %%
    int newval = -1;
    [define_end]*/

The preprocessor translates this definition to a PyArg_ParseTuple() call.
All arguments to the right of the slash are optional arguments.
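
The translation step can be pictured with the following Python sketch, which
builds the text of a ``PyArg_ParseTuple()`` call.  It relies on two documented
elements of the format string: ``|`` marks the start of the optional
arguments and ``:`` introduces the function name used in error messages.
The helper ``parse_tuple_call`` is hypothetical and covers only default
converters with a single C output, such as ``"i"`` or ``"O"``.

```python
def parse_tuple_call(func_name, required, optional):
    """Build the text of a PyArg_ParseTuple() call.

    required and optional are lists of (variable_name, converter_code).
    """
    fmt = "".join(code for _, code in required)
    if optional:
        # Everything after "|" may be omitted by the caller.
        fmt += "|" + "".join(code for _, code in optional)
    fmt += ":" + func_name
    # Each converted value is stored through the address of its variable.
    addresses = "".join(f", &{name}" for name, _ in required + optional)
    return f'PyArg_ParseTuple(args, "{fmt}"{addresses})'


print(parse_tuple_call("stat_float_times", [], [("newval", "i")]))
```

For the definition above, where *newval* is to the right of the slash, the
sketch prints ``PyArg_ParseTuple(args, "|i:stat_float_times", &newval)``.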


Left and right optional arguments
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Some legacy functions contain optional argument groups both to the left and
right of a central parameter. It is debatable whether a new tool should support
such functions.  For completeness' sake, this is the proposed syntax::

    /*[define]
    def curses.window.addch(y: "i", x: "i", ch: "O", attr: "l") -> None: pass
    where groups = [[ch], [ch, attr], [y, x, ch], [y, x, ch, attr]]
    [define_end]*/

Here *ch* is the central parameter, *attr* can optionally be added on the
right, and the group [y, x] can optionally be added on the left.

Essentially the rule is that all ordered combinations of the central
parameter and the optional groups must be possible such that no two
combinations have the same length.

This is concisely expressed by putting the central parameter first in
the list and subsequently adding the optional argument groups to the
left and right.
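
The rule lends itself to a mechanical check.  The Python sketch below
validates a ``groups`` list and then selects a group purely by the number of
arguments received.  The function names ``build_dispatch`` and ``bind`` are
hypothetical, and the requirement that each group follows declaration order
is this example's reading of "ordered combinations".

```python
def build_dispatch(params, groups):
    """Map each allowed argument count to the parameter names it binds."""
    position = {name: index for index, name in enumerate(params)}
    # The first group holds only the central parameter.
    (central,) = groups[0]
    dispatch = {}
    for group in groups:
        if central not in group:
            raise ValueError(f"group {group} lacks the central parameter")
        indexes = [position[name] for name in group]
        if indexes != sorted(set(indexes)):
            raise ValueError(f"group {group} is not in declaration order")
        if len(group) in dispatch:
            raise ValueError("two groups have the same length")
        dispatch[len(group)] = tuple(group)
    return dispatch


def bind(dispatch, args):
    """Pair positional arguments with names, choosing the group by length."""
    if len(args) not in dispatch:
        raise TypeError(f"unsupported number of arguments: {len(args)}")
    return dict(zip(dispatch[len(args)], args))
```

With the *addch* groups, three arguments bind to *y*, *x* and *ch*, while
two arguments bind to *ch* and *attr*.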


Flexibility in formatting
=========================

If the above os.stat example is considered too compact, it can easily be
formatted this way::

    /*[define posix_stat]
    def os.stat(path: path_converter,
                *,
                dir_fd: OS_STAT_DIR_FD_CONVERTER = None,
                follow_symlinks: "p" = True)
    -> os.stat_result: pass
    %%
    path_t path = PATH_T_INITIALIZE("stat", 0, 1);
    int dir_fd = DEFAULT_DIR_FD;
    int follow_symlinks = 1;
    %%
    path_cleanup(&path);
    [define_end]*/

    <literal C output>

    /*[define_output_end]*/


Benefits of a compact notation
==============================

The advantages of a concise notation are especially obvious when a large
number of parameters is involved. The argument parsing part of
``_posixsubprocess.fork_exec`` is fully specified by this definition::

    /*[define subprocess_fork_exec]
    def _posixsubprocess.fork_exec(
        process_args: "O", executable_list: "O",
        close_fds: "p", py_fds_to_keep: "O",
        cwd_obj: "O", env_list: "O",
        p2cread: "i", p2cwrite: "i", c2pread: "i", c2pwrite: "i",
        errread: "i", errwrite: "i", errpipe_read: "i", errpipe_write: "i",
        restore_signals: "i", call_setsid: "i", preexec_fn: "O", /) -> int: pass
    [define_end]*/


Note that the *preprocess* tool currently emits a redundant C-declaration
section for this example, so the output is longer than necessary.


Easy validation of the definition
=================================

How can an inexperienced user validate a definition like os.stat? Simply
by changing os.stat to os_stat, defining missing converters and pasting
the definition into the Python interactive interpreter!

In fact, a converters.py module could be auto-generated from converters.h.
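
A session of this kind might look as follows.  The two converter functions
are empty stand-ins written for this example; they are not generated from
converters.h.  The check uses the standard ``inspect`` module.

```python
import inspect
import os


# Hypothetical stand-ins for the custom converters named in the definition.
def path_converter(obj): ...
def OS_STAT_DIR_FD_CONVERTER(obj): ...


# The os.stat definition with the dotted name changed to os_stat.
def os_stat(path: path_converter, *, dir_fd: OS_STAT_DIR_FD_CONVERTER = None,
            follow_symlinks: "p" = True) -> os.stat_result: pass


signature = inspect.signature(os_stat)
for parameter in signature.parameters.values():
    # Shows each parameter's name, kind, converter annotation and default.
    print(parameter.name, parameter.kind.name,
          parameter.annotation, parameter.default)
```

If the definition contains a syntax error or names an undefined converter,
the interpreter reports it as soon as the ``def`` statement is executed.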


Reference implementation
========================

A reference implementation is available at `issue 16612`_. Since this PEP
was written under time constraints and the author is unfamiliar with the
PLY toolchain, the software is written in Standard ML and utilizes the
ml-yacc/ml-lex toolchain.

The grammar is conflict-free and available in ml-yacc readable BNF form.

Two tools are available:

* *printsemant* reads a converter header and a .c file and dumps
  the semantically checked parse tree to stdout.

* *preprocess* reads a converter header and a .c file and dumps
  the preprocessed .c file to stdout.


Known deficiencies:

* The Python 'test' expression is not semantically checked. The syntax
  however is checked since it is part of the grammar.

* The lexer does not handle triple quoted strings.

* C declarations are parsed in a primitive way. The final implementation
  should utilize 'declarator' and 'init-declarator' from the C grammar.

* The *preprocess* tool does not emit code for the left-and-right optional
  arguments case. The *printsemant* tool can deal with this case.

* Since the *preprocess* tool generates the output from the parse
  tree, the original indentation of the define block is lost.


Grammar
=======

  TBD: The grammar exists in ml-yacc readable form, but should probably be
  included here in EBNF notation.


Comparison with PEP 436
=======================

The author of this PEP has the following concerns about the DSL proposed
in :pep:`436`:

* The whitespace-sensitive, configuration-file-like syntax looks out
  of place in a C file.

* The structure of the function definition gets lost in the per-parameter
  specifications. Keywords like positional-only, required and keyword-only
  are scattered across too many different places.

  By contrast, in the alternative DSL the structure of the function
  definition can be understood at a single glance.

* The :pep:`436` DSL has 14 documented flags and at least one undocumented
  (allow_fd) flag. Figuring out which of the 2**15 possible combinations
  are valid places an unnecessary burden on the user.

  Experience with the :pep:`3118` buffer flags has shown that sorting out
  (and exhaustively testing!) valid combinations is an extremely tedious
  task. The :pep:`3118` flags are still not well understood by many people.

  By contrast, the alternative DSL has a central file Include/converters.h
  that can be quickly searched for the desired converter. Many of the
  converters are already known, perhaps even memorized by people (due
  to frequent use).

* The :pep:`436` DSL allows too much freedom. Types can apparently be omitted,
  the preprocessor accepts (and ignores) unknown keywords, and adding white
  space after a docstring sometimes results in an assertion error.

  The alternative DSL on the other hand allows no such freedoms. Omitting
  converter or return value annotations is plainly a syntax error. The
  LALR(1) grammar is unambiguous and specified for the complete translation
  unit.


Copyright
=========

This document is licensed under the `Open Publication License`_.


References and Footnotes
========================

.. _issue 16612: http://bugs.python.org/issue16612

.. _Open Publication License: http://www.opencontent.org/openpub/

.. _second iteration of the PEP 436 DSL:
   http://hg.python.org/peps/rev/a2fa10b2424b


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: