PEP: 657
Title: Include Fine Grained Error Locations in Tracebacks
Version: $Revision$
Last-Modified: $Date$
Author: Pablo Galindo <pablogsal@python.org>,
        Batuhan Taskaya <batuhan@python.org>,
        Ammar Askar <ammar@ammaraskar.com>
Discussions-To: https://discuss.python.org/t/pep-657-include-fine-grained-error-locations-in-tracebacks/8629
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 08-May-2021
Python-Version: 3.11
Post-History:

Abstract
========

This PEP proposes adding a mapping from each bytecode instruction to the start
and end column offsets of the line that generated them as well as the end line
number. This data will be used to improve tracebacks displayed by the CPython
interpreter in order to improve the debugging experience. The PEP also proposes
adding APIs that allow other tools (such as coverage analysis tools, profilers,
tracers, debuggers) to consume this information from code objects.

Motivation
==========

The primary motivation for this PEP is to improve the feedback presented about
the location of errors to aid with debugging.

Python currently keeps a mapping of bytecode to line numbers from compilation.
The interpreter uses this mapping to point to the source line associated with
an error. While this line-level granularity for instructions is useful, a
single line of Python code can compile into dozens of bytecode operations
making it hard to track which part of the line caused the error.

Consider the following line of Python code::

    x['a']['b']['c']['d'] = 1

If any of the values in the dictionaries are ``None``, the error shown is::

    Traceback (most recent call last):
      File "test.py", line 2, in <module>
        x['a']['b']['c']['d'] = 1
    TypeError: 'NoneType' object is not subscriptable

From the traceback, it is impossible to determine which one of the dictionaries
had the ``None`` element that caused the error. Users often have to attach a
debugger or split up their expression to track down the problem.

However, if the interpreter had a mapping of bytecode to column offsets as well
as line numbers, it could helpfully display::

    Traceback (most recent call last):
      File "test.py", line 2, in <module>
        x['a']['b']['c']['d'] = 1
        ~~~~~~~~~~~^^^^^
    TypeError: 'NoneType' object is not subscriptable

indicating to the user that the object ``x['a']['b']`` must have been ``None``.
This highlighting will occur for every frame in the traceback. For instance, if
a similar error is part of a complex function call chain, the traceback would
display the code associated to the current instruction in every frame::

    Traceback (most recent call last):
      File "test.py", line 14, in <module>
        lel3(x)
        ^^^^^^^
      File "test.py", line 12, in lel3
        return lel2(x) / 23
               ^^^^^^^
      File "test.py", line 9, in lel2
        return 25 + lel(x) + lel(x)
                    ^^^^^^
      File "test.py", line 6, in lel
        return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
                             ~~~~~~~~~~~~~~~~^^^^^
    TypeError: 'NoneType' object is not subscriptable

This problem presents itself in the following situations.

* When passing down multiple objects to function calls while
  accessing the same attribute in them.
  For instance, this error::

    Traceback (most recent call last):
      File "test.py", line 19, in <module>
        foo(a.name, b.name, c.name)
    AttributeError: 'NoneType' object has no attribute 'name'

  With the improvements in this PEP this would show::

    Traceback (most recent call last):
      File "test.py", line 17, in <module>
        foo(a.name, b.name, c.name)
                    ^^^^^^
    AttributeError: 'NoneType' object has no attribute 'name'

* When dealing with lines with complex mathematical expressions,
  especially with libraries such as numpy where arithmetic
  operations can fail based on the arguments. For example: ::

    Traceback (most recent call last):
      File "test.py", line 1, in <module>
        x = (a + b) @ (c + d)
      ValueError: operands could not be broadcast together with shapes (1,2) (2,3)

  There is no clear indication as to which operation failed, was it the addition
  on the left, the right or the matrix multiplication in the middle? With this
  PEP the new error message would look like::

    Traceback (most recent call last):
      File "test.py", line 1, in <module>
        x = (a + b) @ (c + d)
                       ~~^~~
      ValueError: operands could not be broadcast together with shapes (1,2) (2,3)

  Giving a much clearer and easier to debug error message.


Debugging aside, this extra information would also be useful for code
coverage tools, enabling them to measure expression-level coverage instead of
just line-level coverage. For instance, given the following line: ::

    x = foo() if bar() else baz()

coverage, profile or state analysis tools will highlight the full line in both
branches, making it impossible to differentiate what branch was taken. This is
a known problem in pycoverage_.

Similar efforts to this PEP have taken place in other languages such as Java in
the form of JEP358_. ``NullPointerExceptions`` in Java were similarly nebulous when
it came to lines with complicated expressions. A ``NullPointerException`` would
provide very little aid in finding the root cause of an error. The
implementation for JEP358 is fairly complex, requiring walking back through the
bytecode by using a control flow graph analyzer and decompilation techniques to
recover the source code that led to the null pointer. Although the complexity
of this solution is high and requires maintenance for the decompiler every time
Java bytecode is changed, this improvement was deemed to be worth it for the
extra information provided for *just one exception type*.


Rationale
=========

In order to identify the range of source code being executed when exceptions
are raised, this proposal requires adding new data for every bytecode
instruction. This will have an impact on the size of ``pyc`` files on disk and
the size of code objects in memory. The authors of this proposal have chosen
the data types in a way that tries to minimize this impact. The proposed
overhead is storing two ``uint8_t`` (one for the start offset and one for the
end offset) and the end line information for every bytecode instruction (in
the same encoded fashion as the start line is stored currently).

As an illustrative example to gauge the impact of this change, we have
calculated that including the start and end offsets will increase the size of
the standard library’s pyc files by 22% (6MB) from 28.4MB to 34.7MB. The
overhead in memory usage will be the same (assuming the *full standard library*
is loaded into the same program). We believe that this is a very acceptable
number since the order of magnitude of the overhead is very small, especially
considering the storage size and memory capabilities of modern computers.
Additionally, in general the memory size of a Python program is not dominated
by code objects. To check this assumption we have executed the test suite of
several popular PyPI projects (including NumPy, pytest, Django and Cython) as
well as several applications (Black, pylint, mypy executed over either mypy or
the standard library) and we found that code objects represent normally 3-6% of
the average memory size of the program.

We understand that the extra cost of this information may not be acceptable for
some users, so we propose an opt-out mechanism which will cause generated code
objects to not have the extra information while also allowing pyc files to not
include the extra information.


Specification
=============

In order to have enough information to correctly resolve the location
within a given line where an error was raised, a map linking bytecode
instructions to column offsets (start and end offset) and end line numbers
is needed. This is similar in fashion to how line numbers are currently linked
to bytecode instructions.

The following changes will be performed as part of the implementation of
this PEP:

* The offset information will be exposed to Python via a new attribute in the
  code object class called ``co_positions`` that will return a sequence of
  four-element tuples containing the full location of every instruction
  (including start line, end line, start column offset and end column offset)
  or ``None`` if the code object was created without the offset information.
* One new C-API function: ::

    int PyCode_Addr2Location(
        PyCodeObject *co, int addrq,
        int *start_line, int *start_column,
        int *end_line, int *end_column)

  will be added so the end line, the start column offsets and the end column
  offset can be obtained given the index of a bytecode instruction. This
  function will set the values to 0 if the information is not available.

The internal storage, compression and encoding of the information is left as an
implementation detail and can be changed at any point as long as the public API
remains unchanged.

Offset semantics
^^^^^^^^^^^^^^^^

These offsets are propagated by the compiler from the ones stored currently in
all AST nodes. The output of the public APIs (``co_positions`` and ``PyCode_Addr2Location``)
that deal with these attributes use 0-indexed offsets (just like the AST nodes), but the underlying
implementation is free to represent the actual data in whatever form they choose to be most efficient.
The error code regarding information not available is ``None`` for the ``co_positions()`` API,
and ``-1`` for the ``PyCode_Addr2Location`` API. The availability of the information highly depends
on whether the offsets fall under the range, as well as the runtime flags for the interpreter
configuration.

The AST nodes use ``int`` types to store these values. The current implementation, however,
utilizes ``uint8_t`` types as an implementation detail to minimize storage impact. This decision
allows offsets to go from 0 to 255, while offsets bigger than these values will be treated as
missing (returning ``-1`` on the ``PyCode_Addr2Location`` and ``None`` API in the ``co_positions()`` API).

As specified previously, the underlying storage of the offsets should be
considered an implementation detail, as the public APIs to obtain this values
will return either C ``int`` types or Python ``int`` objects, which allows to
implement better compression/encoding in the future if bigger ranges would need
to be supported.  This PEP proposes to start with this simpler version and
defer improvements to future work.

Displaying tracebacks
^^^^^^^^^^^^^^^^^^^^^

When displaying tracebacks, the default exception hook will be modified to
query this information from the code objects and use it to display a sequence
of carets for every displayed line in the traceback if the information is
available. For instance::

      File "test.py", line 6, in lel
        return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
                             ~~~~~~~~~~~~~~~~^^^^^
    TypeError: 'NoneType' object is not subscriptable

When displaying tracebacks, instruction offsets will be taken from the
traceback objects. This makes highlighting exceptions that are re-raised work
naturally without the need to store the new information in the stack. For
example, for this code::

    def foo(x):
        1 + 1/0 + 2

    def bar(x):
        try:
            1 + foo(x) + foo(x)
        except Exception as e:
            raise ValueError("oh no!") from e

    bar(bar(bar(2)))

The printed traceback would look like this::

    Traceback (most recent call last):
      File "test.py", line 6, in bar
        1 + foo(x) + foo(x)
            ^^^^^^
      File "test.py", line 2, in foo
        1 + 1/0 + 2
            ~^~
    ZeroDivisionError: division by zero

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "test.py", line 10, in <module>
        bar(bar(bar(2)))
                ^^^^^^
      File "test.py", line 8, in bar
        raise ValueError("oh no!") from e
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ValueError: oh no

While this code::

    def foo(x):
        1 + 1/0 + 2
    def bar(x):
        try:
            1 + foo(x) + foo(x)
        except Exception:
            raise
    bar(bar(bar(2)))

Will be displayed as::

    Traceback (most recent call last):
      File "test.py", line 10, in <module>
        bar(bar(bar(2)))
                ^^^^^^
      File "test.py", line 6, in bar
        1 + foo(x) + foo(x)
            ^^^^^^
      File "test.py", line 2, in foo
        1 + 1/0 + 2
            ~^~
    ZeroDivisionError: division by zero

Maintaining the current behavior, only a single line will be displayed
in tracebacks. For instructions that span multiple lines (the end offset
and the start offset belong to different lines), the end line number must
be inspected to know if the end offset applies to the same line as the
starting offset.

Opt-out mechanism
^^^^^^^^^^^^^^^^^

To offer an opt-out mechanism for those users that care about the
storage and memory overhead and to allow third party tools and other
programs that are currently parsing tracebacks to catch up the following
methods will be provided to deactivate this feature:

* A new environment variable: ``PYTHONNODEBUGRANGES``.
* A new command line option for the dev mode: ``python -Xno_debug_ranges``.

If any of these methods are used, the Python compiler will **not** populate
code objects with the new information (``None`` will be used instead) and any
unmarshalled code objects that contain the extra information will have it stripped
away and replaced with ``None``). Additionally, the traceback machinery will not
show the extended location information even if the information was present.
This method allows users to:

* Create smaller ``pyc`` files by using one of the two methods when said files
  are created.
* Don't load the extra information from ``pyc`` files if those were created with
  the extra information in the first place.
* Deactivate the extra information when displaying tracebacks (the caret characters
  indicating the location of the error).

Doing this has a **very small** performance hit as the interpreter state needs
to be fetched when code objects are created to look up the configuration.
Creating code objects is not a performance sensitive operation so this should
not be a concern.

Backwards Compatibility
=======================

The change is fully backwards compatible.


Reference Implementation
========================

A reference implementation can be found in the implementation_ fork.

Rejected Ideas
==============

Use a single caret instead of a range
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
It has been proposed to use a single caret instead of highlighting the full
range when reporting errors as a way to simplify the feature. We have decided
to not go this route for the following reasons:

* Deriving the location of the caret is not straightforward using the current
  layout of the AST. This is because the AST nodes only record the start and end
  line numbers as well as the start and end column offsets. As the AST nodes do
  not preserve the original tokens (by design) deriving the exact location of some
  tokens is not possible without extra re-parsing. For instance, currently binary
  operators have nodes for the operands but the type of the operator is stored
  in an enumeration so its location cannot be derived from the node (this is just
  an example of how this problem manifest, and not the only one).
* Deriving the ranges from AST nodes greatly simplifies the implementation and reduces
  a lot the maintenance cost and the possibilities of errors. This is because using
  the ranges is always possible to do generically for any AST node, while any other
  custom information would need to be extracted differently from different types of
  nodes. Given how error-prone getting the locations manually was when this used to
  be a manual process when generating the AST, we believe that a generic solution is
  a very important property to pursue.
* Storing the information to highlight a single caret will be very limiting for tools
  such as coverage tools and profilers as well as for tools like IPython and IDEs that
  want to make use of this new feature. As `this message <https://discuss.python.org/t/pep-657-include-fine-grained-error-locations-in-tracebacks/8629/2?u=pablogsal>`_ from the author of "friendly-traceback"
  mentions, the reason is that without the full range (including end lines) these tools
  will find very difficult to highlight correctly the relevant source code. For instance,
  for this code::

    something = foo(a,b,c) if bar(a,b,c) else other(b,c,d)

  tools (such as coverage reporters) want to be able to highlight the totality of the call
  that is covered by the executed bytecode (let's say ``foo(a,b,c)``) and not just a single
  character.  Even if is technically possible to re-parse and re-tokenize the source code
  to re-construct the information, it is not possible to do this reliably and would
  result in a much worse user experience.
* Many users have reported that a single caret is much harder to read than a full range,
  and this motivated using ranges to highlight syntax errors, which was very well received.
  Additionally, it has been noted that users with vision problems can identify the ranges
  much easily than a single caret character, which we believe is a great advantage of
  using ranges.

Have a configure flag to opt out
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Having a configure flag to opt out of the overhead even when executing Python
in non-optimized mode may sound desirable, but it may cause problems when
reading pyc files that were created with a version of the interpreter that was
not compiled with the flag activated. This can lead to crashes that would be
very difficult to debug for regular users and will make different pyc files
incompatible between each other. As this pyc could be shipped as part of
libraries or applications without the original source, it is also not always
possible to force recompilation of said pyc files. For these reasons we have
decided to use the -O flag to opt-out of this behaviour. 

Lazy loading of column information
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
One potential solution to reduce the memory usage of this feature is to not
load the column information from the pyc file when code is imported. Only if an
uncaught exception bubbles up or if a call to the C-API functions is made will
the column information be loaded from the pyc file. This is similar to how we
only read source lines to display them in the traceback when an exception
bubbles up. While this would indeed lower memory usage, it also results in a
far more complex implementation requiring changes to the importing machinery to
selectively ignore a part of the code object. We consider this an interesting
avenue to explore but ultimately we think is out of the scope for this particular
PEP. It also means that column information will not be available if the user is
not using pyc files or for code objects created dynamically at runtime.

Implement compression
^^^^^^^^^^^^^^^^^^^^^
Although it would be possible to implement some form of compression over the
pyc files and the new data in code objects, we believe that this is out of the
scope of this proposal due to its larger impact (in the case of pyc files) and
the fact that we expect column offsets to not compress well due to the lack of
patterns in them (in case of the new data in code objects).

Acknowledgments
===============
Thanks to Carl Friedrich Bolz-Tereick for showing an initial prototype of this
idea for the Pypy interpreter and for the helpful discussion.


References
==========

.. _JEP358: https://openjdk.java.net/jeps/358
.. _implementation: https://github.com/colnotab/cpython/tree/bpo-43950
.. _pycoverage: https://github.com/nedbat/coveragepy/issues/509

Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: