PEP: 627
Title: Recording installed projects
Author: Petr Viktorin <encukou@gmail.com>
BDFL-Delegate: Paul Moore <p.f.moore@gmail.com>
Discussions-To: https://discuss.python.org/t/pep-627/4126
Status: Final
Type: Standards Track
Topic: Packaging
Content-Type: text/x-rst
Created: 15-Jul-2020
Resolution: https://discuss.python.org/t/pep-627/4126/42


Abstract
========

This PEP clarifies and updates :pep:`376` (Database of Installed Python
Distributions), rewriting it as an interoperability standard.
It moves the canonical location of the standard to the Python
Packaging Authority (PyPA) standards repository, and sets up guidelines
for changing it.

Two files in installed ``.dist-info`` directories are made optional:
``RECORD`` (which :pep:`376` lists as mandatory, but suggests it can be left out
for "system packages"), and ``INSTALLER``.


Motivation
==========

Python packaging is moving from relying on specific tools (Setuptools and pip)
toward an ecosystem of tools and tool-agnostic interoperability standards.

:pep:`376` is not written as an interoperability standard.
It describes implementation details of specific tools and libraries,
and is underspecified, leaving much room for implementation-defined behavior.

This is a proposal to “distill” the standard from :pep:`376`, clarify it,
and rewrite it to be tool-agnostic.

The aim of this PEP is to have a better standard, not necessarily a perfect one.
Some issues are left to later clarification.


Rationale Change
================

:pep:`376`'s rationale focuses on two problems:

* There are too many ways to install projects and this makes interoperation difficult.
* There is no API to get information on installed distributions.

The new document focuses only on the on-disk format of information about
installed projects.
Providing an API to install, uninstall or query this information is left to
be implemented by tools.


Standard and Changes Process
============================

The canonical standard for *Recording installed projects* (previously known as
*Database of Installed Python Distributions*) is the `documentation`_ at
``packaging.python.org``.
Any changes to the document (except trivial language or typography fixes) must
be made through the PEP process.

The document is normative (with examples to aid understanding).
PEPs that change it, such as this one, contain additional information that is
expected to get out of date, such as rationales and compatibility
considerations.

The proposed standard is submitted together with this PEP as a pull request to
``packaging.python.org``.

.. _documentation: https://packaging.python.org/specifications/recording-installed-packages/


Changes and their Rationale
===========================

Renaming to "Recording installed projects"
------------------------------------------

The standard is renamed from *Database of Installed Python Distributions*
to *Recording installed projects*.

While putting files in known locations on disk may be thought of as
a “database”, it's not what most people think about when they hear the term.
The PyPA links to :pep:`376` under the heading *Recording installed distributions*.

The PyPA glossary defines “Distribution” (or, “Distribution Package” to prevent
confusion with e.g. Linux distributions) as “A versioned archive file […]”.
Since there may be other ways to install Python code than from archive files,
the document uses “installed project” rather than “installed distribution”.


Removal of Implementation Details
---------------------------------

All tool- and library-specific details are removed.
The mechanisms of how a project is installed are also left out: the document
focuses on the end state.
One exception is a sketch of an uninstallation algorithm,
which is given to better explain the purpose of the ``RECORD`` file.

References to ``.egg-info`` and ``.egg``,
formats specific to ``setuptools`` and ``distutils``,
are left out.


Explicitly Allowing Additional Files
------------------------------------

The ``.dist-info`` directory is allowed to contain files not specified in
the spec.
The current tools already do this.

A note in the specification mentions files in the ``.dist-info`` directory of *wheels*.
Current tools copy these files to the installed ``.dist-info``—something
to keep in mind for further standardization efforts.


Clarifications in the ``RECORD`` File
-------------------------------------

The CSV dialect is specified to be the default of Python's ``csv`` module.
This resolves edge cases around handling double-quotes and line terminators
in file names.
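
As an illustration of the dialect choice, the following minimal sketch reads
and writes ``RECORD``-style rows with the default settings of Python's ``csv``
module. The file contents and the helper names ``read_record`` and
``write_record`` are hypothetical and exist only for this example.

```python
import csv
import io

# Hypothetical RECORD contents. The second path contains a comma and a
# double quote, so the default dialect quotes the field and doubles the
# inner quote characters.
record_text = (
    'demo/__init__.py,,0\r\n'
    '"demo/odd,""name"".txt",,\r\n'
)


def read_record(text):
    """Parse RECORD text into (path, hash, size) tuples."""
    # newline="" leaves line endings untouched so the csv module sees them.
    return [tuple(row) for row in csv.reader(io.StringIO(text, newline=""))]


def write_record(rows):
    """Serialize (path, hash, size) tuples with the default csv dialect."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


rows = read_record(record_text)
round_trip = write_record(rows)
```

Because both directions use the same default dialect, unusual file names
survive a read/write round trip unchanged.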

The “base” of relative paths in ``RECORD`` is specified relative to the
``.dist-info`` directory, rather than tool-specific ``--install-lib`` and
``--prefix`` options.

Both *hash* and *size* fields are now optional (for any file, not just
``.pyc``, ``.pyo`` and ``RECORD``). Leaving them out is discouraged,
except for ``*.pyc`` and ``RECORD`` itself.
(Note that :pep:`376` is unclear on what was optional; when taken literally,
its text and examples contradict. Despite that, “both fields are optional” is a
reasonable interpretation of :pep:`376`.
The alternative would be to mandate—rather than recommend—which files can be
recorded without hash and size, and to update that list over time as new use
cases come up.)
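
A tool that checks installed files has to tolerate empty fields. The sketch
below shows one way to do that. It assumes the ``algorithm=digest`` hash
format with an unpadded URL-safe Base64 digest, which comes from the
specification rather than from this PEP, and the function names
``record_hash`` and ``matches`` are hypothetical.

```python
import base64
import hashlib


def record_hash(data, algorithm="sha256"):
    """Format a hash field as 'algorithm=urlsafe-base64-nopad-digest'.

    The field format is an assumption taken from the specification.
    """
    digest = hashlib.new(algorithm, data).digest()
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{algorithm}={encoded}"


def matches(data, hash_field, size_field):
    """Check file contents against the hash and size fields.

    An empty field means "not recorded" and is skipped, not failed.
    """
    if size_field and int(size_field) != len(data):
        return False
    if hash_field:
        algorithm, _, _ = hash_field.partition("=")
        return record_hash(data, algorithm) == hash_field
    return True


good = matches(b"hello", record_hash(b"hello"), "5")
unrecorded = matches(b"hello", "", "")
wrong_size = matches(b"hello", record_hash(b"hello"), "6")
wrong_hash = matches(b"hellp", record_hash(b"hello"), "5")
```

Treating an empty field as "nothing to verify" keeps the check usable for
entries such as ``RECORD`` itself, which cannot contain its own hash.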

The new spec explicitly says that the ``RECORD`` file must now include *all*
files of the installed project (the exception for ``.pyc`` files remains).
Since tools use ``RECORD`` for uninstallation, incomplete file lists could
introduce orphaned files to users' environments.
On the other hand, this means that there is no way to record hashes of
any files if the full list of files is unknown.

A sketch of an uninstallation algorithm is included to clarify the file's
primary purpose and contents.

Tools must not uninstall/remove projects that lack a ``RECORD`` file
(unless they have external information, such as in system package
managers of Linux distros).
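
The rule above can be sketched as follows. This is not the algorithm from the
specification, only a minimal illustration under stated assumptions: the
``base`` directory that paths are resolved against is passed in by the caller,
the file is read as UTF-8 (the encoding of ``RECORD`` is deferred by this
PEP), and ``uninstall`` and ``MissingRecordError`` are hypothetical names.

```python
import csv
import tempfile
from pathlib import Path


class MissingRecordError(Exception):
    """Raised when a project has no RECORD file and must be left alone."""


def uninstall(dist_info, base):
    """Remove the files listed in RECORD and return the listed paths.

    `base` is the directory RECORD paths are resolved against; choosing
    it is left to the caller in this sketch.
    """
    dist_info, base = Path(dist_info), Path(base)
    record = dist_info / "RECORD"
    if not record.is_file():
        raise MissingRecordError(f"{dist_info.name}: no RECORD, not removing")
    # UTF-8 is an assumption of this sketch.
    with record.open(newline="", encoding="utf-8") as f:
        paths = [row[0] for row in csv.reader(f) if row]
    for rel in paths:
        target = base / rel
        if target.is_file():
            target.unlink()
    try:
        dist_info.rmdir()  # succeeds only if nothing unlisted is left
    except OSError:
        pass
    return paths


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    site = Path(tmp)
    (site / "demo").mkdir()
    (site / "demo" / "__init__.py").write_text("")
    info = site / "demo-1.0.dist-info"
    info.mkdir()
    (info / "RECORD").write_text(
        "demo/__init__.py,,\ndemo-1.0.dist-info/RECORD,,\n", encoding="utf-8"
    )
    removed = uninstall(info, site)
    module_gone = not (site / "demo" / "__init__.py").exists()
    info_gone = not info.exists()

    bare = site / "system-1.0.dist-info"  # no RECORD inside
    bare.mkdir()
    try:
        uninstall(bare, site)
        refused = False
    except MissingRecordError:
        refused = True
```

A real tool would also need to handle ``.pyc`` files, emptied package
directories and any external information it has about the project.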

On Windows, files in ``RECORD`` may be separated by either ``/`` or ``\``.
:pep:`376` was unclear on this: it mandates forward slashes in one place, but
shows backslashes in a Windows-specific example.
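
One way for a reader of ``RECORD`` to accept both separators on Windows is to
parse paths with ``pathlib``'s pure path classes, as in this minimal sketch.
The helper name ``record_path_parts`` and its ``windows`` flag are
hypothetical.

```python
from pathlib import PurePosixPath, PureWindowsPath


def record_path_parts(path, windows):
    """Split a RECORD path into components.

    With windows=True both '/' and '\\' act as separators; otherwise a
    backslash is an ordinary character in the file name.
    """
    flavour = PureWindowsPath if windows else PurePosixPath
    return flavour(path).parts


on_windows = record_path_parts("demo\\sub/mod.py", windows=True)
on_posix = record_path_parts("demo\\sub/mod.py", windows=False)
```

Using the pure path classes keeps the behaviour testable on any platform,
since they never touch the file system.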



Optional ``RECORD`` File
------------------------

The ``RECORD`` file is made optional.
Not all tools can easily generate a list of installed files in a
Python-specific format.

Specifically, the ``RECORD`` file is unnecessary when projects are installed
by a Linux system package manager, which has its own ways to keep track of
files, uninstall them or check their integrity.
Having to keep a ``RECORD`` file in sync with the disk and the system package
database would be unreasonably fragile, and no ``RECORD`` file is better
than one that does not correspond to reality.

(Full disclosure: The author of this PEP is an RPM packager active in the Fedora Linux distro.)


Optional ``INSTALLER`` File
---------------------------

The ``INSTALLER`` file is also made optional, and specified to be used for
informational purposes only.
It is still a single-line text file containing the name of the installer.

This file was originally added to distinguish projects installed by the Python
installer (``pip``) from ones installed by other package managers
(e.g. ``dnf``).
There were attempts to use this file to prevent ``pip`` from updating or
uninstalling packages it didn't install.

Our goal is supporting interoperating tools, and basing any action on
which tool happened to install a package runs counter to that goal.

Instead of relying on the installer name, tools should use feature detection.
The current document offers a crude way of making a project untouchable by
Python tooling: omitting the ``RECORD`` file.

On the other hand, the installer name may be useful in hints to the user.

To align with this new purpose of the file, the new specification allows
any ASCII string in ``INSTALLER``, rather than a lowercase identifier.
It also suggests using the command-line command, if available.
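
The informational use of the file could look like the following minimal
sketch, which reads the name only to build a hint for the user and treats a
missing or non-ASCII file as "unknown". The function name ``installer_hint``
is hypothetical.

```python
import tempfile
from pathlib import Path


def installer_hint(dist_info):
    """Return the installer name for display, or None if unavailable."""
    try:
        text = (Path(dist_info) / "INSTALLER").read_text(encoding="ascii")
    except (FileNotFoundError, UnicodeDecodeError):
        return None
    return text.strip() or None


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    info = Path(tmp) / "demo-1.0.dist-info"
    info.mkdir()
    missing = installer_hint(info)  # file is optional, so absence is fine
    (info / "INSTALLER").write_text("pip\n", encoding="ascii")
    present = installer_hint(info)
```

The result is suitable for a message such as "this project was installed
by pip", but not for deciding whether the project may be modified.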


The ``REQUESTED`` File: Removed from Spec
-----------------------------------------

The ``REQUESTED`` file is now considered a tool-specific extension.

Per :pep:`376`, ``REQUESTED`` was to be written when a project was installed
by direct user request, as opposed to automatically to satisfy dependencies
of another project. Projects without this marker file could be uninstalled
when no longer needed.

Despite the standard, many existing installers (including older versions of
``pip``) never write this file. There is no distinction between projects
that are “OK to remove when no longer needed” and ones simply installed by
a tool that ignores ``REQUESTED``. So, the file is currently not usable for its
intended purpose (unless a tool can use additional, non-standard information).


Clarifications
--------------

When possible, terms (such as ``name`` and ``version``) are qualified by
references to existing specs.


Deferred Ideas
==============

To limit the scope of this PEP, some improvements are explicitly left to
future PEPs:

* Encoding of the ``RECORD`` file
* Limiting or namespacing files that can appear in ``.dist-info``
* Marking the difference between projects installed directly by user request
  versus those installed to satisfy dependencies, so that the latter can be
  removed when no longer needed.


Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: