PEP: 376
Title: Database of Installed Python Distributions
Version: $Revision$
Last-Modified: $Date$
Author: Tarek Ziadé <tarek@ziade.org>
Status: Final
Type: Standards Track
Topic: Packaging
Content-Type: text/x-rst
Created: 22-Feb-2009
Python-Version: 2.7, 3.2
Post-History:

Abstract
========

The goal of this PEP is to provide a standard infrastructure to manage
project distributions installed on a system, so all tools that are
installing or removing projects are interoperable.

To achieve this goal, the PEP proposes a new format to describe installed
distributions on a system. It also describes a reference implementation
for the standard library.

In the past, an attempt was made to create an installation database (see
:pep:`262`).

Combined with :pep:`345`, the current proposal supersedes :pep:`262`.

Note: the implementation plan didn't go as expected, so it should be
considered informative only for this PEP.


Rationale
=========

There are two problems right now in the way distributions are installed in
Python:

- There are too many ways to do it and this makes interoperation difficult.
- There is no API to get information on installed distributions.

How distributions are installed
-------------------------------

Right now, when a distribution is installed in Python, every element can
be installed in a different directory.

For instance, ``Distutils`` installs the pure Python code in the ``purelib``
directory, which is ``lib/python2.6/site-packages`` for unix-like systems and
Mac OS X, or ``Lib\site-packages`` under Python's installation directory for
Windows.

Additionally, the ``install_egg_info`` subcommand of the Distutils ``install``
command adds an ``.egg-info`` file for the project into the ``purelib``
directory.

For example, for the ``docutils`` distribution, which contains one package, an
extra module, and executable scripts, three elements are installed in
``site-packages``:

- ``docutils``: The ``docutils`` package.
- ``roman.py``: An extra module used by ``docutils``.
- ``docutils-0.5-py2.6.egg-info``: A file containing the distribution metadata
  as described in :pep:`314`. This file corresponds to the file
  called ``PKG-INFO``, built by the ``sdist`` command.

Some executable scripts, such as ``rst2html.py``, are also added in the
``bin`` directory of the Python installation.

Another project called ``setuptools`` [#setuptools]_ has two other formats
to install distributions, called ``EggFormats`` [#eggformats]_:

- a self-contained ``.egg`` directory, that contains all the distribution files
  and the distribution metadata in a file called ``PKG-INFO`` in a subdirectory
  called ``EGG-INFO``. ``setuptools`` creates other files in that directory that can
  be considered as complementary metadata.

- an ``.egg-info`` directory installed in ``site-packages``, that contains the same
  files ``EGG-INFO`` has in the ``.egg`` format.

The first format is automatically used when you install a distribution that
uses the ``setuptools.setup`` function in its setup.py file, instead of
the ``distutils.core.setup`` one.

``setuptools`` also adds a reference to the distribution into an
``easy-install.pth`` file.

Last, the ``setuptools`` project provides an executable script called
``easy_install`` [#easyinstall]_ that installs all distributions, including
distutils-based ones, in self-contained ``.egg`` directories.

If you want to have standalone ``.egg-info`` directories for your distributions,
i.e. the second ``setuptools`` format, you have to force it when you work
with a setuptools-based distribution or with the ``easy_install`` script.
You can force it by using the ``--single-version-externally-managed`` option
**or** the ``--root`` option. This makes the ``setuptools`` project install
the project the way distutils does.

This option is used by:

- the ``pip`` [#pip]_ installer
- the Fedora packagers [#fedora]_
- the Debian packagers [#debian]_

Uninstall information
---------------------

Distutils doesn't provide an ``uninstall`` command. If you want to uninstall
a distribution, you have to be a power user and remove the various elements
that were installed, and then look over the ``.pth`` files to clean them up if
necessary.

The process also differs depending on the tools you used to install the
distribution and on whether the distribution's ``setup.py`` uses Distutils or
Setuptools.

Under some circumstances, you might not be able to know for sure that you
have removed everything, or that you didn't break another distribution by
removing a file that is shared among several distributions.

But there's a common behavior: when you install a distribution, files are
copied to your system. And it's possible to keep track of these files for
later removal.

Moreover, the Pip project has gained an ``uninstall`` feature lately. It
records all installed files, using the ``record`` option of the ``install``
command.

What this PEP proposes
----------------------

To address those issues, this PEP proposes a few changes:

- A new ``.dist-info`` structure using a directory, inspired by one format of
  the ``EggFormats`` standard from ``setuptools``.
- New APIs in ``pkgutil`` to be able to query the information of installed
  distributions.
- An uninstall function and an uninstall script in Distutils.


One .dist-info directory per installed distribution
===================================================

This PEP proposes an installation format inspired by one of the options in the
``EggFormats`` standard, the one that uses a distinct directory located in the
site-packages directory.

This distinct directory is named as follows::

    name + '-' + version + '.dist-info'

This ``.dist-info`` directory can contain these files:

- ``METADATA``: contains metadata, as described in :pep:`345`, :pep:`314` and :pep:`241`.
- ``RECORD``: records the list of installed files.
- ``INSTALLER``: records the name of the tool used to install the project.
- ``REQUESTED``: the presence of this file indicates that the project
  installation was explicitly requested (i.e., not installed as a dependency).

The METADATA, RECORD and INSTALLER files are mandatory, while REQUESTED may
be missing.

This proposal will not impact Python itself because the metadata files are not
used anywhere yet in the standard library besides Distutils.

It will impact the ``setuptools`` and ``pip`` projects but, given the fact that
they already work with a directory that contains a ``PKG-INFO`` file, the change
will have no deep consequences.


RECORD
------

A ``RECORD`` file is added inside the ``.dist-info`` directory at installation
time when installing a source distribution using the ``install`` command.
Notice that when installing a binary distribution created with the ``bdist``
command or a ``bdist``-based command, the ``RECORD`` file will be installed as
well since these commands use the ``install`` command to create binary
distributions.

The ``RECORD`` file holds the list of installed files. These correspond
to the files listed by the ``record`` option of the ``install`` command, and will
be generated by default. This allows the implementation of an uninstallation
feature, as explained later in this PEP. The ``install`` command also provides
an option to prevent the ``RECORD`` file from being written and this option
should be used when creating system packages.

Third-party installation tools also should not overwrite or delete files
that are not in a RECORD file without prompting or warning.

This RECORD file is inspired by :pep:`262` FILES.

The ``RECORD`` file is a CSV file, composed of records, one line per
installed file. The ``csv`` module is used to read the file, with
these options:

- field delimiter: ``,``
- quoting char: ``"``
- line terminator: ``os.linesep`` (so ``\r\n`` or ``\n``)

When a distribution is installed, files can be installed under:

- the **base location**: path defined by the ``--install-lib`` option,
  which defaults to the site-packages directory.

- the **installation prefix**: path defined by the ``--prefix`` option, which
  defaults to ``sys.prefix``.

- any other path on the system.


Each record is composed of three elements:

- the file's **path**

  - a '/'-separated path, relative to the **base location**, if the file is
    under the **base location**.

  - a '/'-separated path, relative to the **base location**, if the file
    is under the  **installation prefix** AND if the **base location** is a
    subpath of the **installation prefix**.

  - an absolute path, using the local platform separator

- a hash of the file's contents.
  Notice that ``pyc`` and ``pyo`` generated files don't have any hash because
  they are automatically produced from ``py`` files. So checking the hash
  of the corresponding ``py`` file is enough to decide if the file and
  its associated ``pyc`` or ``pyo`` files have changed.

  The hash is either the empty string or the hash algorithm as named in
  ``hashlib.algorithms_guaranteed``, followed by the equals character
  ``=``, followed by the urlsafe-base64-nopad encoding of the digest
  (``base64.urlsafe_b64encode(digest)`` with trailing ``=`` removed).

- the file's size in bytes
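
The hash element described above can be produced with the standard
``hashlib`` and ``base64`` modules. The following is a minimal sketch, not
part of the proposed API: the helper name ``record_hash`` and its
``sha256`` default are assumptions made for illustration (the examples in
this PEP use ``md5``).

.. code-block:: python

    import base64
    import hashlib


    def record_hash(data, algorithm="sha256"):
        """Return the RECORD hash element for ``data`` (a bytes object).

        Hypothetical helper: name and signature are illustrative only.
        """
        if algorithm not in hashlib.algorithms_guaranteed:
            raise ValueError("unsupported hash algorithm: %r" % algorithm)
        digest = hashlib.new(algorithm, data).digest()
        # urlsafe base64 of the raw digest, with trailing '=' padding removed.
        encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
        return "%s=%s" % (algorithm, encoded)


    field = record_hash(b"print('hello')\n")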

The ``csv`` module is used to generate this file, so the field separator is
",". Any "," character found within a field is escaped automatically by
``csv``.

When the file is read, the ``U`` option is used so the universal newline
support (see :pep:`278`) is activated, avoiding any trouble
reading a file produced on a platform that uses a different new line
terminator.
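
As a rough illustration of the reading and writing rules above, the sketch
below round-trips a few records through the ``csv`` module. The function
names ``write_record`` and ``read_record`` are hypothetical. On recent
Python 3 versions the ``U`` mode is not available, so this sketch assumes
the stream is opened with ``newline=''`` and lets ``csv`` deal with line
endings instead.

.. code-block:: python

    import csv
    import io
    import os


    def write_record(rows, stream):
        """Write (path, hash, size) rows with the options listed above."""
        writer = csv.writer(stream, delimiter=",", quotechar='"',
                            lineterminator=os.linesep)
        writer.writerows(rows)


    def read_record(stream):
        """Return the rows of a RECORD stream as (path, hash, size) tuples."""
        reader = csv.reader(stream, delimiter=",", quotechar='"')
        return [tuple(row) for row in reader if row]


    rows = [
        ("docutils/__init__.py", "md5=nWt-Dge1eug4iAgqLS_uWg", "9544"),
        ("docutils/__init__.pyc", "", ""),
        ("odd,name.py", "", ""),  # the comma forces csv to quote this field
    ]
    buffer = io.StringIO(newline="")
    write_record(rows, buffer)
    buffer.seek(0)
    round_tripped = read_record(buffer)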

Here's an example of a RECORD file (extract)::

    lib/python2.6/site-packages/docutils/__init__.py,md5=nWt-Dge1eug4iAgqLS_uWg,9544
    lib/python2.6/site-packages/docutils/__init__.pyc,,
    lib/python2.6/site-packages/docutils/core.py,md5=X90C_JLIcC78PL74iuhPnA,66188
    lib/python2.6/site-packages/docutils/core.pyc,,
    lib/python2.6/site-packages/roman.py,md5=7YhfNczihNjOY0FXlupwBg,234
    lib/python2.6/site-packages/roman.pyc,,
    /usr/local/bin/rst2html.py,md5=g22D3amDLJP-FhBzCi7EvA,234
    /usr/local/bin/rst2html.pyc,,
    lib/python2.6/site-packages/docutils-0.5.dist-info/METADATA,md5=ovJyUNzXdArGfmVyb0onyA,195
    lib/python2.6/site-packages/docutils-0.5.dist-info/RECORD,,

Notice that the ``RECORD`` file can't contain a hash of itself and is just
mentioned here.

A project that installs a ``config.ini`` file in ``/etc/myapp`` will be added like this::

    /etc/myapp/config.ini,md5=gLfd6IANquzGLhOkW4Mfgg,9544

On Windows, the drive letter is added to absolute paths, so a file that is
copied to ``c:\etc\myapp`` will be recorded as::

    c:\etc\myapp\config.ini,md5=gLfd6IANquzGLhOkW4Mfgg,9544


INSTALLER
---------

The ``install`` command has a new option called ``installer``. This option
is the name of the tool used to invoke the installation. It's a normalized
lower-case string matching ``[a-z0-9_\-\.]``. For example::

    $ python setup.py install --installer=pkg-system

It defaults to ``distutils`` if not provided.

When a distribution is installed, the INSTALLER file is generated in the
``.dist-info`` directory with this value, to keep track of **who** installed the
distribution. The file is a single-line text file.
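
A tool that writes or reads this file could proceed roughly as follows.
This is only a sketch: ``write_installer`` and ``read_installer`` are
hypothetical names, and treating the character class above as "one or more
of these characters" is an assumption.

.. code-block:: python

    import os
    import re
    import tempfile

    # One or more characters from the class given in the text (assumption).
    _INSTALLER_NAME = re.compile(r"[a-z0-9_\-\.]+")


    def write_installer(distinfo_dir, installer="distutils"):
        """Write the single-line INSTALLER file and return the stored name."""
        if not _INSTALLER_NAME.fullmatch(installer):
            raise ValueError("invalid installer name: %r" % installer)
        with open(os.path.join(distinfo_dir, "INSTALLER"), "w") as stream:
            stream.write(installer + "\n")
        return installer


    def read_installer(distinfo_dir):
        """Return the installer name recorded in ``distinfo_dir``."""
        with open(os.path.join(distinfo_dir, "INSTALLER")) as stream:
            return stream.readline().strip()


    # A temporary directory stands in for a real .dist-info directory.
    with tempfile.TemporaryDirectory() as distinfo_dir:
        write_installer(distinfo_dir, "pkg-system")
        recorded = read_installer(distinfo_dir)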


REQUESTED
---------

Some install tools automatically detect unfulfilled dependencies and
install them. In these cases, it is useful to track which
distributions were installed purely as a dependency, so if their
dependent distribution is later uninstalled, the user can be alerted
of the orphaned dependency.

If a distribution is installed by direct user request (the usual
case), a file REQUESTED is added to the .dist-info directory of the
installed distribution. The REQUESTED file may be empty, or may
contain a marker comment line beginning with the "#" character.

If an install tool installs a distribution automatically, as a
dependency of another distribution, the REQUESTED file should not be
created.

The ``install`` command of distutils by default creates the REQUESTED
file. It accepts ``--requested`` and ``--no-requested`` options to explicitly
specify whether the file is created.

If a distribution that was already installed on the system as a dependency
is later installed by name, the distutils ``install`` command will
create the REQUESTED file in the .dist-info directory of the existing
installation.
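
Because only the presence of the file matters, handling ``REQUESTED`` is
straightforward. The sketch below uses hypothetical helper names
(``is_requested``, ``mark_requested``) and is not the distutils
implementation.

.. code-block:: python

    import os
    import tempfile


    def is_requested(distinfo_dir):
        """True if the distribution was installed by direct user request."""
        return os.path.isfile(os.path.join(distinfo_dir, "REQUESTED"))


    def mark_requested(distinfo_dir, requested=True):
        """Create or remove the REQUESTED marker in ``distinfo_dir``."""
        marker = os.path.join(distinfo_dir, "REQUESTED")
        if requested:
            # An empty file is enough; only its presence is significant.
            open(marker, "a").close()
        elif os.path.exists(marker):
            os.remove(marker)


    # A distribution first installed as a dependency, then requested by name.
    with tempfile.TemporaryDirectory() as distinfo_dir:
        before = is_requested(distinfo_dir)
        mark_requested(distinfo_dir)
        after = is_requested(distinfo_dir)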


Implementation details
======================

Note: this section is non-normative.  In the end, this PEP was
implemented by third-party libraries and tools, not the standard
library.

New functions and classes in pkgutil
------------------------------------

To use the ``.dist-info`` directory content, we need to add in the standard
library a set of APIs. The best place to put these APIs is ``pkgutil``.

Functions
~~~~~~~~~

The new functions added in the ``pkgutil`` module are:

- ``distinfo_dirname(name, version)`` -> directory name

  ``name`` is converted to a standard distribution name by replacing any
  runs of non-alphanumeric characters with a single '-'.

  ``version`` is converted to a standard version string. Spaces become
  dots, and all other non-alphanumeric characters (except dots) become
  dashes, with runs of multiple dashes condensed to a single dash.

  Both attributes are then converted into their filename-escaped form,
  i.e. any '-' characters are replaced with '_' other than the one in
  'dist-info' and the one separating the name from the version number.

- ``get_distributions()`` -> iterator of ``Distribution`` instances.

  Provides an iterator that looks for ``.dist-info`` directories in
  ``sys.path`` and returns ``Distribution`` instances for
  each one of them.

- ``get_distribution(name)`` -> ``Distribution`` or None.

  Scans all elements in ``sys.path`` and looks for all directories ending with
  ``.dist-info``. Returns a ``Distribution`` corresponding to the
  ``.dist-info`` directory that contains a METADATA file whose ``name``
  field matches ``name``.

  This function only returns the first result found, since no more than one
  value is expected. If the directory is not found, returns None.

- ``obsoletes_distribution(name, version=None)`` -> iterator of ``Distribution``
  instances.

  Iterates over all distributions to find which distributions *obsolete*
  ``name``. If a ``version`` is provided, it will be used to filter the results.

- ``provides_distribution(name, version=None)`` -> iterator of ``Distribution``
  instances.

  Iterates over all distributions to find which distributions *provide*
  ``name``. If a ``version`` is provided, it will be used to filter the results.

- ``get_file_users(path)`` -> iterator of ``Distribution`` instances.

  Iterates over all distributions to find out which distributions use ``path``.
  ``path`` can be a local absolute path or a relative '/'-separated path.

  A local absolute path is an absolute path in which occurrences of '/'
  have been replaced by the system separator given by ``os.sep``.
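
To make the naming and scanning rules above concrete, here is a minimal
sketch. It follows the wording of this section literally and is not the
prototype implementation; ``iter_distinfo_dirs`` is a hypothetical helper
that only locates the directories a ``get_distributions()`` implementation
would wrap in ``Distribution`` instances.

.. code-block:: python

    import os
    import re
    import sys


    def distinfo_dirname(name, version):
        """Build the ``.dist-info`` directory name as described above."""
        # Runs of non-alphanumeric characters in the name become one '-'.
        name = re.sub(r"[^A-Za-z0-9]+", "-", name)
        # Spaces become dots; other non-alphanumeric characters (except
        # dots) become dashes, with runs condensed to a single dash.
        version = re.sub(r"[^A-Za-z0-9.]+", "-", version.replace(" ", "."))
        # Filename-escape both parts: remaining '-' characters become '_'.
        return "%s-%s.dist-info" % (name.replace("-", "_"),
                                    version.replace("-", "_"))


    def iter_distinfo_dirs(paths=None):
        """Yield ``.dist-info`` directories found on ``paths`` (sys.path)."""
        for entry in (sys.path if paths is None else paths):
            entry = entry or os.curdir  # '' on sys.path means the cwd
            if not os.path.isdir(entry):
                continue
            for item in sorted(os.listdir(entry)):
                candidate = os.path.join(entry, item)
                if item.endswith(".dist-info") and os.path.isdir(candidate):
                    yield candidate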


Distribution class
~~~~~~~~~~~~~~~~~~

A new class called ``Distribution`` is created with the path of the
``.dist-info`` directory provided to the constructor. It reads the metadata
contained in ``METADATA`` when it is instantiated.

``Distribution(path)`` -> instance

  Creates a ``Distribution`` instance for the given ``path``.

``Distribution`` provides the following attributes:

- ``name``: The name of the distribution.

- ``metadata``: A ``DistributionMetadata`` instance loaded with the
  distribution's METADATA file.

- ``requested``: A boolean that indicates whether the REQUESTED
  metadata file is present (in other words, whether the distribution was
  installed by user request).

And the following methods:

- ``get_installed_files(local=False)`` -> iterator of (path, hash, size)

  Iterates over the ``RECORD`` entries and returns a tuple ``(path, hash, size)``
  for each line. If ``local`` is ``True``, the path is transformed into a
  local absolute path. Otherwise the raw value from ``RECORD`` is returned.

  A local absolute path is an absolute path in which occurrences of '/'
  have been replaced by the system separator given by ``os.sep``.

- ``uses(path)`` -> Boolean

  Returns ``True`` if ``path`` is listed in ``RECORD``. ``path``
  can be a local absolute path or a relative '/'-separated path.

- ``get_distinfo_file(path, binary=False)`` -> file object

  Returns a ``file`` instance for the file pointed to by ``path``, which
  must be located under the ``.dist-info`` directory.

  ``path`` has to be a '/'-separated path relative to the ``.dist-info``
  directory or an absolute path.

  If ``path`` is an absolute path and doesn't start with the ``.dist-info``
  directory path, a ``DistutilsError`` is raised.

  If ``binary`` is ``True``, opens the file in read-only binary mode (``rb``),
  otherwise opens it in read-only mode (``r``).

- ``get_distinfo_files(local=False)`` -> iterator of paths

  Iterates over the ``RECORD`` entries and returns paths for each line if the path
  is pointing to a file located in the ``.dist-info`` directory or one of its
  subdirectories.

  If ``local`` is ``True``, each path is transformed into a
  local absolute path. Otherwise the raw value from ``RECORD`` is returned.
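
The behavior of ``get_installed_files`` and ``uses`` can be illustrated with
a small stand-in class. ``SketchDistribution`` is a hypothetical name, not
the class proposed here, and the sketch assumes that relative ``RECORD``
paths are relative to the directory containing the ``.dist-info`` directory.

.. code-block:: python

    import csv
    import os
    import tempfile


    class SketchDistribution:
        """Illustrative stand-in for the ``Distribution`` class above."""

        def __init__(self, path):
            self.path = os.path.abspath(path)
            # Assumption: the base location is the parent of .dist-info.
            self.base = os.path.dirname(self.path)
            self.requested = os.path.isfile(
                os.path.join(self.path, "REQUESTED"))

        def _to_local(self, record_path):
            """Turn a RECORD path into a local absolute path."""
            if os.path.isabs(record_path):
                return os.path.normpath(record_path)
            parts = record_path.split("/")
            return os.path.normpath(os.path.join(self.base, *parts))

        def get_installed_files(self, local=False):
            record = os.path.join(self.path, "RECORD")
            with open(record, newline="") as stream:
                for row in csv.reader(stream):
                    if not row:
                        continue
                    path, hash_, size = row
                    yield (self._to_local(path) if local else path,
                           hash_, size)

        def uses(self, path):
            wanted = self._to_local(path)
            return any(wanted == local_path for local_path, _, _
                       in self.get_installed_files(local=True))


    # Build a throwaway installation to exercise the class.
    with tempfile.TemporaryDirectory() as base:
        distinfo = os.path.join(base, "docutils-0.5.dist-info")
        os.mkdir(distinfo)
        with open(os.path.join(distinfo, "RECORD"), "w", newline="") as out:
            csv.writer(out).writerows([
                ("docutils/core.py", "md5=X90C_JLIcC78PL74iuhPnA", "66188"),
                ("docutils-0.5.dist-info/RECORD", "", ""),
            ])
        dist = SketchDistribution(distinfo)
        raw_paths = [path for path, _, _ in dist.get_installed_files()]
        uses_core = dist.uses("docutils/core.py")
        uses_other = dist.uses("roman.py")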


Notice that the API is organized in five classes that work with directories
and Zip files (so it works with files included in Zip files, see :pep:`273` for
more details). These classes are described in the documentation
of the prototype implementation for interested readers [#prototype]_.

Examples
~~~~~~~~

Let's use some of the new APIs with our ``docutils`` example::

    >>> from pkgutil import get_distribution, get_file_users, distinfo_dirname
    >>> dist = get_distribution('docutils')
    >>> dist.name
    'docutils'
    >>> dist.metadata.version
    '0.5'

    >>> distinfo_dirname('docutils', '0.5')
    'docutils-0.5.dist-info'

    >>> distinfo_dirname('python-ldap', '2.5')
    'python_ldap-2.5.dist-info'

    >>> distinfo_dirname('python-ldap', '2.5 a---5')
    'python_ldap-2.5.a_5.dist-info'

    >>> for path, hash, size in dist.get_installed_files():
    ...     print('%s,%s,%s' % (path, hash, size))
    ...
    python2.6/site-packages/docutils/__init__.py,b690274f621402dda63bf11ba5373bf2,9544
    python2.6/site-packages/docutils/core.py,9c4b84aff68aa55f2e9bf70481b94333,66188
    python2.6/site-packages/roman.py,a4b84aff68aa55f2e9bf70481b943D3,234
    /usr/local/bin/rst2html.py,a4b84aff68aa55f2e9bf70481b943D3,234
    python2.6/site-packages/docutils-0.5.dist-info/METADATA,6fe57de576d749536082d8e205b77748,195
    python2.6/site-packages/docutils-0.5.dist-info/RECORD,,

    >>> dist.uses('docutils/core.py')
    True

    >>> dist.uses('/usr/local/bin/rst2html.py')
    True

    >>> dist.get_distinfo_file('METADATA')
    <open file at ...>

    >>> dist.requested
    True


New functions in Distutils
--------------------------

Distutils already provides a very basic way to install a distribution, which
is running the ``install`` command over the ``setup.py`` script of the
distribution.

Distutils2 [#distutils2]_ will provide a very basic ``uninstall`` function
that is added in ``distutils2.util`` and takes the name of the distribution to
uninstall as its argument. ``uninstall`` uses the APIs described earlier and
removes all unique files, as long as their hash didn't change. Then it removes
empty directories left behind.

``uninstall`` returns a list of uninstalled files::

    >>> from distutils2.util import uninstall
    >>> uninstall('docutils')
    ['/opt/local/lib/python2.6/site-packages/docutils/core.py',
     ...
     '/opt/local/lib/python2.6/site-packages/docutils/__init__.py']

If the distribution is not found, a ``DistutilsUninstallError`` is raised.

Filtering
~~~~~~~~~

To make it a reference API for third-party projects that wish to control
how ``uninstall`` works, a second callable argument can be used. It's
called for each file that is about to be removed. If the callable returns
``True``, the file is removed. If it returns ``False``, it's left alone.

Examples::

    >>> import logging
    >>> def _remove_and_log(path):
    ...     logging.info('Removing %s' % path)
    ...     return True
    ...
    >>> uninstall('docutils', _remove_and_log)

    >>> def _dry_run(path):
    ...     logging.info('Removing %s (dry run)' % path)
    ...     return False
    ...
    >>> uninstall('docutils', _dry_run)

Of course, a third-party tool can use lower-level ``pkgutil`` APIs to
implement its own uninstall feature.
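
As an example of such a tool, the sketch below reads ``RECORD`` directly,
skips files whose hash has changed since installation, and consults an
optional filter callable before removing each file. It is a simplified
illustration under stated assumptions, not the Distutils2 ``uninstall``:
``sketch_uninstall`` and ``should_remove`` are hypothetical names, and the
sketch neither checks whether other distributions use a file nor removes
empty directories.

.. code-block:: python

    import base64
    import csv
    import hashlib
    import os
    import tempfile


    def _hash_field(path, algorithm):
        """Compute the RECORD hash element for the file at ``path``."""
        with open(path, "rb") as stream:
            digest = hashlib.new(algorithm, stream.read()).digest()
        encoded = base64.urlsafe_b64encode(digest).rstrip(b"=")
        return "%s=%s" % (algorithm, encoded.decode("ascii"))


    def sketch_uninstall(distinfo_dir, should_remove=None):
        """Remove unmodified files listed in RECORD; return their paths."""
        base = os.path.dirname(os.path.abspath(distinfo_dir))
        record = os.path.join(distinfo_dir, "RECORD")
        with open(record, newline="") as stream:
            rows = [row for row in csv.reader(stream) if row]
        removed = []
        for record_path, hash_field, _size in rows:
            if os.path.isabs(record_path):
                path = record_path
            else:
                path = os.path.join(base, *record_path.split("/"))
            if not os.path.isfile(path):
                continue
            if hash_field:
                algorithm = hash_field.split("=", 1)[0]
                if _hash_field(path, algorithm) != hash_field:
                    continue  # modified since installation: leave it alone
            if should_remove is not None and not should_remove(path):
                continue  # vetoed by the caller (e.g. a dry run)
            os.remove(path)
            removed.append(path)
        return removed


    # Throwaway installation: one pristine file, one edited after install.
    with tempfile.TemporaryDirectory() as base:
        distinfo = os.path.join(base, "demo-1.0.dist-info")
        os.mkdir(distinfo)
        edited = os.path.join(base, "edited.py")
        pristine = os.path.join(base, "pristine.py")
        for path in (edited, pristine):
            with open(path, "wb") as stream:
                stream.write(b"x = 1\n")
        rows = [("edited.py", _hash_field(edited, "sha256"), "6"),
                ("pristine.py", _hash_field(pristine, "sha256"), "6")]
        with open(os.path.join(distinfo, "RECORD"), "w", newline="") as out:
            csv.writer(out).writerows(rows)
        with open(edited, "ab") as stream:
            stream.write(b"x = 2\n")  # simulate a local modification
        dry_run = sketch_uninstall(distinfo, should_remove=lambda p: False)
        removed = [os.path.basename(p) for p in sketch_uninstall(distinfo)]
        edited_survives = os.path.exists(edited)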

Installer marker
~~~~~~~~~~~~~~~~

As explained earlier in this PEP, the ``install`` command adds an ``INSTALLER``
file in the ``.dist-info`` directory with the name of the installer.

To avoid removing distributions that were installed by another packaging
system, the ``uninstall`` function takes an extra argument ``installer`` which
defaults to ``distutils2``.

When called, ``uninstall`` checks that the ``INSTALLER`` file matches
this argument. If not, it raises a ``DistutilsUninstallError``::

    >>> uninstall('docutils')
    Traceback (most recent call last):
    ...
    DistutilsUninstallError: docutils was installed by 'cool-pkg-manager'

    >>> uninstall('docutils', installer='cool-pkg-manager')

This allows a third-party application to use the ``uninstall`` function
and strongly suggest that no other program remove a distribution it has
previously installed. This is useful when a third-party program that relies
on Distutils APIs performs extra steps on the system at installation time
that it has to undo at uninstallation time.
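
The installer check itself amounts to comparing one line of text. In the
sketch below, ``check_installer`` and ``SketchUninstallError`` are
hypothetical stand-ins for the behavior and the ``DistutilsUninstallError``
exception described above.

.. code-block:: python

    import os
    import tempfile


    class SketchUninstallError(Exception):
        """Stand-in for the DistutilsUninstallError named in the text."""


    def check_installer(distinfo_dir, installer="distutils2"):
        """Raise unless INSTALLER in ``distinfo_dir`` names ``installer``."""
        with open(os.path.join(distinfo_dir, "INSTALLER")) as stream:
            recorded = stream.readline().strip()
        if recorded != installer:
            raise SketchUninstallError(
                "%s was installed by %r"
                % (os.path.basename(distinfo_dir), recorded))
        return recorded


    # A temporary directory stands in for a real .dist-info directory.
    with tempfile.TemporaryDirectory() as distinfo_dir:
        with open(os.path.join(distinfo_dir, "INSTALLER"), "w") as stream:
            stream.write("cool-pkg-manager\n")
        accepted = check_installer(distinfo_dir,
                                   installer="cool-pkg-manager")
        try:
            check_installer(distinfo_dir)  # default installer: distutils2
            refused = False
        except SketchUninstallError:
            refused = True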

Adding an Uninstall script
~~~~~~~~~~~~~~~~~~~~~~~~~~

An ``uninstall`` script is added in Distutils2 and is used like this::

    $ python -m distutils2.uninstall projectname

Notice that the script doesn't check whether the removal of a distribution
breaks another distribution. It does, however, make sure that none of the
files it removes are used by any other distribution, by using the uninstall
function.

Also note that this uninstall script pays no attention to the
REQUESTED metadata; that is provided only for use by external tools to
provide more advanced dependency management.

Backward compatibility and roadmap
==================================

These changes don't introduce any compatibility problems since they
will be implemented in:

- pkgutil in new functions
- distutils2

The plan is to include the functionality outlined in this PEP in pkgutil for
Python 3.2, and in Distutils2.

Distutils2 will also contain a backport of the new pkgutil, and can be used for
2.4 onward.

Distributions installed using existing, pre-standardization formats do not have
the necessary metadata available for the new API, and thus will be
ignored. Third-party tools may of course continue to support previous
formats in addition to the new format, in order to ease the transition.


References
==========

.. [#distutils]
   http://docs.python.org/distutils

.. [#distutils2]
   http://hg.python.org/distutils2

.. [#setuptools]
   http://peak.telecommunity.com/DevCenter/setuptools

.. [#easyinstall]
   http://peak.telecommunity.com/DevCenter/EasyInstall

.. [#pip]
   http://pypi.python.org/pypi/pip

.. [#eggformats]
   http://peak.telecommunity.com/DevCenter/EggFormats

.. [#fedora]
   http://fedoraproject.org/wiki/Packaging/Python/Eggs#Providing_Eggs_using_Setuptools

.. [#debian]
   http://wiki.debian.org/DebianPython/NewPolicy

.. [#prototype]
   http://bitbucket.org/tarek/pep376/

Acknowledgements
================

Jim Fulton, Ian Bicking, Phillip Eby, Rafael Villar Burke, and many people at
Pycon and Distutils-SIG.

Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: