PEP: 639
Title: Improving License Clarity with Better Package Metadata
Author: Philippe Ombredanne <pombredanne@nexb.com>,
        C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>,
PEP-Delegate: Brett Cannon <brett@python.org>
Discussions-To: https://discuss.python.org/t/12622
Status: Draft
Type: Standards Track
Topic: Packaging
Content-Type: text/x-rst
Created: 15-Aug-2019
Post-History: `15-Aug-2019 <https://discuss.python.org/t/2154>`__,
              `17-Dec-2021 <https://discuss.python.org/t/12622>`__,


.. _639-abstract:

Abstract
========

This PEP defines a specification for how licenses are documented in the
`core metadata <coremetadataspec_>`__, with
:ref:`license expression strings <639-spec-field-license-expression>` using
`SPDX identifiers <spdxid_>`__ in a new ``License-Expression`` field.
This will make license declarations simpler and less ambiguous for
package authors to create, end users to read and understand, and
tools to programmatically process.

The PEP also:

- :ref:`Formally specifies <639-spec-field-license-file>`
  a new ``License-File`` field, and defines how license files should be
  :ref:`included in distributions <639-spec-project-formats>`,
  as already used by the Wheel and Setuptools projects.

- Deprecates the legacy ``License`` :ref:`field <639-spec-field-license>`
  and ``license ::`` :ref:`classifiers <639-spec-field-classifier>`.

- :ref:`Adds and deprecates <639-spec-source-metadata>` the corresponding keys
  in the ``pyproject.toml`` ``[project]`` table.

- :ref:`Provides clear guidance <639-spec-converting-metadata>` for authors and
  tools converting legacy license metadata, adding license files and
  validating license expressions.

- Describes a :ref:`reference implementation <639-reference-implementation>`,
  analyzes numerous :ref:`potential alternatives <639-rejected-ideas>`,
  includes :ref:`detailed examples <639-examples>`,
  explains :ref:`user scenarios <639-user-scenarios>` and
  surveys license documentation
  :ref:`in Python packaging <639-license-doc-python>` and
  :ref:`other ecosystems <639-license-doc-other-projects>`.

The changes in this PEP will update the
`core metadata <coremetadataspec_>`__ to version 2.4, modify the
`project (source) metadata specification <pep621spec_>`__,
and make minor additions to the `source distribution (sdist) <sdistspec_>`__,
`built distribution (wheel) <wheelspec_>`__ and
`installed project <installedspec_>`__ standards.


.. _639-goals:

Goals
=====

This PEP's scope is limited to covering new mechanisms for documenting
the license of a distribution package, specifically defining:

- A means of specifying an SPDX license expression.
- A method of including license texts in distributions and installed projects.

The changes to the core metadata specification that this PEP requires have been
designed to minimize impact and maximize backward compatibility.
This specification builds off of existing ways to document licenses that are
already in use in popular tools (e.g. adding support to core metadata for the
``License-File`` field :ref:`already used <639-license-doc-setuptools-wheel>`
in the Wheel and Setuptools projects) and by some package authors
(e.g. storing an SPDX license expression in the existing ``License`` field).

In addition to these proposed changes, this PEP contains guidance for tools
handling and converting these metadata, a tutorial for package authors
covering various common use cases, detailed examples of them in use,
and a comprehensive survey of license documentation in Python and other
languages.

It is the intent of the PEP authors to work closely with tool maintainers to
implement the recommendations for validation and warnings specified here.


.. _639-non-goals:

Non-Goals
=========

This PEP is neutral regarding the choice of license by any particular
package author. This PEP makes no recommendation for specific licenses,
and does not require the use of a particular license documentation convention.

Rather, the SPDX license expression syntax proposed in this PEP provides a
simpler and more expressive mechanism to accurately document any kind of
license that applies to a Python package, whether it is open source,
free/libre, proprietary, or a combination of such.

This PEP also does not impose any additional restrictions when uploading to
PyPI, unless projects choose to make use of the new fields.

Instead, it is intended to document best practices already in use, extend them
to use a new formally-specified and supported mechanism, and provide guidance
for packaging tools on how to handle the transition and inform users accordingly.

This PEP is also not about license documentation in files inside projects,
though this is a :ref:`surveyed topic <639-license-doc-source-files>`
in an appendix, nor does it intend to cover cases where the source and
binary distribution packages don't have :ref:`the same licenses
<639-rejected-ideas-difference-license-source-binary>`.


.. _639-motivation:

Motivation
==========

Software must be licensed in order for anyone other than its creator to
download, use, share and modify it, so providing accurate license information
to Python package users is an important matter.
Today, there are multiple fields where
licenses are documented in core metadata, and there are limitations to what
can be expressed in each of them. This often leads to confusion and a lack of
clarity, both for package authors and end users.

Many package authors have expressed difficulty and frustrations due to the
limited capabilities to express licensing in project metadata, and this
creates further trouble for Linux and BSD distribution re-packagers.
This has triggered a number of license-related discussions and issues,
including on `outdated and ambiguous PyPI classifiers <classifierissue_>`__,
`license interoperability with other ecosystems <interopissue_>`__,
`too many confusing license metadata options <packagingissue_>`__,
`limited support for license files in the Wheel project <wheelfiles_>`__, and
`the lack of clear, precise and standardized license metadata <pepissue_>`__.

The current license classifiers address some common cases, and could
be extended to include the full range of current SPDX identifiers
while deprecating the many ambiguous classifiers
(including some popular and problematic ones,
such as ``License :: OSI Approved :: BSD License``).
However, this requires a substantial amount of effort
to duplicate the SPDX license list and keep it in sync.
Furthermore, it is effectively a hard break in backward compatibility,
forcing a huge proportion of package authors to update to new
classifiers (in most cases, with many possible choices that require closely
examining the project's license) immediately when PyPI deprecates the old ones.

Furthermore, this only covers simple packages entirely under a single license;
it doesn't address the substantial fraction of common projects that vendor
dependencies (e.g. Setuptools), offer a choice of licenses (e.g. Packaging)
or were relicensed, adapt code from other projects or contain fonts, images,
examples, binaries or other assets under other licenses. It also requires
that both authors and tools understand and implement the PyPI-specific
bespoke classifier system, rather than using short, easy to add and
standardized SPDX identifiers in a simple text field, as increasingly widely
adopted by most other packaging systems to reduce the overall burden on the
ecosystem.
Finally, this does not provide as clear an indicator that a package
has adopted the new system, and should be treated accordingly.

On average, Python packages tend to have more ambiguous and missing license
information than other common ecosystems (such as npm, Maven or
RubyGems). This is supported by the `statistics page <cdstats_>`__ of the
`ClearlyDefined project <clearlydefined_>`__, an
`Open Source Initiative <osi_>`__ incubated effort to help
improve licensing clarity of other FOSS projects, covering all packages
from PyPI, Maven, npm and RubyGems.


.. _639-rationale:

Rationale
=========

A survey of existing license metadata definitions in use in the Python
ecosystem today is provided in
:ref:`an appendix <639-license-doc-python>` of this PEP,
and license documentation in a variety of other packaging systems,
Linux distros, language ecosystems and applications is surveyed in
:ref:`another appendix <639-license-doc-other-projects>`.

There are a few takeaways from the survey, which have guided the design
and recommendations of this PEP:

- Most package formats use a single ``License`` field.

- Many modern package systems use some form of license expression syntax to
  optionally combine more than one license identifier together.
  SPDX and SPDX-like syntaxes are the most popular in use.

- SPDX license identifiers are becoming the de facto way to reference common
  licenses everywhere, whether or not a full license expression syntax is used.

- Several package formats support documenting both a license expression and the
  paths of the corresponding files that contain the license text. Most Free and
  Open Source Software licenses require package authors to include their full
  text in a distribution.

The use of a new ``License-Expression`` field will provide an intuitive,
structured and unambiguous way to express the license of a
package using a well-defined syntax and well-known license identifiers.
Similarly, a formally-specified ``License-File`` field offers a standardized
way to ensure that the full text of the license(s) is included with the
package when distributed, as legally required, and allows other tools consuming
the core metadata to unambiguously locate a distribution's license files.

While dramatically simplifying and improving the present Python license
metadata story, this specification standardizes and builds upon
existing practice in the `Setuptools <setuptoolsfiles_>`__ and
`Wheel <wheelfiles_>`__ projects.
Furthermore, an up-to-date version of the current draft of this PEP is
`already successfully implemented <hatchimplementation_>`__ in the popular
PyPA `Hatch <hatch_>`__ packaging tool, and an earlier draft of the
license files portion is `implemented in Setuptools <setuptoolspep639_>`__.

Over time, encouraging the use of these fields and deprecating the ambiguous,
duplicative and confusing legacy alternatives will help Python software
publishers improve the clarity, accuracy and portability of their licensing
practices, to the benefit of package authors, consumers and redistributors
alike.


.. _639-terminology:

Terminology
===========

This PEP seeks to clearly define the terms it uses, given that some have
multiple established meanings (e.g. import vs. distribution package,
wheel *format* vs. Wheel *project*); are related and often used
interchangeably, but have critical distinctions in meaning
(e.g. ``[project]`` *key* vs. core metadata *field*); are existing concepts
that don't have formal terms/definitions (e.g. project/source metadata vs.
distribution/built metadata, build vs. publishing tools), or are new concepts
introduced here (e.g. license expression/identifier).

This PEP also uses terms defined in the
`PyPA PyPUG Glossary <pypugglossary_>`__
(specifically *built/binary distribution*, *distribution package*,
*project* and *source distribution*), and by the `SPDX Project <spdx_>`__
(*license identifier*, *license expression*).

The keywords "MUST", "MUST NOT", "REQUIRED",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL"
in this document are to be interpreted as described in :rfc:`2119`.

Terms are listed here in their full versions;
related words (``Rel:``) are in parentheses,
including short forms (``Short:``), sub-terms (``Sub:``) and common synonyms
for the purposes of this PEP (``Syn:``).

**Core Metadata** *(Syn: Package Metadata, Sub: Distribution Metadata)*
  The `PyPA specification <coremetadataspec_>`__ and the set of metadata fields
  it defines that describe key static attributes of distribution packages
  and installed projects.

  The **distribution metadata** refers to, more specifically, the concrete form
  core metadata takes when included inside a distribution archive
  (``PKG-INFO`` in a sdist and ``METADATA`` in a wheel) or installed project
  (``METADATA``).

**Core Metadata Field** *(Short: Metadata Field/Field)*
  A single key-value pair, or sequence of such with the same key, as defined
  by the `core metadata specification <coremetadataspec_>`__.
  Notably, distinct from a ``pyproject.toml`` ``[project]`` table *key*.

**Distribution Package** *(Sub: Package, Distribution Archive)*
  (`See PyPUG <pypugdistributionpackage_>`__)
  In this PEP, **package** is used to refer to the abstract concept of a
  distributable form of a Python project, while **distribution** more
  specifically references the physical **distribution archive**.

**License Classifier**
  A `PyPI Trove classifier <classifiers_>`__
  (as `described in the core metadata specification
  <coremetadataclassifiers_>`__)
  which begins with ``License ::``, currently used to indicate
  a project's license status by including it as a ``Classifier``
  in the core metadata.

**License Expression** *(Syn: SPDX Expression)*
  A string with valid `SPDX license expression syntax <spdxpression_>`__
  including any SPDX license identifiers as defined here, which describes
  a project's license(s) and how they relate to one another. Examples:
  ``GPL-3.0-or-later``, ``MIT AND (Apache-2.0 OR BSD-2-Clause)``

**License Identifier** *(Syn: License ID/SPDX Identifier)*
  A valid `SPDX short-form license identifier <spdxid_>`__, as described in the
  :ref:`639-spec-field-license-expression` section of this PEP; briefly,
  this includes all valid SPDX identifiers and the ``LicenseRef-Public-Domain``
  and ``LicenseRef-Proprietary`` strings. Examples: ``MIT``, ``GPL-3.0-only``

**Project** *(Sub: Project Source Tree, Installed Project)*
  (`See PyPUG <pypugproject_>`__)
  Here, a **project source tree** refers to the on-disk format of
  a project used for development, while an **installed project** is the form a
  project takes once installed from a distribution, as
  `specified by PyPA <installedspec_>`__.

**Project Source Metadata** *(Sub: Project Table Metadata, Key, Subkey)*
  Core metadata defined by the package author in the project source tree,
  as top-level keys in the ``[project]`` table of a ``pyproject.toml`` file,
  in the ``[metadata]`` table of ``setup.cfg``, or the equivalent for other
  build tools.

  The **Project Table Metadata**, or ``pyproject.toml`` ``[project]`` metadata,
  refers specifically to the former, as defined by the
  `PyPA Declaring Project Metadata specification <pep621spec_>`__
  and originally specified in :pep:`621`.
  A **Project Table Key**, or an unqualified *key* refers specifically to
  a top-level ``[project]`` key
  (notably, distinct from a core metadata *field*),
  while a **subkey** refers to a second-level key in a table-valued
  ``[project]`` key.

**Root License Directory** *(Short: License Directory)*
  The directory under which license files are stored in a project/distribution
  and the root directory that their paths, as recorded under the
  ``License-File`` core metadata fields, are relative to.
  Defined here to be the project root directory for source trees and source
  distributions, and a subdirectory named ``licenses`` of the directory
  containing the core metadata (i.e., the ``.dist-info/licenses``
  directory) for built distributions and installed projects.

**Tool** *(Sub: Packaging Tool, Build Tool, Install Tool, Publishing Tool)*
  A program, script or service executed by the user or automatically that
  seeks to conform to the specification defined in this PEP.

  A **packaging tool** refers to a tool used to build, publish,
  install, or otherwise directly interact with Python packages.

  A **build tool** is a packaging tool used to generate a source or built
  distribution from a project source tree or sdist, when directly invoked
  as such (as opposed to by end-user-facing install tools).
  Examples: Wheel project, :pep:`517` backends via ``build`` or other
  package-developer-facing frontends, calling ``setup.py`` directly.

  An **install tool** is a packaging tool used to install a source or built
  distribution in a target environment. Examples include the PyPA pip and
  ``installer`` projects.

  A **publishing tool** is a packaging tool used to upload distribution
  archives to a package index, such as Twine for PyPI.

**Wheel** *(Short: wheel, Rel: wheel format, Wheel project)*
  Here, **wheel**, the standard built distribution format introduced in
  :pep:`427` and `specified by the PyPA <wheelspec_>`__, will be referred to in
  lowercase, while the `Wheel project <wheelproject_>`__, its reference
  implementation, will be referred to as such with **Wheel** in Title Case.


.. _639-specification:

Specification
=============

The changes necessary to implement the improved license handling outlined in
this PEP include those in both
:ref:`distribution package metadata <639-spec-core-metadata>`,
as defined in the `core metadata specification <coremetadataspec_>`__, and
:ref:`author-provided project source metadata <639-spec-source-metadata>`,
as defined in the `project source metadata specification <pep621spec_>`__
(and originally introduced in :pep:`621`).

Further, :ref:`minor additions <639-spec-project-formats>` to the
source distribution (sdist), built distribution (wheel) and installed project
specifications will help document and clarify the already allowed,
now formally standardized behavior in these respects.
Finally, :ref:`guidance is established <639-spec-converting-metadata>`
for tools handling and converting legacy license metadata to license
expressions, to ensure the results are consistent, correct and unambiguous.

Note that the guidance on errors and warnings is for tools' default behavior;
they MAY operate more strictly if users explicitly configure them to do so,
such as by a CLI flag or a configuration option.


.. _639-spec-core-metadata:

Core metadata
-------------

The `PyPA Core Metadata specification <coremetadataspec_>`__ defines the names
and semantics of each of the supported fields in the distribution metadata of
Python distribution packages and installed projects.

This PEP :ref:`adds <639-spec-field-license-expression>` the
``License-Expression`` field,
:ref:`adds <639-spec-field-license-file>` the ``License-File`` field,
:ref:`deprecates <639-spec-field-license>` the ``License`` field,
and :ref:`deprecates <639-spec-field-classifier>` the license classifiers
in the ``Classifier`` field.

The error and warning guidance in this section applies to build and
publishing tools; end-user-facing install tools MAY be more lenient than
mentioned here when encountering malformed metadata
that does not conform to this specification.

As it adds new fields, this PEP updates the core metadata to version 2.4.


.. _639-spec-field-license-expression:

Add ``License-Expression`` field
''''''''''''''''''''''''''''''''

The ``License-Expression`` optional field is specified to contain a text string
that is a valid SPDX license expression, as defined herein.

Publishing tools SHOULD issue an informational warning if this field is
missing, and MAY raise an error. Build tools MAY issue a similar warning,
but MUST NOT raise an error.

.. _639-license-expression-definition:

A license expression is a string using the SPDX license expression syntax as
documented in the `SPDX specification <spdxpression_>`__, either
Version 2.2 or a later compatible version.

When used in the ``License-Expression`` field and as a specialization of
the SPDX license expression definition, a license expression can use the
following license identifiers:

- Any SPDX-listed license short-form identifiers that are published in the
  `SPDX License List <spdxlist_>`__, version 3.17 or any later compatible
  version. Note that the SPDX working group never removes any license
  identifiers; instead, they may choose to mark an identifier as "deprecated".

- The ``LicenseRef-Public-Domain`` and ``LicenseRef-Proprietary`` strings, to
  identify licenses that are not included in the SPDX license list.

When processing the ``License-Expression`` field to determine if it contains
a valid license expression, build and publishing tools:

- SHOULD halt execution and raise an error if:

  - The field does not contain a valid license expression

  - One or more license identifiers are not valid
    (as :ref:`defined above <639-license-expression-definition>`)

- SHOULD report an informational warning, and publishing tools MAY raise an
  error, if one or more license identifiers have been marked as deprecated in
  the `SPDX License List <spdxlist_>`__.

- MUST store a case-normalized version of the ``License-Expression`` field
  using the reference case for each SPDX license identifier and
  uppercase for the ``AND``, ``OR`` and ``WITH`` keywords.

- SHOULD report an informational warning, and MAY raise an error if
  the normalization process results in changes to the
  ``License-Expression`` field contents.
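
As a rough illustration of the case normalization described above, the
following sketch uppercases the operator keywords and maps each identifier to
its reference case. The ``REFERENCE_CASE`` table and the
``normalize_expression`` helper are hypothetical names that are not part of
this specification, and the table holds only a handful of identifiers; a real
tool would consult the full SPDX License List and use a complete expression
parser (this sketch ignores, for example, the ``+`` suffix and exception
identifiers).

.. code-block:: python

    import re

    # Hypothetical, deliberately tiny lookup table; a real tool would load
    # every identifier from the SPDX License List.
    REFERENCE_CASE = {
        ident.lower(): ident
        for ident in (
            "MIT",
            "Apache-2.0",
            "BSD-2-Clause",
            "GPL-3.0-or-later",
            "LicenseRef-Public-Domain",
            "LicenseRef-Proprietary",
        )
    }
    KEYWORDS = {"and", "or", "with"}

    # A token is a single parenthesis or a run of other non-space characters.
    TOKEN = re.compile(r"[()]|[^\s()]+")


    def normalize_expression(expression):
        """Return a case-normalized copy of a simple license expression."""
        tokens = []
        for token in TOKEN.findall(expression):
            lowered = token.lower()
            if token in ("(", ")"):
                tokens.append(token)
            elif lowered in KEYWORDS:
                tokens.append(lowered.upper())
            elif lowered in REFERENCE_CASE:
                tokens.append(REFERENCE_CASE[lowered])
            else:
                raise ValueError(f"unknown license identifier: {token!r}")
        # Re-join with single spaces, keeping parentheses tight.
        return " ".join(tokens).replace("( ", "(").replace(" )", ")")

A tool following the guidance above could compare the returned string with the
user's input and warn when the two differ.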

For all newly-uploaded distributions that include a
``License-Expression`` field, the `Python Package Index (PyPI) <pypi_>`__ MUST
validate that it contains a valid, case-normalized license expression with
valid identifiers (as defined here) and MUST reject uploads that do not.
PyPI MAY reject an upload for using a deprecated license identifier,
so long as it was deprecated as of the above-mentioned SPDX License List
version.


.. _639-spec-field-license-file:

Add ``License-File`` field
''''''''''''''''''''''''''

Each instance of the ``License-File`` optional field is specified to contain
the string representation of the path in the project source tree, relative to
the project root directory, of a license-related file.
It is a multi-use field that may appear zero or
more times, each instance listing the path to one such file. Files specified
under this field could include license text, author/attribution information,
or other legal notices that need to be distributed with the package.

As :ref:`specified by this PEP <639-spec-project-formats>`, its value
is also that file's path relative to the root license directory in both
installed projects and the standardized distribution package types.
In other legacy, non-standard or new distribution package formats and
mechanisms of accessing and storing core metadata, the value MAY correspond
to the license file path relative to a format-defined root license directory.
Alternatively, it MAY be treated as a unique abstract key to access the
license file contents by another means, as specified by the format.

If a ``License-File`` is listed in a source or built distribution's core
metadata, that file MUST be included in the distribution at the specified path
relative to the root license directory, and MUST be installed with the
distribution at that same relative path.

The specified relative path MUST be consistent between project source trees,
source distributions (sdists), built distributions (wheels) and installed
projects. Therefore, inside the root license directory, packaging tools
MUST reproduce the directory structure under which the
source license files are located relative to the project root.

Path delimiters MUST be the forward slash character (``/``),
and parent directory indicators (``..``) MUST NOT be used.
License file content MUST be UTF-8 encoded text.
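
The path rules above, together with the root license directory for built
distributions and installed projects (the ``licenses`` subdirectory of the
``.dist-info`` directory), might be sketched as follows. The function names
and the example ``.dist-info`` directory name are illustrative assumptions,
and treating any backslash as a misused delimiter is a simplification made
by this sketch.

.. code-block:: python

    from pathlib import PurePosixPath


    def validate_license_file_path(path):
        """Raise ValueError if ``path`` breaks the License-File path rules."""
        if "\\" in path:
            # Assumption of this sketch: any backslash is a misused delimiter.
            raise ValueError(f"path must use '/' as delimiter: {path!r}")
        if ".." in PurePosixPath(path).parts:
            raise ValueError(f"path must not contain '..': {path!r}")
        return path


    def installed_license_path(dist_info_dir, license_file):
        """Join a License-File value onto the root license directory."""
        validate_license_file_path(license_file)
        # The directory structure below the project root is kept unchanged.
        return str(PurePosixPath(dist_info_dir) / "licenses" / license_file)

For example, ``installed_license_path("example-1.0.dist-info",
"vendor/LICENSE.txt")`` yields
``example-1.0.dist-info/licenses/vendor/LICENSE.txt``, preserving the
``vendor/`` subdirectory from the source tree.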

Build tools MAY and publishing tools SHOULD produce an informative warning
if a built distribution's metadata contains no ``License-File`` entries,
and publishing tools MAY but build tools MUST NOT raise an error.

For all newly-uploaded distribution packages that include one or more
``License-File`` fields and declare a ``Metadata-Version`` of ``2.4`` or
higher, PyPI SHOULD validate that the specified files are present in all
uploaded distributions, and MUST reject uploads that do not validate.


.. _639-spec-field-license:

Deprecate ``License`` field
'''''''''''''''''''''''''''

The legacy unstructured-text ``License`` field is deprecated and replaced by
the new ``License-Expression`` field. Build and publishing tools MUST raise
an error if both these fields are present and their values are not identical,
including capitalization and excluding leading and trailing whitespace.
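
A minimal sketch of this comparison, assuming the two field values have
already been read as strings (or ``None`` when a field is absent); the helper
name ``check_license_fields`` is hypothetical:

.. code-block:: python

    def check_license_fields(license_value, license_expression):
        """Raise ValueError if both fields are set but their values differ."""
        if license_value is None or license_expression is None:
            return
        # Leading and trailing whitespace is ignored; capitalization is not.
        if license_value.strip() != license_expression.strip():
            raise ValueError(
                "License and License-Expression are both present "
                "but are not identical"
            )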

If only the ``License`` field is present, such tools SHOULD issue a warning
informing users it is deprecated and recommending ``License-Expression``
instead.

For all newly-uploaded distributions that include a
``License-Expression`` field, the `Python Package Index (PyPI) <pypi_>`__ MUST
reject any that specify a ``License`` field whose text is not
identical to that of ``License-Expression``, as defined in this section.

Along with license classifiers, the ``License`` field may be removed from a
new version of the specification in a future PEP.


.. _639-spec-field-classifier:

Deprecate license classifiers
'''''''''''''''''''''''''''''

Using license `classifiers <classifiers_>`__ in the ``Classifier`` field
(`described in the core metadata specification <coremetadataclassifiers_>`__)
is deprecated and replaced by the more precise ``License-Expression`` field.

If the ``License-Expression`` field is present, build tools SHOULD and
publishing tools MUST raise an error if one or more license classifiers
are included in a ``Classifier`` field, and MUST NOT add
such classifiers themselves.

Otherwise, if this field contains a license classifier, build tools MAY
and publishing tools SHOULD issue a warning informing users such classifiers
are deprecated, and recommending ``License-Expression`` instead.
For compatibility with existing publishing and installation processes,
the presence of license classifiers SHOULD NOT raise an error unless
``License-Expression`` is also provided.
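
The sketch below shows one way a publishing tool might apply these rules; a
build tool could downgrade the error to a warning, since raising is only a
SHOULD for build tools. The function name and the choice of
``DeprecationWarning`` are assumptions made for illustration.

.. code-block:: python

    import warnings

    LICENSE_CLASSIFIER_PREFIX = "License ::"


    def check_license_classifiers(classifiers, license_expression=None):
        """Return the license classifiers found, warning or raising as needed."""
        found = [c for c in classifiers if c.startswith(LICENSE_CLASSIFIER_PREFIX)]
        if not found:
            return found
        if license_expression is not None:
            # Both mechanisms are present: treated as an error here.
            raise ValueError(
                "license classifiers cannot be combined with License-Expression"
            )
        # Only legacy classifiers are present: warn, but do not fail.
        warnings.warn(
            "license classifiers are deprecated; use License-Expression instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return found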

For all newly-uploaded distributions that include a
``License-Expression`` field, the `Python Package Index (PyPI) <pypi_>`__ MUST
reject any that also specify any license classifiers.

New license classifiers MUST NOT be `added to PyPI <classifiersrepo_>`__;
users needing them SHOULD use the ``License-Expression`` field instead.
Along with the ``License`` field, license classifiers may be removed from a
new version of the specification in a future PEP.


.. _639-spec-source-metadata:

Project source metadata
-----------------------

As originally introduced in :pep:`621`, the
`PyPA Declaring Project Metadata specification <pep621spec_>`__
defines how to declare a project's source
metadata under a ``[project]`` table in the ``pyproject.toml`` file for
build tools to consume and output distribution core metadata.

This PEP :ref:`adds <639-spec-key-license-expression>`
a top-level string value for the ``license`` key,
:ref:`adds <639-spec-key-license-files>` the new ``license-files`` key
and :ref:`deprecates <639-spec-key-license>`
the table value for the ``license`` key
along with its corresponding table subkeys, ``text`` and ``file``.


.. _639-spec-key-license-expression:

Add string value to ``license`` key
'''''''''''''''''''''''''''''''''''

A top-level string value is defined
for the ``license`` key in the ``[project]`` table,
which is specified to be a valid SPDX license expression,
as :ref:`defined previously <639-license-expression-definition>`.
Its value maps to the ``License-Expression`` field in the core metadata.

Build tools SHOULD validate the expression as described in the
:ref:`639-spec-field-license-expression` section,
outputting an error or warning as specified.
When generating the core metadata, tools MUST perform case normalization.

If a top-level string value for the ``license`` key is present and valid,
for purposes of backward compatibility
tools MAY back-fill the ``License`` core metadata field
with the normalized value of the ``license`` key.


.. _639-spec-key-license-files:

Add ``license-files`` key
'''''''''''''''''''''''''

A new ``license-files`` key is added to the ``[project]`` table for specifying
paths in the project source tree relative to ``pyproject.toml`` to file(s)
containing licenses and other legal notices to be distributed with the package.
It corresponds to the ``License-File`` fields in the core metadata.

Its value is a table, which if present MUST contain one of two optional,
mutually exclusive subkeys, ``paths`` and ``globs``; if both are specified,
tools MUST raise an error. Both are arrays of strings; the ``paths`` subkey
contains verbatim file paths, and the ``globs`` subkey valid glob patterns,
which MUST be parsable by the ``glob`` `module <globmodule_>`__ in the
Python standard library.

**Note**: To avoid ambiguity, confusion and (per :pep:`20`, the Zen of Python)
"more than one (obvious) way to do it", allowing a flat array of strings
as the value for the ``license-files`` key has been
:ref:`left out for now <639-license-files-allow-flat-array>`.

Path delimiters MUST be the forward slash character (``/``),
and parent directory indicators (``..``) MUST NOT be used.
Tools MUST assume that license file content is valid UTF-8 encoded text,
and SHOULD validate this and raise an error if it is not.

If the ``paths`` subkey is a non-empty array, build tools:

- MUST treat each value as a verbatim, literal file path, and
  MUST NOT treat them as glob patterns.

- MUST include each listed file in all distribution archives.

- MUST NOT match any additional license files beyond those explicitly
  statically specified by the user under the ``paths`` subkey.

- MUST list each file path under a ``License-File`` field in the core metadata.

- MUST raise an error if one or more paths do not correspond to a valid file
  in the project source that can be copied into the distribution archive.

If the ``globs`` subkey is a non-empty array, build tools:

- MUST treat each value as a glob pattern, and MUST raise an error if the
  pattern contains invalid glob syntax.

- MUST include all files matched by at least one listed pattern in all
  distribution archives.

- MAY exclude files matched by glob patterns that can be unambiguously
  determined to be backup, temporary, hidden, OS-generated or VCS-ignored.

- MUST list each matched file path under a ``License-File`` field in the
  core metadata.

- SHOULD issue a warning and MAY raise an error if no files are matched.

- MAY issue a warning if any individual user-specified pattern
  does not match at least one file.

If the ``license-files`` key is present, and the ``paths`` or ``globs`` subkey
is set to a value of an empty array, then tools MUST NOT include any
license files and MUST NOT raise an error.

.. _639-default-patterns:

If the ``license-files`` key is not present and not explicitly marked as
``dynamic``, tools MUST assume a default value of the following:

.. code-block:: toml

    license-files.globs = ["LICEN[CS]E*", "COPYING*", "NOTICE*", "AUTHORS*"]

In this case, tools MAY issue a warning if no license files are matched,
but MUST NOT raise an error.

If the ``license-files`` key is marked as ``dynamic`` (and not present),
to preserve consistent behavior with current tools and help ensure the packages
they create are legally distributable, build tools SHOULD default to
including at least the license files matching the above patterns, unless the
user has explicitly specified their own.
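
The handling of the ``paths`` and ``globs`` subkeys and of the default
patterns described in this section could be sketched roughly as follows.
The function ``resolve_license_files`` and its arguments are hypothetical;
it assumes the ``license-files`` table has already been parsed into a
dictionary (or is ``None`` when the key is absent and not dynamic), and it
omits the optional exclusion of backup or VCS-ignored files.

.. code-block:: python

    import glob
    import os
    import warnings

    DEFAULT_GLOBS = ["LICEN[CS]E*", "COPYING*", "NOTICE*", "AUTHORS*"]


    def resolve_license_files(project_root, license_files=None):
        """Return sorted license file paths relative to ``project_root``."""
        if license_files is None:
            license_files = {"globs": DEFAULT_GLOBS}
        if "paths" in license_files and "globs" in license_files:
            raise ValueError("'paths' and 'globs' are mutually exclusive")

        if "paths" in license_files:
            # Verbatim paths: never expanded as patterns; each must exist.
            for path in license_files["paths"]:
                if not os.path.isfile(os.path.join(project_root, path)):
                    raise FileNotFoundError(f"license file not found: {path}")
            return sorted(license_files["paths"])

        # A table with neither subkey matches nothing; this is a choice of
        # the sketch rather than something the text above prescribes.
        patterns = license_files.get("globs", [])
        matched = set()
        for pattern in patterns:
            # Escape the root so that only the user's pattern is interpreted.
            full_pattern = os.path.join(glob.escape(project_root), pattern)
            for hit in glob.glob(full_pattern):
                if os.path.isfile(hit):
                    relative = os.path.relpath(hit, project_root)
                    matched.add(relative.replace(os.sep, "/"))
        if patterns and not matched:
            # A warning, never an error, for default and explicit patterns.
            warnings.warn("no license files matched", stacklevel=2)
        return sorted(matched)

Each returned path would then be recorded under its own ``License-File``
field in the core metadata. An empty ``paths`` or ``globs`` array returns an
empty list without warning or error, matching the behavior required above.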


.. _639-spec-key-license:

Deprecate ``license`` key table subkeys
'''''''''''''''''''''''''''''''''''''''

Table values for the ``license`` key in the ``[project]`` table,
including the ``text`` and ``file`` table subkeys, are now deprecated.
If the new ``license-files`` key is present,
build tools MUST raise an error if the ``license`` key is defined
and has a value other than a single top-level string.

If the new ``license-files`` key is not present
and the ``text`` subkey is present in a ``license`` table,
tools SHOULD issue a warning informing users it is deprecated
and recommending a license expression as a top-level string key instead.

Likewise, if the new ``license-files`` key is not present
and the ``file`` subkey is present in the ``license`` table,
tools SHOULD issue a warning informing users it is deprecated and recommending
the ``license-files`` key instead.

If the specified license ``file`` is present in the source tree,
build tools SHOULD use it to fill the ``License-File`` field
in the core metadata, and MUST include the specified file
as if it were specified under the ``license-files.paths`` subkey.
If the file does not exist at the specified path,
tools MUST raise an informative error as previously specified.
However, tools MUST also still assume the
:ref:`specified default value <639-default-patterns>`
for the ``license-files`` key and also include,
in addition to a license file specified under the ``license.file`` subkey,
any license files that match the specified list of patterns.

Table values for the ``license`` key MAY be removed
from a new version of the specification in a future PEP.
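
The checks described in this section could be sketched roughly as follows,
operating on an already-parsed ``[project]`` table represented as a ``dict``.
The function name ``check_license_key`` and the warning messages are
illustrative assumptions, not requirements of this PEP.

.. code-block:: python

    import warnings


    def check_license_key(project):
        """Apply the deprecation rules to a parsed ``[project]`` table."""
        license_value = project.get("license")
        if license_value is None or isinstance(license_value, str):
            return  # absent, or a top-level string: nothing to check

        if "license-files" in project:
            # MUST error: non-string ``license`` alongside ``license-files``.
            raise ValueError(
                "'license' must be a single top-level string "
                "when 'license-files' is present"
            )
        if not isinstance(license_value, dict):
            return  # other invalid types are out of scope for this sketch
        if "text" in license_value:
            warnings.warn(
                "license.text is deprecated; use a license expression as "
                "the top-level string value of 'license' instead",
                DeprecationWarning,
            )
        if "file" in license_value:
            warnings.warn(
                "license.file is deprecated; use 'license-files' instead",
                DeprecationWarning,
            )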


.. _639-spec-project-formats:

License files in project formats
--------------------------------

A few minor additions will be made to the relevant existing specifications
to document, standardize and clarify behavior that is already supported,
allowed and implemented, and to explicitly state, for each format, the root
license directory in which the license files are located and to which their
paths are relative, per the :ref:`639-spec-field-license-file` section.

**Project source trees**
  As described in the :ref:`639-spec-source-metadata` section, the
  `Declaring Project Metadata specification <pep621spec_>`__
  will be updated to reflect that license file paths MUST be relative to the
  project root directory; i.e. the directory containing the ``pyproject.toml``
  (or equivalently, other legacy project configuration,
  e.g. ``setup.py``, ``setup.cfg``, etc).

**Source distributions** *(sdists)*
  The `sdist specification <sdistspec_>`__ will be updated to reflect that if
  the ``Metadata-Version`` is ``2.4`` or greater, the sdist MUST contain any
  license files specified by ``License-File`` in the ``PKG-INFO`` at their
  respective paths relative to the top-level directory of the sdist
  (containing the ``pyproject.toml`` and the ``PKG-INFO`` core metadata).

**Built distributions** *(wheels)*
  The `wheel specification <wheelspec_>`__ will be updated to reflect that if
  the ``Metadata-Version`` is ``2.4`` or greater and one or more
  ``License-File`` fields is specified, the ``.dist-info`` directory MUST
  contain a ``licenses`` subdirectory, which MUST contain the files listed
  in the ``License-File`` fields in the ``METADATA`` file at their respective
  paths relative to the ``licenses`` directory.

**Installed projects**
  The `Recording Installed Projects specification <installedspec_>`__ will be
  updated to reflect that if the ``Metadata-Version`` is ``2.4`` or greater
  and one or more ``License-File`` fields is specified, the ``.dist-info``
  directory MUST contain a ``licenses`` subdirectory which MUST contain
  the files listed in the ``License-File`` fields in the ``METADATA`` file
  at their respective paths relative to the ``licenses`` directory,
  and that any files in this directory MUST be copied from wheels
  by install tools.
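
To make the wheel and installed-project layout concrete, here is a minimal
sketch that maps a ``License-File`` value to its location under the
``licenses`` subdirectory. The helper name ``wheel_license_path`` is
hypothetical, and rejecting absolute paths and ``..`` components is a
defensive assumption of this sketch.

.. code-block:: python

    from pathlib import PurePosixPath


    def wheel_license_path(dist_info_dir, license_file):
        """Map a ``License-File`` value to its location inside a wheel."""
        relative = PurePosixPath(license_file)
        if relative.is_absolute() or ".." in relative.parts:
            raise ValueError(f"invalid License-File path: {license_file!r}")
        # The relative path is preserved rather than flattened.
        return str(PurePosixPath(dist_info_dir) / "licenses" / relative)

For instance, ``wheel_license_path("example-1.0.dist-info",
"vendor/foo/LICENSE")`` returns
``"example-1.0.dist-info/licenses/vendor/foo/LICENSE"``.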


.. _639-spec-converting-metadata:

Converting legacy metadata
--------------------------

Tools MUST NOT use the contents of the ``license.text`` ``[project]`` key
(or equivalent tool-specific format),
license classifiers or the value of the core metadata ``License`` field
to fill the top-level string value of the ``license`` key
or the core metadata ``License-Expression`` field
without informing the user and requiring unambiguous, affirmative user action
to select and confirm the desired license expression value before proceeding.


.. _639-spec-mapping-classifiers-identifiers:

Mapping license classifiers to SPDX identifiers
'''''''''''''''''''''''''''''''''''''''''''''''

Most single license classifiers (namely, all those not mentioned below)
map to a single valid SPDX license identifier,
allowing tools to infer the SPDX license identifier they correspond to,
both for use when analyzing and auditing packages,
and providing a semi-automated mechanism of filling the ``license`` key
or the ``License-Expression`` field
following the :ref:`specification above <639-spec-converting-metadata>`.

Some legacy license classifiers intend to specify a particular license,
but do not specify the particular version or variant, leading to a
`critical ambiguity <classifierissue_>`__
as to their terms, compatibility and acceptability.
Tools MUST NOT attempt to automatically infer a ``License-Expression``
when one of these classifiers is used without affirmative user action:

- ``License :: OSI Approved :: Academic Free License (AFL)``
- ``License :: OSI Approved :: Apache Software License``
- ``License :: OSI Approved :: Apple Public Source License``
- ``License :: OSI Approved :: Artistic License``
- ``License :: OSI Approved :: BSD License``
- ``License :: OSI Approved :: GNU Affero General Public License v3``
- ``License :: OSI Approved :: GNU Free Documentation License (FDL)``
- ``License :: OSI Approved :: GNU General Public License (GPL)``
- ``License :: OSI Approved :: GNU General Public License v2 (GPLv2)``
- ``License :: OSI Approved :: GNU General Public License v3 (GPLv3)``
- ``License :: OSI Approved :: GNU Lesser General Public License v2 (LGPLv2)``
- ``License :: OSI Approved :: GNU Lesser General Public License v2 or later (LGPLv2+)``
- ``License :: OSI Approved :: GNU Lesser General Public License v3 (LGPLv3)``
- ``License :: OSI Approved :: GNU Library or Lesser General Public License (LGPL)``

A comprehensive mapping of these classifiers to their possible specific
identifiers was `assembled by Dustin Ingram <badclassifiers_>`__, which tools
MAY use as a reference for the identifier selection options to offer users
when prompting the user to explicitly select the license identifier
they intended for their project.

.. note::

    Several additional classifiers, namely the "or later" variants of
    the AGPLv3, GPLv2, GPLv3 and LGPLv3, are also listed in the aforementioned
    mapping, but unambiguously map to their respective licenses,
    and so are not listed here.
    However, LGPLv2 is included above, as it could ambiguously
    refer to either the distinct v2.0 or v2.1 variants of that license.

In addition, for the various special cases, the following mappings are
considered canonical and normative for the purposes of this specification:

- Classifier ``License :: Public Domain`` MAY be mapped to the generic
  ``License-Expression: LicenseRef-Public-Domain``.
  If tools do so, they SHOULD issue an informational warning encouraging
  the use of more explicit and legally portable license identifiers,
  such as those for the `CC0 1.0 license <cc0_>`__ (``CC0-1.0``),
  the `Unlicense <unlicense_>`__ (``Unlicense``),
  or the `MIT license <mitlicense_>`__ (``MIT``),
  since the meaning associated with the term "public domain" is thoroughly
  dependent on the specific legal jurisdiction involved,
  some of which lack the concept entirely.
  Alternatively, tools MAY choose to treat these classifiers as ambiguous.

- The generic and sometimes ambiguous classifiers:

  - ``License :: Free For Educational Use``
  - ``License :: Free For Home Use``
  - ``License :: Free for non-commercial use``
  - ``License :: Freely Distributable``
  - ``License :: Free To Use But Restricted``
  - ``License :: Freeware``
  - ``License :: Other/Proprietary License``

  MAY be mapped to the generic
  ``License-Expression: LicenseRef-Proprietary``,
  but tools MUST issue a prominent, informative warning if they do so.
  Alternatively, tools MAY choose to treat these classifiers as ambiguous.

- The generic and ambiguous classifiers ``License :: OSI Approved`` and
  ``License :: DFSG approved`` do not map to any license expression,
  and thus tools SHOULD treat them as ambiguous, or if not MUST ignore them.

- The classifiers ``License :: GUST Font License 1.0`` and
  ``License :: GUST Font License 2006-09-30`` have no mapping to SPDX license
  identifiers, and no PyPI package uses them as of 2022-07-09.

When multiple license classifiers are used, their relationship is ambiguous,
and it is typically not possible to determine if all the licenses apply or if
there is a choice that is possible among the licenses.
In this case, tools MUST NOT automatically infer a license expression,
unless one license classifier is a parent of the other,
i.e. the child contains all ``::``-delineated components of the parent,
in which case tools MAY ignore the parent classifier
but SHOULD issue an informative warning when doing so.
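
The following sketch shows one way a tool might apply these rules when
suggesting a license identifier. The names ``AMBIGUOUS``, ``UNAMBIGUOUS``,
``GENERIC``, ``is_parent`` and ``suggest_expression`` are hypothetical, and
the tables are deliberately abbreviated; a real tool would need the full
lists. "Parent" is interpreted here as a strict prefix of the child's
``::``-delineated components, which is an assumption of this sketch.
Any suggestion would still require explicit user confirmation, per the
:ref:`specification above <639-spec-converting-metadata>`.

.. code-block:: python

    # A few of the classifiers this section lists as ambiguous.
    AMBIGUOUS = {
        "License :: OSI Approved :: Apache Software License",
        "License :: OSI Approved :: BSD License",
        "License :: OSI Approved :: GNU General Public License (GPL)",
    }
    # Illustrative subset of unambiguous single-classifier mappings.
    UNAMBIGUOUS = {
        "License :: OSI Approved :: MIT License": "MIT",
    }
    # Generic classifiers that do not map to any license expression.
    GENERIC = {"License :: OSI Approved", "License :: DFSG approved"}


    def is_parent(parent, child):
        """True if ``child`` starts with all components of ``parent``."""
        parent_parts = [part.strip() for part in parent.split("::")]
        child_parts = [part.strip() for part in child.split("::")]
        return (
            len(parent_parts) < len(child_parts)
            and child_parts[: len(parent_parts)] == parent_parts
        )


    def suggest_expression(classifiers):
        """Return a suggested SPDX identifier, or None if the user must choose."""
        licenses = [c for c in classifiers if c.startswith("License ::")]
        # Ignore classifiers that are a parent of another one present.
        leaves = [
            c for c in licenses
            if not any(is_parent(c, other) for other in licenses)
        ]
        if len(leaves) != 1:
            return None  # none, or several with an ambiguous relationship
        (only,) = leaves
        if only in AMBIGUOUS or only in GENERIC:
            return None
        return UNAMBIGUOUS.get(only)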


.. _639-backwards-compatibility:

Backwards Compatibility
=======================

Adding a new, dedicated ``License-Expression`` core metadata field
and a top-level string value for the ``license`` key reserved for this purpose
in the ``pyproject.toml`` ``[project]`` table
unambiguously signals support for the specification in this PEP.
This avoids the risk of new tooling
misinterpreting a license expression as a free-form license description
or vice versa, and raises an error if and only if the user affirmatively
upgrades to the latest metadata version and adds the new field/key.

The legacy ``License`` core metadata field
and the ``license`` key table subkeys (``text`` and ``file``)
in the ``pyproject.toml`` ``[project]`` table
will be deprecated along with the license classifiers,
retaining backwards compatibility while gently preparing users for their
future removal. Such a removal would follow a suitable transition period, and
be left to a future PEP and a new version of the core metadata specification.

Formally specifying the new ``License-File`` core metadata field and the
inclusion of the listed files in the distribution merely codifies and
refines the existing practices in popular packaging tools, including the Wheel
and Setuptools projects, and is designed to be largely backwards-compatible
with their existing use of that field. Likewise, the new ``license-files``
key in the ``[project]`` table of ``pyproject.toml``
standardizes statically specifying the files to include,
as well as the default behavior, and allows other tools to make use of them,
while only having an effect once users and tools expressly adopt it.

Due to requiring license files not be flattened into ``.dist-info`` and
specifying that they should be placed in a dedicated ``licenses`` subdir,
wheels produced following this change will have differently-located
licenses relative to those produced via the previous unspecified,
installer-specific behavior, but as until this PEP there was no way of
discovering these files or accessing them programmatically, and this will
be further discriminated by a new metadata version, there is no foreseen
mechanism for this to pose a practical issue.

Furthermore, this resolves existing compatibility issues with the current
ad hoc behavior, namely license files being silently clobbered if they have
the same names as others at different paths, unknowingly rendering the wheel
undistributable, and conflicting with the names of other metadata files in
the same directory. Formally specifying otherwise would in fact block full
forward compatibility with additional standard or installer-specified files
and directories added to ``.dist-info``, as they too could conflict with
the names of existing licenses.
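
The clobbering problem can be illustrated with a short sketch that reports
which ``License-File`` paths would collide if they were flattened into a
single directory. The function name ``flattening_collisions`` is hypothetical
and serves only to illustrate the issue described above.

.. code-block:: python

    from collections import defaultdict
    from pathlib import PurePosixPath


    def flattening_collisions(license_files):
        """Group License-File paths that would share a name if flattened."""
        by_name = defaultdict(list)
        for path in license_files:
            by_name[PurePosixPath(path).name].append(path)
        # Keep only the names claimed by more than one path.
        return {
            name: paths for name, paths in by_name.items() if len(paths) > 1
        }

Given ``["LICENSE", "vendor/a/LICENSE", "NOTICE"]``, this reports that both
``LICENSE`` and ``vendor/a/LICENSE`` would be written to the same flattened
name, whereas preserving their relative paths under ``licenses`` keeps both.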

While minor additions will be made to the source distribution (sdist),
built distribution (wheel) and installed project specifications, all of these
are merely documenting, clarifying and formally specifying behaviors explicitly
allowed under their current respective specifications, and already implemented
in practice, and gating them behind the explicit presence of both the new
metadata versions and the new fields. In particular, sdists may contain
arbitrary files following the project source tree layout, and formally
mentioning that these must include the license files listed in the metadata
merely documents and codifies existing Setuptools practice. Likewise, arbitrary
installer-specific files are allowed in the ``.dist-info`` directory of wheels
and copied to installed projects, and again this PEP just formally clarifies
and standardizes what is already being done.

Finally, while this PEP does propose PyPI implement validation of the new
``License-Expression`` and ``License-File`` fields, this has no effect on
existing packages, nor any effect on any new distributions uploaded unless they
explicitly choose to opt in to using these new fields while not
following the requirements in the specification. Therefore, this does not have
a backward compatibility impact, and in fact ensures forward compatibility with
any future changes by ensuring all distributions uploaded to PyPI with the new
fields are valid and conform to the specification.


.. _639-security-implications:

Security Implications
=====================

This PEP has no foreseen security implications: the ``License-Expression``
field is a plain string and the ``License-File`` fields are file paths.
Neither introduces any known new security concerns.


.. _639-how-to-teach-this:

How to Teach This
=================

The simple cases are simple: a single license identifier is a valid license
expression, and a large majority of packages use a single license.

The plan to teach users of packaging tools how to express their package's
license with a valid license expression is to have tools issue informative
messages when they detect invalid license expressions, or when the deprecated
``License`` field or license classifiers are used.

An immediate, descriptive error message if an invalid ``License-Expression``
is used will help users understand they need to use SPDX identifiers in
this field, and catch them if they make a mistake.
For authors still using the now-deprecated, less precise and more redundant
``License`` field or license classifiers, packaging tools will warn
them and inform them of the modern replacement, ``License-Expression``.
Finally, for users who may have forgotten or not be aware they need to do so,
publishing tools will gently guide them toward including ``license``
and ``license-files`` in their project source metadata.

Tools may also help with the conversion and suggest a license expression in
many, if not most common cases:

- The section :ref:`639-spec-mapping-classifiers-identifiers` provides
  tool authors with guidelines on how to suggest a license expression produced
  from legacy classifiers.

- Tools may also be able to infer and suggest how to update
  an existing ``License`` value in project source metadata
  and convert that to a license expression,
  as also :ref:`specified in this PEP <639-spec-converting-metadata>`.
  For instance, a tool may suggest converting a value of ``MIT``
  in the ``license.text`` key in ``[project]``
  (or the equivalent in tool-specific formats)
  to a top-level string value of the ``license`` key (or equivalent).
  Likewise, a tool could suggest converting from a ``License`` of ``Apache2``
  (which is not a valid license expression
  as :ref:`defined in this PEP <639-spec-field-license-expression>`)
  to a ``License-Expression`` of ``Apache-2.0``
  (the equivalent valid license expression using an SPDX license identifier).


.. _639-reference-implementation:

Reference Implementation
========================

Tools will need to support parsing and validating license expressions in the
``License-Expression`` field.

The `license-expression library <licenseexplib_>`__ is a reference Python
implementation that handles license expressions including parsing,
formatting and validation, using flexible lists of license symbols
(including SPDX license IDs and any extra identifiers included here).
It is licensed under Apache-2.0 and is already used in several projects,
including the `SPDX Python Tools <spdxpy_>`__,
the `ScanCode toolkit <scancodetk_>`__
and the Free Software Foundation Europe (FSFE) `REUSE project <reuse_>`__.


.. _639-rejected-ideas:

Rejected Ideas
==============

Core metadata fields
--------------------

Potential alternatives to the structure, content and deprecation of the
core metadata fields specified in this PEP.


Re-use the ``License`` field
''''''''''''''''''''''''''''

Following `initial discussion <reusediscussion_>`__, earlier versions of this
PEP proposed re-using the existing ``License`` field, which tools would
attempt to parse as a SPDX license expression with a fallback to free text.
Initially, this would merely cause a warning (or even pass silently),
but would eventually be treated as an error by modern tooling.

This offered the potential benefit of greater backwards-compatibility,
easing the community into using SPDX license expressions while taking advantage
of packages that already have them (either intentionally or coincidentally),
and avoided adding yet another license-related field.

However, following substantial discussion, consensus was reached that a
dedicated ``License-Expression`` field was the preferred overall approach.
The presence of this field is an unambiguous signal that a package
intends it to be interpreted as a valid SPDX identifier, without the need
for complex and potentially erroneous heuristics, and allows tools to
easily and unambiguously detect invalid content.

This avoids both false positives (``License`` values that a package author
didn't explicitly intend as an explicit SPDX identifier, but that happen
to validate as one), and false negatives (expressions the author intended
to be valid SPDX, but due to a typo or mistake are not), which are otherwise
not clearly distinguishable from true positives and negatives, an ambiguity
at odds with the goals of this PEP.

Furthermore, it allows both the existing ``License`` field and
the license classifiers to be more easily deprecated,
with tools able to cleanly distinguish between packages intending to
affirmatively conform to the updated specification in this PEP or not,
and adapt their behavior (warnings, errors, etc) accordingly.
Otherwise, tools would either have to allow duplicative and potentially
conflicting ``License`` fields and classifiers, or warn/error on the
substantial number of existing packages that have SPDX identifiers as the
value for the ``License`` field, intentionally or otherwise (e.g. ``MIT``).

Finally, it avoids changing the behavior of an existing metadata field,
and avoids tools having to guess the ``Metadata-Version`` and field behavior
based on its value rather than merely its presence.

While this would mean the subset of existing distributions containing
``License`` fields valid as SPDX license expressions wouldn't automatically be
recognized as such, this only requires appending a few characters to the key
name in the project's source metadata, and this PEP provides extensive
guidance on how this can be done automatically by tooling.

Given all this, it was decided to proceed with defining a new,
purpose-created field, ``License-Expression``.


Re-use the ``License`` field with a value prefix
''''''''''''''''''''''''''''''''''''''''''''''''

As an alternative to the previous, prefixing SPDX license expressions with,
e.g. ``spdx:`` was suggested to reduce the ambiguity inherent in re-using
the ``License`` field. However, this effectively amounted to creating
a field within a field, and doesn't address all the downsides of
keeping the ``License`` field. Namely, it still changes the behavior of an
existing metadata field, requires tools to parse its value
to determine how to handle its content, and makes the specification and
deprecation process more complex and less clean.

Yet, it still shares the same main potential downside as just creating a new
field: projects currently using valid SPDX identifiers in the ``License``
field, intentionally or not, won't be automatically recognized, and fixing
this requires about the same amount of effort, namely changing a line in the
project's source metadata. Therefore, it was rejected in favor of a new field.


Don't make ``License-Expression`` mutually exclusive
''''''''''''''''''''''''''''''''''''''''''''''''''''

For backwards compatibility, the ``License`` field and/or the license
classifiers could still be allowed together with the new
``License-Expression`` field, presumably with a warning. However, this
could easily lead to inconsistent, and at the very least duplicative
license metadata in no less than *three* different fields, which is
squarely contrary to the goals of this PEP of making the licensing story
simpler and unambiguous. Therefore, and in concert with clear community
consensus otherwise, this idea was soundly rejected.


Don't deprecate existing ``License`` field and classifiers
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

Several community members were initially concerned that deprecating the
existing ``License`` field and classifiers would result in
excessive churn for existing package authors and raise the barrier to
entry for new ones, particularly everyday Python developers seeking to
package and publish their personal projects without necessarily caring
too much about the legal technicalities or being a "license lawyer".
Indeed, every deprecation comes with some non-zero short-term cost,
and should be carefully considered relative to the overall long-term
net benefit. And at the minimum, this change shouldn't make it more
difficult for the average Python developer to share their work under
a license of their choice, and ideally improve the situation.

Following many rounds of proposals, discussion and refinement,
the general consensus was clearly in favor of deprecating the legacy
means of specifying a license, in favor of "one obvious way to do it",
to improve the currently complex and fragmented story around license
documentation. Not doing so would leave three different un-deprecated ways of
specifying a license for a package, two of them ambiguous, less than
clear/obvious how to use, inconsistently documented and out of date.
This is more complex for all tools in the ecosystem to support
indefinitely (rather than simply installers supporting older packages
implementing previous frozen metadata versions), resulting in a non-trivial
and unbounded maintenance cost.

Furthermore, it leads to a more complex and confusing landscape for users with
three similar but distinct options to choose from, particularly with older
documentation, answers and articles floating around suggesting different ones.
Of the three, ``License-Expression`` is the simplest and clearest to use
correctly; users just paste in their desired license identifier, or select it
via a tool, and they're done; no need to learn about Trove classifiers and
dig through the list to figure out which one(s) apply (and be confused
by many ambiguous options), or figure out on their own what should go
in the ``License`` field (anything from nothing, to the license text,
to a free-form description, to the same SPDX identifier they would be
entering in the ``License-Expression`` field anyway, assuming they can
easily find documentation at all about it). In fact, this can be
made even easier thanks to the new field. For example, GitHub's popular
`ChooseALicense.com <choosealicense_>`__ links to how to add SPDX license
identifiers to the project source metadata of various languages that support
them right in the sidebar of every license page; the SPDX support in this
PEP enables adding Python to that list.

For current package maintainers who have specified a ``License`` or license
classifiers, this PEP only recommends warnings and prohibits errors for
all but publishing tools, which are allowed to error if their intended
distribution platform(s) so requires. Once maintainers are ready to
upgrade, for those already using SPDX license expressions (accidentally or not)
this only requires appending a few characters to the key name in the
project's source metadata, and for those with license classifiers that
map to a single unambiguous license, or another defined case (public domain,
proprietary), they merely need to drop the classifier and paste in the
corresponding license identifier. This PEP provides extensive guidance and
examples, as will other resources, as well as explicit instructions for
automated tooling to take care of this with no human changes needed.
More complex cases where license metadata is currently specified may
need a bit of human intervention, but in most cases tools will be able
to provide a list of options following the mappings in this PEP, and
these are typically the projects most likely to be constrained by the
limitations of the existing license metadata, and thus most benefited
by the new fields in this PEP.

Finally, for unmaintained packages, those using tools supporting older
metadata versions, or those who choose not to provide license metadata,
no changes are required regardless of the deprecation.


Don't mandate validating new fields on PyPI
'''''''''''''''''''''''''''''''''''''''''''

Previously, while this PEP did include normative guidelines for packaging
publishing tools (such as Twine), it did not provide specific guidance
for PyPI (or other package indices) as to whether and how they
should validate the ``License-Expression`` or ``License-File`` fields,
nor how they should handle using them in combination with the deprecated
``License`` field or license classifiers. This simplifies the specification
and either defers implementation on PyPI to a later PEP, or gives
discretion to PyPI to enforce the stated invariants, to minimize
disruption to package authors.

However, this had been left unstated from before the ``License-Expression``
field was separate from the existing ``License``, which would make
validation much more challenging and backwards-incompatible, breaking
existing packages. With that change, there was a clear consensus that
the new field should be validated from the start, guaranteeing that all
distributions uploaded to PyPI that declare core metadata version 2.4
or higher and have the ``License-Expression`` field will have a valid
expression, such that PyPI and consumers of its packages and metadata
can rely upon it to follow the specification here.

The same can be extended to the new ``License-File`` field as well,
to ensure that it is valid and the legally required license files are
present, and thus it is lawful for PyPI, users and downstream consumers
to distribute the package. (Of course, this makes no *guarantee* of such
as it is ultimately reliant on authors to declare them, but it improves
assurance of this and allows doing so in the future if the community so
decides.) To be clear, this would not require that any uploaded distribution
have such metadata, only that if they choose to declare it per the new
specification in this PEP, it is assured to be valid.


Source metadata ``license`` key
-------------------------------

Alternate possibilities related to the ``license`` key in the
``pyproject.toml`` project source metadata.


Add ``expression`` and ``files`` subkeys to table
'''''''''''''''''''''''''''''''''''''''''''''''''

A previous working draft of this PEP added ``expression`` and ``files`` subkeys
to the existing ``license`` table in the project source metadata, to parallel
the existing ``file`` and ``text`` subkeys. While this seemed perhaps the
most obvious approach at first glance, it had several serious drawbacks
relative to that ultimately taken here.

Most saliently, this means two very different types of metadata are being
specified under the same top-level key that require very different handling,
and furthermore, unlike the previous arrangement, the subkeys were not mutually
exclusive and could both be specified at once, with some subkeys potentially
being dynamic and others static, and mapping to different core metadata fields.

Furthermore, this leads to a conflict with marking the key as ``dynamic``
(assuming that is intended to specify the ``[project]`` table keys,
as :pep:`621` seems to imprecisely imply,
rather than core metadata fields), as either or both would have
to be treated as ``dynamic``.
Grouping both license expressions and license files under the same key
forces an "all or nothing" approach, and creates ambiguity as to user intent.

There are further downsides to this as well. Both users and tools would need to
keep track of which fields are mutually exclusive with which of the others,
greatly increasing cognitive and code complexity, and in turn the probability
of errors. Conceptually, juxtaposing so many different fields under the
same key is rather jarring, and leads to a much more complex mapping between
``[project]`` keys and core metadata fields, not in keeping with :pep:`621`.
This causes the ``[project]`` table naming and structure to diverge further
from both the core metadata and native formats of the various popular packaging
tools that use it. Finally, this results in the spec being significantly more
complex and convoluted to understand and implement than the alternatives.

The approach this PEP now takes, using the reserved top-level string value
of the ``license`` key, adding a new ``license-files`` key
and deprecating the ``license`` table subkeys (``text`` and ``file``),
avoids most of the issues identified above,
and results in a much clearer and cleaner design overall.
It allows ``license`` and ``license-files`` to be tagged
``dynamic`` independently, separates two independent types of metadata
(syntactically and semantically), restores a closer to 1:1 mapping of
``[project]`` table keys to core metadata fields,
and reduces nesting by a level for both.
Other than adding one extra key to the file, there was no significant
apparent downside to this latter approach, so it was adopted for this PEP.


Add an ``expression`` subkey instead of a string value
''''''''''''''''''''''''''''''''''''''''''''''''''''''

Adding just an ``expression`` subkey to the ``license`` table,
instead of using the reserved top-level string value,
would be more explicit for readers and writers,
in line with this PEP's goals.
However, it still has the downsides listed above
that are not specific to the inclusion of the ``files`` key.

Relative to a flat string value,
it adds verbosity, complexity and an extra level of nesting,
and requires users and tools to remember and handle
the mutual exclusivity of the subkeys
and remember which are deprecated and which are not,
instead of cleanly deprecating the table subkeys as a whole.
Furthermore, it is less clearly the "default" choice for modern use,
given users tend to gravitate toward the simplest and most obvious option.
Finally, it seems reasonable to follow the suggested guidance in :pep:`621`,
given the top-level string value was specifically reserved for this purpose.


Define a new top-level ``license-expression`` key
'''''''''''''''''''''''''''''''''''''''''''''''''

An earlier version of this PEP defined a new, top-level ``license-expression``
key under the ``[project]`` table,
rather than using the reserved string value of the ``license`` key.
This was seen as clearer and more explicit for readers and writers,
in line with the goals of this PEP.

Additionally, while differences from existing tool formats (and core metadata
field names) have precedent in :pep:`621`,
using a key with an identical name as in most/all current tools
to mean something different (and map to a different core metadata field),
with distinct and incompatible syntax and semantics, does not,
and could cause confusion and ambiguity for readers and authors.

Also, per the `project source metadata spec <pep621specdynamic_>`__,
a separate key would allow marking the ``[project]`` keys
corresponding to the ``License`` and ``License-Expression`` metadata fields
as ``dynamic`` independently of one another.
This would avoid a potential concern with back-filling the ``License`` field
from the ``License-Expression`` field, as this PEP currently allows,
without marking ``license`` as dynamic
(separate marking is not possible when both fields map
to the same top-level key).

However, community consensus favored using
the top-level string value of the existing ``license`` key,
as :pep:`reserved for this purpose by PEP 621 <621#license>`:

    A practical string value for the license key has been purposefully left
    out to allow for a future PEP to specify support for SPDX expressions
    (the same logic applies to any sort of "type" field specifying what
    license the file or text represents).

This is shorter and simpler for users to remember and type,
avoids adding a new top-level key while taking advantage of an existing one,
guides users toward using a license expression as the default,
and follows what was envisioned in the original :pep:`621`.

Additionally, this allows cleanly deprecating the table values
without deprecating the key itself,
and makes them inherently mutually exclusive without users having to remember
and tools having to enforce it.
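
The mutual exclusivity falls out of TOML itself, since a key can hold only
one value. A minimal sketch, assuming a placeholder ``MIT`` expression and a
hypothetical ``LICENSE.txt`` file:

.. code-block:: toml

    [project]
    # New form: a license expression as the string value.
    license = "MIT"

    # Deprecated table form; it cannot be given at the same time,
    # because the license key is already defined above.
    # license = {file = "LICENSE.txt"}

    # Rejected alternative from earlier drafts (not valid under this PEP):
    # license-expression = "MIT"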

Finally, consistency with other tool formats and the underlying core metadata
was not considered a sufficient priority
to override the advantages of using the existing key,
and the ``dynamic`` concerns were mostly mitigated by
not specifying legacy license to license expression conversion at build time,
explicitly specifying backfilling the ``License`` field when not ``dynamic``,
and the fact that both fields are mutually exclusive,
so there is little practical need to distinguish which is dynamic.

Therefore, a top-level string value for ``license`` was adopted for this PEP,
as an earlier working draft had temporarily specified.


Add a ``type`` key to treat ``text`` as expression
''''''''''''''''''''''''''''''''''''''''''''''''''

Instead of using the reserved top-level string value
of the ``license`` key in the ``[project]`` table,
one could add a ``type`` subkey to the ``license`` table
to control whether ``text`` (or a string value)
is interpreted as free-text or a license expression. This could make
backward compatibility a little more seamless, as older tools could ignore
it and always treat ``text`` as ``license``, while newer tools would
know to treat it as a license expression, if ``type`` was set appropriately.
Indeed, :pep:`621` seems to suggest something of this sort as a possible
alternative way that SPDX license expressions could be implemented.

However, all the same downsides as in the previous item apply here,
including greater complexity, a more complex mapping between the project
source metadata and core metadata and inconsistency between the presentation
in tool config, project source metadata and core metadata,
a much less clean deprecation, further bikeshedding over what to name it,
and inability to mark one but not the other as dynamic, among others.

In addition, while this might be a little easier in the short
term, in the long term it would mean users would always have to remember
to specify the correct ``type`` to ensure their license expression is
interpreted correctly, which adds work and potential for error; we could
never safely change the default while being confident that users
understand that what they are entering is unambiguously a license expression,
with all the false positive and false negative issues as above.

Therefore, for these as well as the same reasons this approach was rejected
for the core metadata in favor of a distinct ``License-Expression`` field,
we similarly reject this here in favor of
the reserved string value of the ``license`` key.
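
For illustration only, the rejected form might have looked something like
the following. The ``type`` subkey and its ``"expression"`` value are
hypothetical names, as no concrete syntax was ever specified.

.. code-block:: toml

    [project]
    # Rejected (hypothetical syntax): an older tool that ignores "type"
    # would treat "MIT" as free text rather than as a license expression.
    # license = {text = "MIT", type = "expression"}

    # Adopted: a string value is always a license expression.
    license = "MIT"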


Must be marked dynamic to back-fill
'''''''''''''''''''''''''''''''''''

The ``license`` key in the ``pyproject.toml`` could be required to be
explicitly set to dynamic in order for the ``License`` core metadata field
to be automatically back-filled from
the top-level string value of the ``license`` key.
This would be more explicit that the filling will be done,
as strictly speaking the ``license`` key is not (and cannot be) specified in
``pyproject.toml``, and satisfies a stricter interpretation of the letter
of the previous :pep:`621` specification that this PEP revises.

However, this doesn't seem to be necessary, because it is simply using the
static, verbatim literal value of the ``license`` key, as specified
strictly in this PEP. Therefore, any conforming tool can trivially,
deterministically and unambiguously derive this using only the static data
in the ``pyproject.toml`` file itself.
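
A minimal sketch of this, assuming a hypothetical project named ``example``
and a build tool that chooses to back-fill:

.. code-block:: toml

    [project]
    name = "example"
    version = "1.0"
    license = "MIT"
    # No dynamic entry is needed. From the static value above, a tool
    # that back-fills could emit the following core metadata:
    #   License-Expression: MIT
    #   License: MIT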

Furthermore, this actually adds significant ambiguity, as it means the value
could get filled arbitrarily by other tools, which would in turn compromise
and conflict with the value of the new ``License-Expression`` field, which is
why this is explicitly prohibited by this PEP. Therefore, not marking it as
``dynamic`` will ensure it is only handled in accordance with this PEP's
requirements.

Finally, explicitly telling users to mark it as ``dynamic``, or not, to
control filling behavior seems to be a misuse of the ``dynamic``
field as apparently intended, and prevents tools from adapting to best
practices (fill, don't fill, etc.) as they develop and evolve over time.


Source metadata ``license-files`` key
-------------------------------------

Alternatives considered for the ``license-files`` key in the
``pyproject.toml`` ``[project]`` table, primarily related to the
path/glob type handling.


Add a ``type`` subkey to ``license-files``
''''''''''''''''''''''''''''''''''''''''''

Instead of defining mutually exclusive ``paths`` and ``globs`` subkeys
of the ``license-files`` ``[project]`` table key, we could
achieve the same effect with a ``files`` subkey for the list and
a ``type`` subkey for how to interpret it. However, the latter offers no
real advantage over the former. It requires more keystrokes and adds
verbosity and complexity, is less flexible should both subkeys (or another
additional subkey) be allowed in the future, and invites bikeshedding
over the subkey name. Therefore, it was summarily rejected.
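
As a sketch of the comparison, with a placeholder glob pattern and a
hypothetical ``"globs"`` value for the rejected ``type`` subkey:

.. code-block:: toml

    [project]
    # Adopted: the subkey name itself says how the strings are interpreted.
    license-files.globs = ["LICEN[CS]E*"]

    # Rejected (hypothetical syntax): two subkeys to convey the same thing.
    # license-files = {files = ["LICEN[CS]E*"], type = "globs"}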


Only accept verbatim paths
''''''''''''''''''''''''''

Globs could be disallowed completely as values to the ``license-files``
key in ``pyproject.toml`` and only verbatim literal paths allowed.
This would ensure that all license files are explicitly specified, all
specified license files are found and included, and the source metadata
is completely static in the strictest sense of the term, without tools
having to inspect the rest of the project source files to determine exactly
what license files will be included and what the ``License-File`` values
will be. This would also modestly simplify the spec and tool implementation.

However, practicality once again beats purity here. Globs are supported and
used by many existing tools for finding license files, and explicitly
specifying the full path to every license file would be unnecessarily tedious
for more complex projects with vendored code and dependencies. More
critically, it would make it much easier to accidentally miss a required
legal file, silently rendering the package illegal to distribute.

Tools can still statically and consistently determine the files to be included,
based only on those glob patterns the user explicitly specified and the
filenames in the package, without installing it, executing its code or even
inspecting file contents. Furthermore, tools are still explicitly allowed to warn
if specified glob patterns (including full paths) don't match any files.
And, of course, sdists, wheels and others will have the full static list
of files specified in their distribution metadata.

Perhaps most importantly, this would also preclude the currently specified
default value, as widely used by the current most popular tools, and thus
be a major break to backward compatibility, tool consistency, and safe
and sane default functionality to avoid unintentional license violations.
And of course, authors are welcome and encouraged to specify their license
files explicitly via the ``paths`` table subkey, once they are aware of it and
if it is suitable for their project and workflow.


Only accept glob patterns
'''''''''''''''''''''''''

Conversely, all ``license-files`` strings could be treated as glob patterns.
This would slightly simplify the spec and implementation, avoid an extra level
of nesting, and more closely match the configuration format of existing tools.

However, for the cost of a few characters, having separate ``paths`` and
``globs`` subkeys ensures users are aware of
whether they are entering globs or verbatim paths. Furthermore, allowing
license files to be specified as literal paths avoids edge cases, such as those
containing glob characters (or those confusingly or even maliciously similar
to them, as described in :pep:`672`).

Including an explicit ``paths`` value ensures that the resulting
``License-File`` metadata is correct, complete and purely static in the
strictest sense of the term, with all license paths explicitly specified
in the ``pyproject.toml`` file, guaranteed to be included and with an early
error should any be missing. This is not practical to do, at least without
serious limitations for many workflows, if we must assume the items
are glob patterns rather than literal paths.

Explicit paths also allow tools to locate the license files and know the
exact values of the ``License-File`` core metadata fields without having to
traverse the source tree of the project and match globs, potentially allowing
easier, more efficient and reliable programmatic inspection and processing.

Therefore, given the relatively small cost and the significant benefits
of keeping ``paths``, the glob-only approach was not adopted.


Infer whether paths or globs
''''''''''''''''''''''''''''

It was considered whether to simply allow specifying an array of strings
directly for the ``license-files`` key, rather than making it a table with
explicit ``paths`` and ``globs``. This would be somewhat simpler and avoid
an extra level of nesting, and more closely match the configuration format
of existing tools. However, it was ultimately rejected in favor of separate,
mutually exclusive ``paths`` and ``globs`` table subkeys.

In practice, a flat array only saves six characters in the ``pyproject.toml``
(``license-files = [...]`` vs ``license-files.globs = [...]``), while explicit
subkeys allow the user to more clearly declare their intent, ensure they
understand how the values are going to be interpreted, and serve as an
unambiguous indicator for tools to parse them as either globs or verbatim
path literals.

This, in turn, allows for more appropriate, clearly specified tool
behaviors for each case, many of which would be unreliable or impossible
without it, to avoid common traps, provide more helpful feedback and
behave more sensibly and intuitively overall. These include, with ``paths``,
guaranteeing that each and every specified file is included and immediately
raising an error if one is missing, and with ``globs``, checking glob syntax,
excluding unwanted backup, temporary, or other such files (as current tools
already do), and optionally warning if a glob doesn't match any files.
This also avoids edge cases (e.g. paths that contain glob characters) and
reliance on heuristics to determine interpretation—the very thing this PEP
seeks to avoid.
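
The two cases described above might be written as follows. This is only a
sketch, and the file names and patterns are hypothetical.

.. code-block:: toml

    [project]
    # With paths, each entry is a literal path; a tool is expected to
    # raise an error if any listed file is missing.
    license-files.paths = ["LICENSE.txt", "vendor/foo/LICENSE.txt"]

    # With globs, each entry is a pattern; a tool may warn if a pattern
    # matches no files. The subkeys are mutually exclusive, so this
    # alternative is shown commented out.
    # license-files.globs = ["LICEN[CS]E*", "vendor/*/LICEN[CS]E*"]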


.. _639-license-files-allow-flat-array:

Also allow a flat array value
'''''''''''''''''''''''''''''

Initially, after deciding to define ``license-files`` as a table of ``paths``
and ``globs``, thought was given to making a top-level string array under the
``license-files`` key mean one or the other (probably ``globs``, to match most
current tools). This is slightly shorter and simpler, would allow gently
nudging users toward a preferred one, and allow a slightly cleaner handling of
the empty case (which, at present, is treated identically for either).

However, this again only saves six characters in the best case, and there
isn't an obvious choice, either from a perspective of preference (both have
clear use cases and benefits) or as to which one users would naturally
assume.

Flat may be better than nested, but in the face of ambiguity, users
may not resist the temptation to guess. Requiring users to explicitly specify
one or the other ensures they are aware of how their inputs will be handled,
and is more readable for both humans and machines. Allowing a flat array
would also make the spec and tool implementation slightly more complicated,
and it can always be added in the future, but not removed without breaking
backward compatibility. And finally, for the "preferred" option, it would
mean there is more than one obvious way to do it.

Therefore, per :pep:`20`, the Zen of Python, this approach is hereby rejected.


Allow both ``paths`` and ``globs`` subkeys
''''''''''''''''''''''''''''''''''''''''''

Allowing both ``paths`` and ``globs`` subkeys to be specified under the
``license-files`` table was considered, as it could potentially allow
more flexible handling for particularly complex projects, and specify on a
per-pattern rather than overall basis whether ``license-files`` entries
should be treated as ``paths`` or ``globs``.

However, the existing proposed approach already matches or exceeds the
power and capabilities of those offered in tools' config files, so there
isn't clear demand for this, and few cases would likely benefit. It would add
a large amount of complexity for relatively minimal gain, in terms of the
specification, in tool implementations and in ``pyproject.toml`` itself.

There would be many more edge cases to deal with, such as how to handle files
matched by both lists, and it conflicts in multiple places with the current
specification for how tools should behave with one or the other, such as when
no files match, guarantees of all files being included and of the file paths
being explicitly, statically specified, and others.

As with the previous idea, if there is a clear need for it, it can always be
allowed in the future in a backward-compatible manner (to the extent it is
possible in the first place), while the same is not true of disallowing it.
Therefore, it was decided to require the two subkeys to be mutually exclusive.
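
For clarity, the combination that this PEP does not permit would look like
the following, with hypothetical file names:

.. code-block:: toml

    # Rejected: not valid under this PEP, since the two subkeys
    # are mutually exclusive.
    [project.license-files]
    paths = ["LICENSE.txt"]
    globs = ["vendor/*/LICEN[CS]E*"]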


Rename ``paths`` subkey to ``files``
''''''''''''''''''''''''''''''''''''

Initially, it was considered whether to name the ``paths`` subkey of the
``license-files`` table ``files`` instead. However, ``paths`` was ultimately
chosen, as calling the table subkey ``files`` resulted in duplication between
the table name (``license-files``) and the subkey name (``files``), i.e.
``license-files.files = ["LICENSE.txt"]``, made it seem like the preferred or
default subkey when it was not, and lacked the same parallelism with ``globs``
in describing the format of the string entry rather than what was being
pointed to.


Must be marked dynamic to use defaults
''''''''''''''''''''''''''''''''''''''

It may seem outwardly sensible, at least with a particularly restrictive
interpretation of :pep:`621`'s description of the ``dynamic`` list, to
consider requiring the ``license-files`` key to be explicitly marked as
``dynamic`` in order for the default glob patterns to be used, or alternatively
for license files to be matched and included at all.

However, this is merely declaring a static, strictly-specified default value
for this particular key, required to be used exactly by all conforming tools
(so long as it is not marked ``dynamic``, negating this argument entirely),
and is no less static than any other set of glob patterns the user themself
may specify. Furthermore, the resulting ``License-File`` core metadata values
can still be determined with only a list of files in the source, without
installing or executing any of the code, or even inspecting file contents.

Moreover, even if this were not so, practicality would trump purity, as this
interpretation would be strictly backwards-incompatible with the existing
format, and be inconsistent with the behavior of existing tools.
Further, this would create a very serious and likely risk of a large number of
projects unknowingly no longer including legally mandatory license files,
making their distribution technically illegal, and is thus not a sane,
much less sensible default.

Finally, aside from adding an additional line of default-required boilerplate
to the file, not defining the default as dynamic allows authors to clearly
and unambiguously indicate when their build/packaging tools are going to be
handling the inclusion of license files themselves rather than strictly
conforming to the project source metadata portions of this PEP;
to do otherwise would defeat the primary purpose of the ``dynamic`` list
as a marker and escape hatch.


License file paths
------------------

Alternatives related to the paths and locations of license files in the source
and built distributions.


Flatten license files in subdirectories
'''''''''''''''''''''''''''''''''''''''

Previous drafts of this PEP were silent on the issue of handling license files
in subdirectories. Currently, the `Wheel <wheelfiles_>`__ and (following its
example) `Setuptools <setuptoolsfiles_>`__ projects flatten all license files
into the ``.dist-info`` directory without preserving the source subdirectory
hierarchy.

While this is the simplest approach and matches existing ad hoc practice,
this can result in name conflicts and license files clobbering others,
with no obvious defined behavior for how to resolve them, and leaving the
package legally un-distributable without any clear indication to users that
their specified license files have not been included.

Furthermore, this leads to inconsistent relative file paths for non-root
license files between the source, sdist and wheel, and prevents the paths
given in the "static" ``[project]`` table metadata from being truly static,
as they need to be flattened, and may potentially overwrite one another.
Finally, the source directory structure often implies valuable information
about what the licenses apply to, and where to find them in the source,
which is lost when flattening them and far from trivial to reconstruct.

To resolve this, the PEP now proposes, as did contributors on both of the
above issues, reproducing the source directory structure of the original
license files inside the ``.dist-info`` directory. This would fully resolve the
concerns above, with the only downside being a more nested ``.dist-info``
directory. There is still a risk of collision with edge-case custom
filenames (e.g. ``RECORD``, ``METADATA``), but that is also the case
with the previous approach, and in fact with fewer files flattened
into the root, this would actually reduce the risk. Furthermore,
the following proposal rooting the license files under a ``licenses``
subdirectory eliminates both collisions and the clutter problem entirely.
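
To make the name conflict concrete, consider a hypothetical project
``example`` version ``1.0`` with a vendored dependency ``foo``. The resulting
locations are sketched in the comments:

.. code-block:: toml

    [project]
    license-files.paths = ["LICENSE", "vendor/foo/LICENSE"]

    # Flattened, both files would map to the same location,
    # so one would clobber the other:
    #   example-1.0.dist-info/LICENSE
    #
    # Preserving the source structure under a licenses subdirectory,
    # both are kept at distinct paths:
    #   example-1.0.dist-info/licenses/LICENSE
    #   example-1.0.dist-info/licenses/vendor/foo/LICENSE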


Resolve name conflicts differently
''''''''''''''''''''''''''''''''''

Rather than preserving the source directory structure for license files
inside the ``.dist-info`` directory, we could specify some other mechanism
for conflict resolution, such as pre- or appending the parent directory name
to the license filename, traversing up the tree until the name was unique,
to avoid excessively nested directories.

However, this would not address the path consistency issues, would require
much more discussion, coordination and bikeshedding, and further complicate
the specification and the implementations. Therefore, it was rejected in
favor of the simpler and more obvious solution of just preserving the
source subdirectory layout, as many stakeholders have already advocated for.


Dump directly in ``.dist-info``
'''''''''''''''''''''''''''''''

Previously, the included license files were stored directly in the top-level
``.dist-info`` directory of built wheels and installed projects. This followed
existing ad hoc practice, ensured most existing wheels currently using this
feature will match new ones, and kept the specification simpler, with the
license files always being stored in the same location relative to the core
metadata regardless of distribution type.

However, this leads to a more cluttered ``.dist-info`` directory, littered
with arbitrary license files and subdirectories, as opposed to separating
licenses into their own namespace (which per the Zen of Python, :pep:`20`, are
"one honking great idea"). While currently small, there is still a
risk of collision with specific custom license filenames
(e.g. ``RECORD``, ``METADATA``) in the ``.dist-info`` directory, which
would only increase if and when additional files were specified here, and
would require carefully limiting the potential filenames used to avoid
likely conflicts with those of license-related files. Finally,
putting licenses into their own specified subdirectory would allow
humans and tools to quickly, easily and correctly list, copy and manipulate
all of them at once (such as in distro packaging, legal checks, etc)
without having to reference each of their paths from the core metadata.

Therefore, now is a prudent time to specify an alternate approach.
The simplest and most obvious solution, as suggested by several on the Wheel
and Setuptools implementation issues, is to simply root the license files
relative to a ``licenses`` subdirectory of ``.dist-info``. This is simple
to implement and solves all the problems noted here, without clear significant
drawbacks relative to other more complex options.

It does make the specification a bit more complex and less elegant, but
implementation should remain equally simple. It does mean that wheels
produced following this change will have differently-located licenses
than those prior, but as this was already true for those in subdirectories,
and until this PEP there was no way of discovering these files or
accessing them programmatically, this doesn't seem likely to pose
significant problems in practice. This will be much harder, if not
impossible, to change later, once the status quo is standardized, tools are
relying on the current behavior and there is much greater uptake of not
only including license files but potentially also accessing them
using the core metadata. Therefore, if we're going to change it, now would be
the time (particularly since we're already introducing an edge-case change
with how license files in subdirs are handled, along with other refinements).

Therefore, the latter has been incorporated into current drafts of this PEP.


Add new ``licenses`` category to wheel
''''''''''''''''''''''''''''''''''''''

Instead of defining a root license directory (``licenses``) inside
the core metadata directory (``.dist-info``) for wheels, we could instead
define a new category (and, presumably, a corresponding install scheme),
similar to the others currently included under ``.data`` in the wheel archive,
specifically for license files, called (e.g.) ``licenses``. This was mentioned
by the wheel creator, and would allow installing licenses somewhere more
platform-appropriate and flexible than just the ``.dist-info`` directory
in the site path, and potentially be conceptually cleaner than including
them there.

However, at present, this PEP does not implement this idea, and it is
deferred to a future one. It would add significant complexity and friction
to this PEP, which is primarily concerned with standardizing existing practice
and updating the core metadata specification. Furthermore, doing so would
likely require modifying ``sysconfig`` and the install schemes specified
therein, alongside Wheel, Installer and other tools, which would be a
non-trivial undertaking. While potentially slightly more complex for
repackagers (such as those for Linux distributions), the current proposal still
ensures all license files are included, and in a single dedicated directory
(which can easily be copied or relocated downstream), and thus should still
greatly improve the status quo in this regard without the attendant complexity.

In addition, this approach is not fully backwards compatible (since it
isn't transparent to tools that simply extract the wheel), is a greater
departure from existing practice and would lead to more inconsistent
license install locations from wheels of different versions. Finally,
this would mean licenses would not be installed as proximately to their
associated code, there would be more variability in the license root path
across platforms and between built distributions and installed projects,
accessing installed licenses programmatically would be more difficult, and a
suitable install location and method would need to be created, discussed
and decided that would avoid name clashes.

Therefore, to keep this PEP in scope, the current approach was retained.


Name the subdirectory ``license_files``
'''''''''''''''''''''''''''''''''''''''

Both ``licenses`` and ``license_files`` have been suggested as potential
names for the root license directory inside ``.dist-info`` of wheels and
installed projects. An initial draft of the PEP specified the latter
due to being slightly clearer and consistent with the
name of the core metadata field (``License-File``)
and the ``[project]`` table key (``license-files``).
However, the current version of the PEP adopts the ``licenses`` name,
due to a general preference by the community for its shorter length,
greater simplicity and the lack of a separator character (``_``, ``-``, etc.).


Other ideas
-----------

Miscellaneous proposals, possibilities and discussion points that were
ultimately not adopted.


Map identifiers to license files
''''''''''''''''''''''''''''''''

Mapping license identifiers to their corresponding license files
would require using a mapping (as two parallel lists would be too prone to
alignment errors), which would add extra complexity to how licenses
are documented and add an additional nesting level.

A mapping would be needed, as it cannot be guaranteed that every expression
(key) has exactly one license file associated with it. A single file may
cover a compound expression (e.g. GPL with an exception may be in a single
file), and a single license may have more than one file (e.g. an Apache
license ``LICENSE`` and its ``NOTICE`` file are two distinct files).
For most common cases, a single license expression and one or more license
files would be perfectly adequate. In the rarer and more complex cases where
there are many licenses involved, authors can still safely use the fields
specified here, just with a slight loss of clarity by not specifying which
text file(s) map to which license identifier (though this should be clear in
practice given each license identifier has corresponding SPDX-registered
full license text), while not forcing the more complex data model
(a mapping) on the large majority of users who do not need or want it.

We could of course have a data field with multiple possible value types (it's a
string, it's a list, it's a mapping!) but this could be a source of confusion.
This is what has been done, for instance, in npm (historically) and in Rubygems
(still today), and as a result tools need to test the type of the metadata field
before using it in code, while users are confused about when to use a list or a
string. Therefore, this approach is rejected.
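
As a sketch of the flat model that is retained instead, a project with
several licenses might declare them as follows. The expression and file
names are hypothetical.

.. code-block:: toml

    [project]
    # One expression covering the whole distribution...
    license = "Apache-2.0 AND BSD-3-Clause"

    # ...and one flat list of license files, with no per-identifier mapping.
    license-files.paths = ["LICENSE", "NOTICE", "vendor/foo/LICENSE"]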


Map identifiers to source files
'''''''''''''''''''''''''''''''

As discussed previously, file-level notices are out of scope for this PEP,
and the existing ``SPDX-License-Identifier`` `convention <spdxid_>`__ can
already be used if this is needed without further specification here.


Don't freeze compatibility with a specific SPDX version
'''''''''''''''''''''''''''''''''''''''''''''''''''''''

This PEP could omit specifying a specific SPDX specification version,
or one for the list of valid license identifiers, which would allow
more flexible updates as the specification evolves without another
PEP or equivalent.

However, serious concerns were expressed about a future SPDX update breaking
compatibility with existing expressions and identifiers, leaving current
packages with invalid metadata per the definition in this PEP. Requiring
compatibility with a specific version of these specifications here
and a PEP or similar process to update it avoids this contingency,
and follows the practice of other packaging ecosystems.

Therefore, it was `decided <spdxversion_>`__ to specify a minimum version
and require tools to be compatible with it, while still allowing updates
so long as they don't break backward compatibility. This enables
tools to immediately take advantage of improvements and accept new
licenses, but also remain backwards compatible with the version
specified here, balancing flexibility and compatibility.


.. _639-rejected-ideas-difference-license-source-binary:

Different licenses for source and binary distributions
''''''''''''''''''''''''''''''''''''''''''''''''''''''

As an additional use case, it was asked whether it was in scope for this
PEP to handle cases where the license expression for a binary distribution
(wheel) is different from that for a source distribution (sdist), such
as in cases of non-pure-Python packages that compile and bundle binaries
under different licenses than the project itself. An example cited was
`PyTorch <pytorch_>`__, which contains CUDA from Nvidia, which is freely
distributable but not open source. `NumPy <numpyissue_>`__ and
`SciPy <scipyissue_>`__ also had similar issues, as reported by the
original author of this PEP and now resolved for those cases.

However, given the inherent complexity here and a lack of an obvious
mechanism to do so, the fact that each wheel would need its own license
information, lack of support on PyPI for exposing license info on a
per-distribution archive basis, and the relatively niche use case, it was
determined to be out of scope for this PEP, and left to a future PEP
to resolve if sufficient need and interest exists and an appropriate
mechanism can be found.


Open Issues
===========

Should the ``License`` field be back-filled, or mutually exclusive?
-------------------------------------------------------------------

At present, this PEP explicitly allows, but does not formally recommend or
require, build tools to back-fill the ``License`` core metadata field with
the verbatim text from the ``License-Expression`` field. This would
presumably improve backwards compatibility and was suggested
by some on the Discourse thread. On the other hand, allowing it does
increase complexity and is less of a clean, consistent separation,
preventing the ``License`` field from being completely mutually exclusive
with the new ``License-Expression`` field and requiring that their values
match.

As such, it would be very useful to have a more concrete and specific
rationale and use cases for the back-filled data, and give fuller
consideration to any potential benefits or drawbacks of this approach,
in order to come to a final consensus on this matter that can be appropriately
justified here.

Therefore, is the status quo expressed here acceptable, allowing tools
leeway to decide this for themselves? Should this PEP formally recommend,
or even require, that tools back-fill this metadata (which would presumably
be reversed once a breaking revision of the metadata spec is issued)?
Or should this not be explicitly allowed, discouraged or even prohibited?


Should custom license identifiers be allowed?
---------------------------------------------

The current version of this PEP retains the behavior of only specifying
the use of SPDX-defined license identifiers, as well as the explicitly defined
custom identifiers ``LicenseRef-Public-Domain`` and ``LicenseRef-Proprietary``
to handle the two common cases where projects have a license, but it is not
one that has a recognized SPDX license identifier.

For maximum flexibility, custom ``LicenseRef-<CUSTOM-TEXT>`` license
identifiers could be allowed, which could potentially be useful for niche
cases or corporate environments where ``LicenseRef-Proprietary`` is not
appropriate or insufficiently specific, but where relying on mainstream Python
build tooling and the ``License-Expression`` metadata field is still
desirable.

This has the downsides, however, of not catching misspellings of the
canonically defined license identifiers, thus producing license metadata
that does not match what the author intended, and of users
potentially thinking they have to prepend ``LicenseRef`` to valid
license identifiers, a point on which there seems to have been some confusion.
Furthermore, this encourages the proliferation of bespoke license identifiers,
which defeats the purpose of enabling clear, unambiguous and well
understood license metadata for which this PEP was created.

Indeed, projects in niche cases that need specific, proprietary custom
licenses could always simply specify ``LicenseRef-Proprietary``, and then
include the actual license files needed to unambiguously identify the license
(when not using SPDX license identifiers) under the ``License-File``
fields. Requiring standards-conforming tools to allow custom license
identifiers does not seem very useful, since standard tools will not recognize
bespoke ones or know how to treat them. By contrast, bespoke tools, which
would be required in any case to understand and act on custom identifiers,
are explicitly allowed, with good reason (thus the ``SHOULD`` keyword)
to not require that license identifiers conform to those listed here.
Therefore, this specification still allows such use in private corporate
environments or specific ecosystems, while avoiding the disadvantages of
imposing them on all mainstream packaging tools.

As an alternative, a literal ``LicenseRef-Custom`` identifier could be
defined, which would more explicitly indicate that the license cannot be
expressed with defined identifiers and the license text should be referenced
for details, without carrying the negative and potentially inappropriate
implications of ``LicenseRef-Proprietary``. This would avoid the main
mentioned downsides (misspellings, confusion, license proliferation) of
the above approach of allowing an arbitrary ``LicenseRef``, while
addressing several of the potential theoretical scenarios cited for it.

On the other hand, as SPDX aims to (and generally does) encompass all
FSF-recognized "Free" and OSI-approved "Open Source" licenses,
and those sources are kept closely in sync and are now relatively stable,
anything outside those bounds would generally be covered by
``LicenseRef-Proprietary``, thus making ``LicenseRef-Custom`` less specific
in that regard, and somewhat redundant to it. Furthermore, it may mislead
authors of projects with complex/multiple licenses that they should use it
over specifying a license expression.

At present, the PEP retains the existing approach over either of these, given
the use cases and benefits were judged to be sufficiently marginal based
on the current understanding of the packaging landscape. For both these
proposals, however, if more concrete use cases emerge, this can certainly
be reconsidered, either for this current PEP or a future one (before or
in tandem with actually removing the legacy unstructured ``License``
metadata field). Not defining these now leaves room to allow them later
(or still now, with custom packaging tools) without affecting backward
compatibility, whereas allowing them now could not be undone as easily if
they were later determined to be unnecessary or too problematic in practice.


.. _639-examples:

Appendix: Examples
==================

.. _639-example-basic:

Basic example
-------------

The Setuptools project itself, as of `version 59.1.1 <setuptools5911_>`__,
does not use the ``License`` field in its own project source metadata.
Further, it no longer explicitly specifies ``license_file``/``license_files``
as it did previously, since Setuptools relies on its own automatic
inclusion of license-related files matching common patterns,
such as the ``LICENSE`` file it uses.

It includes the following license-related metadata in its ``setup.cfg``:

.. code-block:: ini

    [metadata]
    classifiers =
        License :: OSI Approved :: MIT License

The simplest migration to this PEP would consist of using this instead:

.. code-block:: ini

    [metadata]
    license_expression = MIT

Or, in the ``[project]`` table of ``pyproject.toml``:

.. code-block:: toml

    [project]
    license = "MIT"

The output core metadata for the distribution packages would then be:

.. code-block:: email

    License-Expression: MIT
    License-File: LICENSE

The ``LICENSE`` file would be stored at ``/setuptools-${VERSION}/LICENSE``
in the sdist and ``/setuptools-${VERSION}.dist-info/licenses/LICENSE``
in the wheel, and unpacked from there into the site directory (e.g.
``site-packages``) on installation; ``/`` is the root of the respective archive
and ``${VERSION}`` the version of the Setuptools release in the core metadata.
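
As a rough illustration of how a consuming tool might read these fields back,
the sketch below parses a core metadata fragment with the standard library
``email`` parser. The ``Name`` and ``Version`` values and the
``read_license_metadata`` helper are hypothetical, chosen only for this
example; real tools may well read metadata differently.

```python
from email.parser import Parser

# Illustrative core metadata fragment; Name and Version are placeholders.
METADATA = """\
Name: example-project
Version: 1.0
License-Expression: MIT
License-File: LICENSE
"""


def read_license_metadata(text):
    """Return (license expression, list of license file paths)."""
    message = Parser().parsestr(text)
    # License-Expression is a single-use field; License-File may repeat.
    expression = message.get("License-Expression")
    files = message.get_all("License-File", [])
    return expression, files


expression, files = read_license_metadata(METADATA)
print(expression, files)
```

Using ``get_all()`` for ``License-File`` matters once a project declares
more than one license file, since each one is emitted as a separate field.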


.. _639-example-advanced:

Advanced example
----------------

Suppose Setuptools were to include the licenses of the third-party projects
that are vendored in the ``setuptools/_vendor/`` and ``pkg_resources/_vendor/``
directories; specifically:

.. code-block:: text

    packaging==21.2
    pyparsing==2.2.1
    ordered-set==3.1.1
    more_itertools==8.8.0

The license expressions for these projects are:

.. code-block:: text

    packaging: Apache-2.0 OR BSD-2-Clause
    pyparsing: MIT
    ordered-set: MIT
    more_itertools: MIT

A comprehensive license expression covering both Setuptools
proper and its vendored dependencies would combine all of these
license expressions into one. Such an expression might be:

.. code-block:: text

    MIT AND (Apache-2.0 OR BSD-2-Clause)
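
One possible way to derive such a combined expression mechanically is
sketched below. The ``combine_expressions`` helper is hypothetical and
deliberately simplistic: it assumes each input is already a valid
expression with upper-case operators, de-duplicates identical strings, and
parenthesizes any part containing ``OR`` before joining with ``AND``.

```python
def combine_expressions(expressions):
    """Join distinct license expressions with AND, keeping first-seen order."""
    unique = list(dict.fromkeys(expressions))  # de-duplicate, preserve order
    parts = []
    for expr in unique:
        # Parenthesize anything containing OR so that joining with AND
        # does not change its meaning.
        needs_parens = len(unique) > 1 and " OR " in f" {expr} "
        parts.append(f"({expr})" if needs_parens else expr)
    return " AND ".join(parts)


# Setuptools itself plus the vendored projects listed above.
components = {
    "setuptools": "MIT",
    "packaging": "Apache-2.0 OR BSD-2-Clause",
    "pyparsing": "MIT",
    "ordered-set": "MIT",
    "more_itertools": "MIT",
}
combined = combine_expressions(components.values())
print(combined)  # MIT AND (Apache-2.0 OR BSD-2-Clause)
```

A real tool would likely parse and normalize each expression rather than
compare strings, so treat this only as a demonstration of the idea.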

In addition, per the requirements of the licenses, the relevant license files
must be included in the package. Suppose the ``LICENSE`` file contains the text
of the MIT license and the copyrights used by Setuptools, ``pyparsing``,
``more_itertools`` and ``ordered-set``; and the ``LICENSE*`` files in the
``setuptools/_vendor/packaging/`` directory contain the Apache 2.0 and
2-clause BSD license text, and the Packaging copyright statement and
`license choice notice <packaginglicense_>`__.

Specifically, we assume the license files are located at the following
paths in the project source tree (relative to the project root and
``pyproject.toml``):

.. code-block:: text

    LICENSE
    setuptools/_vendor/packaging/LICENSE
    setuptools/_vendor/packaging/LICENSE.APACHE
    setuptools/_vendor/packaging/LICENSE.BSD

Putting it all together, our ``setup.cfg`` would be:

.. code-block:: ini

    [metadata]
    license_expression = MIT AND (Apache-2.0 OR BSD-2-Clause)
    license_files =
        LICENSE
        setuptools/_vendor/packaging/LICENSE
        setuptools/_vendor/packaging/LICENSE.APACHE
        setuptools/_vendor/packaging/LICENSE.BSD

In the ``[project]`` table of ``pyproject.toml``, with license files
specified explicitly via the ``paths`` subkey, this would look like:

.. code-block:: toml

    [project]
    license = "MIT AND (Apache-2.0 OR BSD-2-Clause)"
    license-files.paths = [
        "LICENSE",
        "setuptools/_vendor/packaging/LICENSE",
        "setuptools/_vendor/packaging/LICENSE.APACHE",
        "setuptools/_vendor/packaging/LICENSE.BSD",
    ]

Or alternatively, matched via glob patterns, this could be:

.. code-block:: toml

    [project]
    license = "MIT AND (Apache-2.0 OR BSD-2-Clause)"
    license-files.globs = [
        "LICENSE*",
        "setuptools/_vendor/packaging/LICENSE*",
    ]
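
To make the glob-based form more concrete, the following sketch resolves a
list of patterns against a project tree, assuming the license files live at
the source-tree paths listed earlier in this example. It uses the standard
``glob`` module as a stand-in for whatever matching a build tool actually
implements; the ``resolve_license_globs`` helper is hypothetical.

```python
import glob
import os
import tempfile


def resolve_license_globs(project_root, patterns):
    """Return sorted, de-duplicated POSIX-style paths matching the patterns."""
    matched = set()
    for pattern in patterns:
        for path in glob.glob(os.path.join(project_root, pattern)):
            if os.path.isfile(path):
                relative = os.path.relpath(path, project_root)
                matched.add(relative.replace(os.sep, "/"))
    return sorted(matched)


# Build a throwaway tree mirroring the layout assumed in this example.
with tempfile.TemporaryDirectory() as root:
    vendored = os.path.join(root, "setuptools", "_vendor", "packaging")
    os.makedirs(vendored)
    for name in ("LICENSE", "LICENSE.APACHE", "LICENSE.BSD"):
        open(os.path.join(vendored, name), "w").close()
    open(os.path.join(root, "LICENSE"), "w").close()

    resolved = resolve_license_globs(
        root, ["LICENSE*", "setuptools/_vendor/packaging/LICENSE*"]
    )
    print(resolved)
```

The resolved paths are relative to the project root, which is the form in
which they would be recorded in the ``License-File`` fields.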

With either approach, the output core metadata in the distribution
would be:

.. code-block:: email

    License-Expression: MIT AND (Apache-2.0 OR BSD-2-Clause)
    License-File: LICENSE
    License-File: setuptools/_vendor/packaging/LICENSE
    License-File: setuptools/_vendor/packaging/LICENSE.APACHE
    License-File: setuptools/_vendor/packaging/LICENSE.BSD

In the resulting sdist, with ``/`` as the root of the archive and ``${VERSION}``
the version of the Setuptools release specified in the core metadata,
the license files would be located at the paths:

.. code-block:: shell

    /setuptools-${VERSION}/LICENSE
    /setuptools-${VERSION}/setuptools/_vendor/packaging/LICENSE
    /setuptools-${VERSION}/setuptools/_vendor/packaging/LICENSE.APACHE
    /setuptools-${VERSION}/setuptools/_vendor/packaging/LICENSE.BSD

In the built wheel, with ``/`` being the root of the archive and
``${VERSION}`` as above, the license files would be stored at:

.. code-block:: shell

    /setuptools-${VERSION}.dist-info/licenses/LICENSE
    /setuptools-${VERSION}.dist-info/licenses/setuptools/_vendor/packaging/LICENSE
    /setuptools-${VERSION}.dist-info/licenses/setuptools/_vendor/packaging/LICENSE.APACHE
    /setuptools-${VERSION}.dist-info/licenses/setuptools/_vendor/packaging/LICENSE.BSD

Finally, in the installed project, with ``site-packages`` being the site dir
and ``${VERSION}`` as above, the license files would be installed to:

.. code-block:: shell

    site-packages/setuptools-${VERSION}.dist-info/licenses/LICENSE
    site-packages/setuptools-${VERSION}.dist-info/licenses/setuptools/_vendor/packaging/LICENSE
    site-packages/setuptools-${VERSION}.dist-info/licenses/setuptools/_vendor/packaging/LICENSE.APACHE
    site-packages/setuptools-${VERSION}.dist-info/licenses/setuptools/_vendor/packaging/LICENSE.BSD
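
The three layouts above follow the same pattern, which the sketch below
spells out. The ``license_locations`` helper is hypothetical and skips
details a real tool would need (such as normalizing the project name and
resolving the actual site directory); version ``59.1.1`` is used only because
it is the Setuptools release referenced in this appendix.

```python
import posixpath


def license_locations(name, version, license_files):
    """Map each License-File value to its sdist, wheel and installed path."""
    sdist_root = f"{name}-{version}"
    licenses_dir = f"{name}-{version}.dist-info/licenses"
    return {
        path: {
            "sdist": posixpath.join(sdist_root, path),
            "wheel": posixpath.join(licenses_dir, path),
            "installed": posixpath.join("site-packages", licenses_dir, path),
        }
        for path in license_files
    }


locations = license_locations(
    "setuptools",
    "59.1.1",
    ["LICENSE", "setuptools/_vendor/packaging/LICENSE.BSD"],
)
for path, places in locations.items():
    print(path, places)
```

Note that the relative path recorded in ``License-File`` is preserved
unchanged under each root, which is what lets tools locate the files
unambiguously.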


.. _639-example-expression:

Expression examples
-------------------

Some additional examples of valid ``License-Expression`` values:

.. code-block:: email

    License-Expression: MIT
    License-Expression: BSD-3-Clause
    License-Expression: MIT AND (Apache-2.0 OR BSD-2-Clause)
    License-Expression: MIT OR GPL-2.0-or-later OR (FSFUL AND BSD-2-Clause)
    License-Expression: GPL-3.0-only WITH Classpath-exception-2.0 OR BSD-3-Clause
    License-Expression: LicenseRef-Public-Domain OR CC0-1.0 OR Unlicense
    License-Expression: LicenseRef-Proprietary
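
As a hedged illustration of how a tool might flag unrecognized identifiers
in such values, the sketch below tokenizes an expression and checks each
identifier against a small, hand-picked set. The ``KNOWN_IDS`` set and the
``unknown_identifiers`` helper are assumptions made for this example only:
a real implementation would load the full SPDX license and exception lists
and also validate the expression grammar, which this sketch does not do.

```python
import re

# Illustrative subset only; a real tool would load the full SPDX lists.
KNOWN_IDS = {
    "MIT", "BSD-2-Clause", "BSD-3-Clause", "Apache-2.0",
    "GPL-2.0-or-later", "GPL-3.0-only", "FSFUL", "CC0-1.0", "Unlicense",
    "Classpath-exception-2.0",  # an exception, used after WITH
    "LicenseRef-Public-Domain", "LicenseRef-Proprietary",
}
OPERATORS = {"AND", "OR", "WITH"}
TOKEN = re.compile(r"[()]|[^\s()]+")


def unknown_identifiers(expression):
    """Return identifiers in the expression that are not in KNOWN_IDS."""
    # Identifiers are compared case-insensitively in this sketch.
    known = {item.lower() for item in KNOWN_IDS}
    unknown = []
    for token in TOKEN.findall(expression):
        if token in ("(", ")") or token in OPERATORS:
            continue
        if token.lower() not in known:
            unknown.append(token)
    return unknown


print(unknown_identifiers("MIT AND (Apache-2.0 OR BSD-2-Clause)"))
print(unknown_identifiers("MIT OR Apache2"))
```

The second call reports ``Apache2``, the kind of misspelling that
restricting identifiers to a defined list is meant to catch.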


.. _639-user-scenarios:

Appendix: User Scenarios
========================

The following covers the range of common use cases from a user perspective,
providing straightforward guidance for each. Do note that the following
should **not** be considered legal advice, and readers should consult a
licensed legal practitioner in their jurisdiction if they are unsure about
the specifics for their situation.


I have a private package that won't be distributed
--------------------------------------------------

If your package isn't shared publicly, i.e. outside your company,
organization or household, it *usually* isn't strictly necessary to include
a formal license, so you wouldn't necessarily have to do anything extra here.

However, it is still a good idea to include ``LicenseRef-Proprietary``
as a license expression in your package configuration, and/or a
copyright statement and any legal notices in a ``LICENSE.txt`` file
in the root of your project directory, which will be automatically
included by packaging tools.


I just want to share my own work without legal restrictions
-----------------------------------------------------------

While you aren't required to include a license, if you don't, no one has
`any permission to download, use or improve your work <dontchoosealicense_>`__,
so that's probably the *opposite* of what you actually want.
The `MIT license <mitlicense_>`__ is a great choice instead, as it's simple,
widely used and allows anyone to do whatever they want with your work
(other than sue you, which you probably also don't want).

To apply it, just paste `the text <chooseamitlicense_>`__ into a file named
``LICENSE.txt`` at the root of your repo, and add the year and your name to
the copyright line. Then, just add ``license = "MIT"`` under
``[project]`` in your ``pyproject.toml`` if your packaging tool supports it,
or in its config file/section (e.g. Setuptools ``license_expression = MIT``
under ``[metadata]`` in ``setup.cfg``). You're done!


I want to distribute my project under a specific license
--------------------------------------------------------

To use a particular license, simply paste its text into a ``LICENSE.txt``
file at the root of your repo, if you don't have it in a file starting with
``LICENSE`` or ``COPYING`` already, and add
``license = "LICENSE-ID"`` under ``[project]`` in your
``pyproject.toml`` if your packaging tool supports it, or else in its
config file (e.g. for Setuptools, ``license_expression = LICENSE-ID``
under ``[metadata]`` in ``setup.cfg``). You can find the ``LICENSE-ID``
and copyable license text on sites like
`ChooseALicense <choosealicenselist_>`__ or `SPDX <spdxlist_>`__.

Many popular code hosts, project templates and packaging tools can add the
license file for you, and may support the expression as well in the future.


I maintain an existing package that's already licensed
------------------------------------------------------

If you already have license files and metadata in your project, you
should only need to make a couple of tweaks to take advantage of the new
functionality.

In your project config file, enter your license expression under
``license`` (``[project]`` table in ``pyproject.toml``),
``license_expression`` (Setuptools ``setup.cfg`` / ``setup.py``),
or the equivalent for your packaging tool,
and make sure to remove any legacy ``license`` table subkeys or
``License ::`` classifiers. Your existing ``license`` value may already
be valid as one (e.g. ``MIT``, ``Apache-2.0 OR BSD-2-Clause``, etc);
otherwise, check the `SPDX license list <spdxlist_>`__ for the identifier
that matches the license used in your project.

If your license files begin with ``LICENSE``, ``COPYING``, ``NOTICE`` or
``AUTHORS``, or you've already configured your packaging tool to add them
(e.g. ``license_files`` in ``setup.cfg``), you should already be good to go.
If not, make sure to list them under ``license-files.paths``
or ``license-files.globs`` under ``[project]`` in ``pyproject.toml``
(if your tool supports it), or else in your tool's configuration file
(e.g. ``license_files`` in ``setup.cfg`` for Setuptools).

See the :ref:`639-example-basic` for a simple but complete real-world demo
of how this works in practice, including some additional technical details.
Packaging tools may support automatically converting legacy licensing
metadata; check your tool's documentation for more information.


My package includes other code under different licenses
-------------------------------------------------------

If your project includes code from others covered by different licenses,
such as vendored dependencies or files copied from other open source
software, you can construct a license expression (or have a tool
help you do so) to describe the licenses involved and the relationship
between them.

In short, ``License-1 AND License-2`` means that *both* licenses apply
to your project, or parts of it (for example, you included a file
under another license), and ``License-1 OR License-2`` means that
*either* of the licenses can be used, at the user's option (for example,
you want to allow users a choice of multiple licenses). You can use
parentheses (``()``) for grouping to form expressions that cover even the most
complex situations.
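
To illustrate how ``AND``, ``OR`` and grouping interact, here is a small
sketch that checks whether a set of licenses a user is prepared to comply
with is enough to satisfy an expression. The ``is_satisfied`` function is
hypothetical, handles only ``AND``, ``OR`` and parentheses (not ``WITH``),
and is a teaching aid rather than a compliance tool.

```python
import re


def is_satisfied(expression, acceptable):
    """Return True if the licenses in `acceptable` satisfy the expression."""
    tokens = re.findall(r"[()]|[^\s()]+", expression)
    pos = 0

    def parse_or():
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "OR":
            pos += 1
            right = parse_and()  # always parse, so tokens are consumed
            result = result or right
        return result

    def parse_and():
        nonlocal pos
        result = parse_atom()
        while pos < len(tokens) and tokens[pos] == "AND":
            pos += 1
            right = parse_atom()
            result = result and right
        return result

    def parse_atom():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        token = tokens[pos]
        pos += 1
        if token == "(":
            result = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return result
        if token in (")", "AND", "OR"):
            raise ValueError(f"unexpected token {token!r}")
        return token in acceptable

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result


expr = "MIT AND (Apache-2.0 OR BSD-2-Clause)"
print(is_satisfied(expr, {"MIT", "BSD-2-Clause"}))
print(is_satisfied(expr, {"MIT"}))
```

In this sketch ``AND`` binds more tightly than ``OR``, so parentheses are
needed whenever a choice of licenses is combined with a mandatory one.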

In your project config file, enter your license expression under
``license`` (``[project]`` table of ``pyproject.toml``),
``license_expression`` (Setuptools ``setup.cfg`` / ``setup.py``),
or the equivalent for your packaging tool,
and make sure to remove any legacy ``license`` table subkeys
or ``License ::`` classifiers.

Also, make sure you add the full license text of all the licenses as files
somewhere in your project repository. If all of them are in the root directory
and begin with ``LICENSE``, ``COPYING``, ``NOTICE`` or ``AUTHORS``,
they will be included automatically. Otherwise, you'll need to list the
relative path or glob patterns to each of them under ``license-files.paths``
or ``license-files.globs`` under ``[project]`` in ``pyproject.toml``
(if your tool supports it), or else in your tool's configuration file
(e.g. ``license_files`` in ``setup.cfg`` for Setuptools).

As an example, if your project was licensed MIT but incorporated
a vendored dependency (say, ``packaging``) that was licensed under
either Apache 2.0 or the 2-clause BSD, your license expression would
be ``MIT AND (Apache-2.0 OR BSD-2-Clause)``. You might have a
``LICENSE.txt`` in your repo root, and a ``LICENSE-APACHE.txt`` and
``LICENSE-BSD.txt`` in the ``_vendor`` subdirectory, so to include
all of them, you'd specify ``["LICENSE.txt", "_vendor/LICENSE*"]``
as glob patterns, or
``["LICENSE.txt", "_vendor/LICENSE-APACHE.txt", "_vendor/LICENSE-BSD.txt"]``
as literal file paths.

See a fully worked out :ref:`639-example-advanced` for a comprehensive end-to-end
application of this to a real-world complex project, with copious technical
details, and consult a `tutorial <spdxtutorial_>`__ for more help and examples
using SPDX identifiers and expressions.


.. _639-license-doc-python:

Appendix: License Documentation in Python
=========================================

There are multiple ways used or recommended to document Python project
licenses today. The most common are listed below.


.. _639-license-doc-core-metadata:

Core metadata
-------------

There are two overlapping core metadata fields to document a license: the
license ``Classifier`` `strings <classifiers_>`__ prefixed with ``License ::``
and the ``License`` `field <licensefield_>`__ as free text.

The core metadata ``License`` field documentation is currently:

.. code-block:: rst

    License
    =======

    .. versionadded:: 1.0

    Text indicating the license covering the distribution where the license
    is not a selection from the "License" Trove classifiers. See
    :ref:`"Classifier" <metadata-classifier>` below.
    This field may also be used to specify a
    particular version of a license which is named via the ``Classifier``
    field, or to indicate a variation or exception to such a license.

    Examples::

        License: This software may only be obtained by sending the
                author a postcard, and then the user promises not
                to redistribute it.

        License: GPL version 3, excluding DRM provisions

Even though there are two fields, it is at times difficult to convey anything
but the simplest licensing. For instance, some classifiers lack precision
(GPL without a version) and when multiple license classifiers are
listed, it is not clear if both licenses must apply, or the user may choose
between them. Furthermore, the list of available license classifiers
is rather limited and out-of-date.


.. _639-license-doc-setuptools-wheel:

Setuptools and Wheel
--------------------

Beyond a license code or qualifier, license text files are documented and
included in a built package either implicitly or explicitly,
and this is another possible source of confusion:

- In the `Setuptools <setuptoolssdist_>`__ and `Wheel <wheels_>`__ projects,
  license files are automatically added to the distribution (at their source
  location in a source distribution/sdist, and in the ``.dist-info``
  directory of a built wheel) if they match one of a number of common license
  file name patterns (``LICEN[CS]E*``, ``COPYING*``, ``NOTICE*`` and
  ``AUTHORS*``). Alternatively, a package author can specify a list of license
  file paths to include in the built wheel under the ``license_files`` key in
  the ``[metadata]`` section of the project's ``setup.cfg``, or as an argument
  to the ``setuptools.setup()`` function. At present, following the Wheel
  project's lead, Setuptools flattens the collected license files into the
  metadata directory, clobbering files with the same name, and dumps license
  files directly into the top-level ``.dist-info`` directory, but there is a
  `desire to resolve both these issues <setuptoolsfiles_>`__,
  contingent on this PEP being accepted.

- Both tools also support an older, singular ``license_file`` parameter that
  allows specifying only one license file to add to the distribution, which
  has been deprecated for some time but still sees `some use <pipsetup_>`__.

- Following the publication of an earlier draft of this PEP, Setuptools
  `added support <setuptoolspep639_>`__ for ``License-File`` in distribution
  metadata as described in this specification. This allows other tools
  consuming the resulting metadata to unambiguously locate the license file(s)
  for a given package.
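
As a rough sketch of the automatic matching described above, the snippet
below tests file names against the four listed patterns using
``fnmatch.fnmatchcase``. The ``is_license_file`` helper and the sample file
names are made up for illustration, and the case-sensitive matching used here
is an assumption that may not reflect what a given tool does on every
platform.

```python
from fnmatch import fnmatchcase

# The common license file name patterns listed above.
DEFAULT_PATTERNS = ("LICEN[CS]E*", "COPYING*", "NOTICE*", "AUTHORS*")


def is_license_file(filename):
    """Return True if the file name matches one of the default patterns."""
    return any(fnmatchcase(filename, pattern) for pattern in DEFAULT_PATTERNS)


names = [
    "LICENSE",
    "LICENCE.txt",
    "COPYING.LESSER",
    "NOTICE",
    "README.rst",
    "license.md",
]
detected = [name for name in names if is_license_file(name)]
print(detected)
```

Files that do not match, such as the lower-case ``license.md`` here, would
need to be listed explicitly under ``license_files``.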


.. _639-license-doc-pypug:

PyPA Packaging Guide and Sample Project
---------------------------------------

Both the `PyPA beginner packaging tutorial <packagingtuttxt_>`__ and its more
comprehensive `packaging guide <packagingguidetxt_>`__ state that it is
important that every package include a license file. They point to the
``LICENSE.txt`` in the official PyPA sample project as an example, which is
`explicitly listed <samplesetupcfg_>`__ under the ``license_files`` key in
its ``setup.cfg``, following existing practice formally specified by this PEP.

Both the `beginner packaging tutorial <packagingtutkey_>`__ and the
`sample project <samplesetuppy_>`__ only use classifiers to declare a
package's license, and do not include or mention the ``License`` field.
The `full packaging guide <licensefield_>`__ does mention this field, but
states that authors should use the license classifiers instead, unless the
project uses a non-standard license (which the guide discourages).


.. _639-license-doc-source-files:

Python source code files
------------------------

**Note:** Documenting licenses in source code is not in the scope of this PEP.

Besides using comments and/or ``SPDX-License-Identifier`` conventions, the
license is `sometimes <pycode_>`__ documented in Python code files using
a "dunder" module-level constant, typically named ``__license__``.

This convention, while perhaps somewhat antiquated, is recognized by the
built-in ``help()`` function and the standard ``pydoc`` module.
The dunder variable will show up in the ``help()`` DATA section for a module.
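
A minimal demonstration of this behavior, assuming a throwaway module object
(the name ``example_module`` is hypothetical), could look like the following.
It renders the same text that ``help()`` would display, using ``pydoc``'s
plain-text renderer so the output contains no terminal formatting.

```python
import pydoc
import types

# A throwaway module object; "example_module" is a hypothetical name.
module = types.ModuleType("example_module", "An example module.")
module.__license__ = "MIT"

# pydoc.plaintext renders without the overstrike bold used for terminals.
text = pydoc.render_doc(module, renderer=pydoc.plaintext)
print(text)
```

The rendered text lists ``__license__`` under the ``DATA`` heading,
alongside any other public module-level constants.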


.. _639-license-doc-other-packaging-tools:

Other Python packaging tools
----------------------------

- `Conda package manifests <conda_>`__ have support for ``license`` and
  ``license_file`` fields, and automatically include license files
  following similar naming patterns as the Wheel and Setuptools projects.

- `Flit <flit_>`__ recommends using classifiers instead of the ``License``
  field (per the current PyPA packaging guide).

- `PBR <pbr_>`__ uses similar data as Setuptools, but always stored in
  ``setup.cfg``.

- `Poetry <poetry_>`__ specifies the use of the ``license`` key in
  ``pyproject.toml`` with SPDX license identifiers.


.. _639-license-doc-other-projects:

Appendix: License Documentation in Other Projects
=================================================

Here is a survey of how things are done elsewhere.


Linux distribution packages
---------------------------

**Note:** in most cases, the texts of the most common licenses are included
globally in a shared documentation directory (e.g. ``/usr/share/doc``).

- Debian documents package licenses with
  `machine readable copyright files <dep5_>`__.
  It defines its own license expression syntax and list of identifiers for
  common licenses, both of which are closely related to those of SPDX.

- `Fedora packages <fedora_>`__ specify how to include
  `License Texts <fedoratext_>`__ and use a
  `License field <fedoralicense_>`__ that must be filled
  with appropriate short license identifier(s) from an extensive list
  of `"Good Licenses" <fedoralist_>`__. Fedora also defines its own
  license expression syntax, similar to that of SPDX.

- `OpenSUSE packages <opensuse_>`__ use SPDX license expressions with
  SPDX license IDs and a
  `list of additional license identifiers <opensuselist_>`__.

- `Gentoo ebuild <gentoo_>`__ uses a ``LICENSE`` variable.
  This field is specified in `GLEP-0023 <glep23_>`__ and in the
  `Gentoo development manual <gentoodev_>`__.
  Gentoo also defines a list of allowed licenses and a license expression
  syntax, which is rather different from SPDX.

- The `FreeBSD package Makefile <freebsd_>`__ provides ``LICENSE`` and
  ``LICENSE_FILE`` fields with a list of custom license symbols. For
  non-standard licenses, FreeBSD recommends using ``LICENSE=UNKNOWN`` and
  adding ``LICENSE_NAME`` and ``LICENSE_TEXT`` fields, as well as sophisticated
  ``LICENSE_PERMS`` to qualify the license permissions and ``LICENSE_GROUPS``
  to document a license grouping. The ``LICENSE_COMB`` allows documenting more
  than one license and how they apply together, forming a custom license
  expression syntax. FreeBSD also recommends the use of
  ``SPDX-License-Identifier`` in source code files.

- `Arch Linux PKGBUILD <archinux_>`__ defines its
  `own license identifiers <archlinuxlist_>`__.
  The value ``'unknown'`` can be used if the license is not defined.

- `OpenWRT ipk packages <openwrt_>`__ use the ``PKG_LICENSE`` and
  ``PKG_LICENSE_FILES`` variables and recommend the use of SPDX License
  identifiers.

- `NixOS uses SPDX identifiers <nixos_>`__ and some extra license IDs
  in its license field.

- GNU Guix (based on Nix) has a single License field, uses its own
  `license symbols list <guix_>`__ and specifies how to use one license or a
  `list of them <guixlicense_>`__.

- `Alpine Linux packages <alpine_>`__ recommend using SPDX identifiers in the
  license field.


Language and application packages
---------------------------------

- In Java, `Maven POM <maven_>`__ defines a ``licenses`` XML tag with a list
  of licenses, each with a name, URL, comments and "distribution" type.
  This is not mandatory, and the content of each field is not specified.

- The `JavaScript NPM package.json <npm_>`__ uses a single license field with
  an SPDX license expression, or the ``UNLICENSED`` ID if none is specified.
  A license file can be referenced as an alternative using
  ``SEE LICENSE IN <filename>`` in the single ``license`` field.

- `Rubygems gemspec <gem_>`__ specifies either a single license string or a
  list of them. The relationship between multiple licenses in a
  list is not specified. They recommend using SPDX license identifiers.

- `CPAN Perl modules <perl_>`__ use a single license field, which is either a
  single string or a list of strings. The relationship between the licenses in
  a list is not specified. There is a list of custom license identifiers plus
  these generic identifiers: ``open_source``, ``restricted``, ``unrestricted``,
  ``unknown``.

- `Rust Cargo <cargo_>`__ specifies the use of an SPDX license expression
  (v2.1) in the ``license`` field. It also supports an alternative expression
  syntax using slash-separated SPDX license identifiers, and there is also a
  ``license_file`` field. The `crates.io package registry <cratesio_>`__
  requires that either ``license`` or ``license_file`` fields are set when
  uploading a package.

- `PHP composer.json <composer_>`__ uses a ``license`` field with
  an SPDX license ID or ``proprietary``. The ``license`` field is either a
  single string resembling the SPDX license expression syntax, with
  ``and`` and ``or`` keywords, or a list of strings if there is a
  (disjunctive) choice of licenses.

- `NuGet packages <nuget_>`__ previously used only a simple license URL, but
  now specify using an SPDX license expression and/or the path to a license
  file within the package. The NuGet.org repository states that they only
  accept license expressions that are "approved by the Open Source Initiative
  or the Free Software Foundation."

- Go language modules ``go.mod`` have no provision for any metadata beyond
  dependencies. Licensing information is left for code authors and other
  community package managers to document.

- The `Dart/Flutter spec <flutter_>`__ recommends using a single ``LICENSE``
  file that should contain all the license texts, each separated by a line
  with 80 hyphens.

- The `JavaScript Bower <bower_>`__ ``license`` field is either a single string
  or list of strings using either SPDX license identifiers, or a path/URL
  to a license file.

- The `Cocoapods podspec <cocoapod_>`__ ``license`` field is either a single
  string, or a mapping with ``type``, ``file`` and ``text`` keys.
  This is mandatory unless there is a ``LICENSE``/``LICENCE`` file provided.

- `Haskell Cabal <cabal_>`__ accepts an SPDX license expression since
  version 2.2. The version of the SPDX license list used is a function of
  the Cabal version. The specification also provides a mapping between
  legacy (pre-SPDX) and SPDX license Identifiers. Cabal also specifies a
  ``license-file(s)`` field that lists license files to be installed with
  the package.

- `Erlang/Elixir mix/hex package <mix_>`__ specifies a ``licenses`` field as a
  required list of license strings, and recommends using SPDX license
  identifiers.

- `D Language dub packages <dub_>`__ define their own list of license
  identifiers and license expression syntax, similar to the SPDX standard.

- The `R Package DESCRIPTION <cran_>`__ defines its own sophisticated license
  expression syntax and list of license identifiers. R has a unique way of
  supporting specifiers for license versions (such as ``LGPL (>= 2.0, < 3)``)
  in its license expression syntax.


Other ecosystems
----------------

- The ``SPDX-License-Identifier`` `header <spdxid_>`__ is a simple
  convention to document the license inside a file.

- The `Free Software Foundation (FSF) <fsf_>`__ promotes the use of
  SPDX license identifiers for clarity in the `GPL <gnu_>`__ and other
  versioned free software licenses.

- The Free Software Foundation Europe (FSFE) `REUSE project <reuse_>`__
  promotes using ``SPDX-License-Identifier``.

- The `Linux kernel <linux_>`__ uses ``SPDX-License-Identifier``
  and parts of the FSFE REUSE conventions to document its licenses.

- `U-Boot <uboot_>`__ spearheaded using ``SPDX-License-Identifier`` in code
  and now follows the Linux approach.

- The Apache Software Foundation projects use `RDF DOAP <apache_>`__ with
  a single license field pointing to SPDX license identifiers.

- The `Eclipse Foundation <eclipse_>`__ promotes using
  ``SPDX-License-Identifier``.

- The `ClearlyDefined project <clearlydefined_>`__ promotes using SPDX
  license identifiers and expressions to improve license clarity.

- The `Android Open Source Project <android_>`__ uses ``MODULE_LICENSE_XXX``
  empty tag files, where ``XXX`` is a license code such as ``BSD``, ``APACHE``,
  ``GPL``, etc. It also uses a ``NOTICE`` file that contains license and
  notice texts.


References
==========

.. _alpine: https://wiki.alpinelinux.org/wiki/Creating_an_Alpine_package#license
.. _android: https://github.com/aosp-mirror/platform_external_tcpdump/blob/android-platform-12.0.0_r1/MODULE_LICENSE_BSD
.. _apache: https://svn.apache.org/repos/asf/allura/doap_Allura.rdf
.. _archinux: https://wiki.archlinux.org/title/PKGBUILD#license
.. _archlinuxlist: https://archlinux.org/packages/core/any/licenses/files/
.. _badclassifiers: https://github.com/pypa/trove-classifiers/issues/17#issuecomment-385027197
.. _bower: https://github.com/bower/spec/blob/b00c4403e22e3f6177c410ed3391b9259687e461/json.md#license
.. _cabal: https://cabal.readthedocs.io/en/3.6/cabal-package.html?highlight=license#pkg-field-license
.. _cargo: https://doc.rust-lang.org/cargo/reference/manifest.html#package-metadata
.. _cc0: https://creativecommons.org/publicdomain/zero/1.0/
.. _cdstats: https://clearlydefined.io/stats
.. _choosealicense: https://choosealicense.com/
.. _choosealicenselist: https://choosealicense.com/licenses/
.. _chooseamitlicense: https://choosealicense.com/licenses/mit/
.. _classifierissue: https://github.com/pypa/trove-classifiers/issues/17
.. _classifiers: https://pypi.org/classifiers
.. _classifiersrepo: https://github.com/pypa/trove-classifiers
.. _clearlydefined: https://clearlydefined.io
.. _cocoapod: https://guides.cocoapods.org/syntax/podspec.html#license
.. _composer: https://getcomposer.org/doc/04-schema.md#license
.. _conda: https://docs.conda.io/projects/conda-build/en/stable/resources/define-metadata.html#about-section
.. _coremetadataspec: https://packaging.python.org/specifications/core-metadata
.. _coremetadataclassifiers: https://packaging.python.org/en/latest/specifications/core-metadata/#classifier-multiple-use
.. _cran: https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Licensing
.. _cratesio: https://doc.rust-lang.org/cargo/reference/registries.html#publish
.. _dep5: https://dep-team.pages.debian.net/deps/dep5/
.. _dontchoosealicense: https://choosealicense.com/no-permission/
.. _dub: https://dub.pm/package-format-json.html#licenses
.. _eclipse: https://www.eclipse.org/legal/epl-2.0/faq.php
.. _fedora: https://docs.fedoraproject.org/en-US/packaging-guidelines/LicensingGuidelines/
.. _fedoralicense: https://docs.fedoraproject.org/en-US/packaging-guidelines/LicensingGuidelines/#_valid_license_short_names
.. _fedoralist: https://fedoraproject.org/wiki/Licensing:Main?rd=Licensing#Good_Licenses
.. _fedoratext: https://docs.fedoraproject.org/en-US/packaging-guidelines/LicensingGuidelines/#_license_text
.. _flit: https://flit.readthedocs.io/en/stable/pyproject_toml.html
.. _flutter: https://flutter.dev/docs/development/packages-and-plugins/developing-packages#adding-licenses-to-the-license-file
.. _freebsd: https://docs.freebsd.org/en/books/porters-handbook/makefiles/#licenses
.. _fsf: https://www.fsf.org/blogs/rms/rms-article-for-claritys-sake-please-dont-say-licensed-under-gnu-gpl-2
.. _gem: https://guides.rubygems.org/specification-reference/#license=
.. _gentoo: https://devmanual.gentoo.org/ebuild-writing/variables/index.html#license
.. _gentoodev: https://devmanual.gentoo.org/general-concepts/licenses/index.html
.. _glep23: https://www.gentoo.org/glep/glep-0023.html
.. _globmodule: https://docs.python.org/3/library/glob.html
.. _gnu: https://www.gnu.org/licenses/identify-licenses-clearly.html
.. _guix: https://git.savannah.gnu.org/cgit/guix.git/tree/guix/licenses.scm?h=v1.3.0
.. _guixlicense: https://guix.gnu.org/manual/en/html_node/package-Reference.html#index-license_002c-of-packages
.. _hatch: https://hatch.pypa.io/latest/
.. _hatchimplementation: https://discuss.python.org/t/12622/22
.. _installedspec: https://packaging.python.org/specifications/recording-installed-packages/
.. _interopissue: https://github.com/pypa/interoperability-peps/issues/46
.. _licenseexplib: https://github.com/nexB/license-expression/
.. _licensefield: https://packaging.python.org/guides/distributing-packages-using-setuptools/#license
.. _linux: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/license-rules.rst
.. _maven: https://maven.apache.org/pom.html#Licenses
.. _mitlicense: https://opensource.org/licenses/MIT
.. _mix: https://hex.pm/docs/publish
.. _nixos: https://github.com/NixOS/nixpkgs/blob/21.05/lib/licenses.nix
.. _npm: https://docs.npmjs.com/cli/v8/configuring-npm/package-json#license
.. _nuget: https://docs.microsoft.com/en-us/nuget/reference/nuspec#licenseurl
.. _numpyissue: https://github.com/numpy/numpy/issues/8689
.. _opensuse: https://en.opensuse.org/openSUSE:Packaging_guidelines#Licensing
.. _opensuselist: https://docs.google.com/spreadsheets/d/14AdaJ6cmU0kvQ4ulq9pWpjdZL5tkR03exRSYJmPGdfs/pub
.. _openwrt: https://openwrt.org/docs/guide-developer/packages#buildpackage_variables
.. _osi: https://opensource.org
.. _packagingguidetxt: https://packaging.python.org/guides/distributing-packages-using-setuptools/#license-txt
.. _packagingissue: https://github.com/pypa/packaging-problems/issues/41
.. _packaginglicense: https://github.com/pypa/packaging/blob/21.2/LICENSE
.. _packagingtutkey: https://packaging.python.org/tutorials/packaging-projects/#configuring-metadata
.. _packagingtuttxt: https://packaging.python.org/tutorials/packaging-projects/#creating-a-license
.. _pbr: https://docs.openstack.org/pbr/latest/user/features.html
.. _pep621spec: https://packaging.python.org/specifications/declaring-project-metadata/
.. _pep621specdynamic: https://packaging.python.org/en/latest/specifications/declaring-project-metadata/#dynamic
.. _pepissue: https://github.com/pombredanne/spdx-pypi-pep/issues/1
.. _perl: https://metacpan.org/pod/CPAN::Meta::Spec#license
.. _pipsetup: https://github.com/pypa/pip/blob/21.3.1/setup.cfg#L114
.. _poetry: https://python-poetry.org/docs/pyproject/#license
.. _pycode: https://github.com/search?l=Python&q=%22__license__%22&type=Code
.. _pypi: https://pypi.org/
.. _pypugdistributionpackage: https://packaging.python.org/en/latest/glossary/#term-Distribution-Package
.. _pypugglossary: https://packaging.python.org/glossary/
.. _pypugproject: https://packaging.python.org/en/latest/glossary/#term-Project
.. _pytorch: https://pypi.org/project/torch/
.. _reuse: https://reuse.software/
.. _reusediscussion: https://github.com/pombredanne/spdx-pypi-pep/issues/7
.. _samplesetupcfg: https://github.com/pypa/sampleproject/blob/3a836905fbd687af334db16b16c37cf51dcbc99c/setup.cfg
.. _samplesetuppy: https://github.com/pypa/sampleproject/blob/3a836905fbd687af334db16b16c37cf51dcbc99c/setup.py#L98
.. _scancodetk: https://github.com/nexB/scancode-toolkit
.. _scipyissue: https://github.com/scipy/scipy/issues/7093
.. _sdistspec: https://packaging.python.org/specifications/source-distribution-format/
.. _setuptools5911: https://github.com/pypa/setuptools/blob/v59.1.1/setup.cfg
.. _setuptoolsfiles: https://github.com/pypa/setuptools/issues/2739
.. _setuptoolspep639: https://github.com/pypa/setuptools/pull/2645
.. _setuptoolssdist: https://github.com/pypa/setuptools/pull/1767
.. _spdx: https://spdx.dev/
.. _spdxid: https://spdx.dev/ids/
.. _spdxlist: https://spdx.org/licenses/
.. _spdxpression: https://spdx.github.io/spdx-spec/SPDX-license-expressions/
.. _spdxpy: https://github.com/spdx/tools-python/
.. _spdxtutorial: https://github.com/david-a-wheeler/spdx-tutorial
.. _spdxversion: https://github.com/pombredanne/spdx-pypi-pep/issues/6
.. _uboot: https://www.denx.de/wiki/U-Boot/Licensing
.. _unlicense: https://unlicense.org/
.. _wheelfiles: https://github.com/pypa/wheel/issues/138
.. _wheelproject: https://wheel.readthedocs.io/en/stable/
.. _wheels: https://github.com/pypa/wheel/blob/0.37.0/docs/user_guide.rst#including-license-files-in-the-generated-wheel-file
.. _wheelspec: https://packaging.python.org/specifications/binary-distribution-format/


Acknowledgments
===============

- Nick Coghlan
- Kevin P. Fleming
- Pradyun Gedam
- Oleg Grenrus
- Dustin Ingram
- Chris Jerdonek
- Cyril Roelandt
- Luis Villa


Copyright
=========

This document is placed in the public domain or under the
`CC0-1.0-Universal license <cc0_>`__, whichever is more permissive.