PEP: 507
Title: Migrate CPython to Git and GitLab
Version: $Revision$
Last-Modified: $Date$
Author: Barry Warsaw <barry@python.org>
Status: Rejected
Type: Process
Content-Type: text/x-rst
Created: 30-Sep-2015
Post-History:
Resolution: https://mail.python.org/pipermail/core-workflow/2016-January/000345.html


Abstract
========

This PEP proposes migrating the repository hosting of CPython and the
supporting repositories to Git.  Further, it proposes adopting a
hosted GitLab instance as the primary way of handling merge requests,
code reviews, and code hosting.  It is similar in intent to :pep:`481`
but proposes an open source alternative to GitHub and omits the
proposal to run Phabricator.  As with :pep:`481`, this particular PEP is
offered as an alternative to :pep:`474` and :pep:`462`.


Rationale
=========

CPython is an open source project which relies on a number of
volunteers donating their time.  As with any healthy, vibrant open
source project, it relies on attracting new volunteers as well as
retaining existing developers.  Given that volunteer time is the most
scarce resource, providing a process that maximizes the efficiency of
contributors and reduces the friction for contributions, is of vital
importance for the long-term health of the project.

The current tool chain of the CPython project is a custom and unique
combination of tools.  This has two critical implications:

* The unique nature of the tool chain means that contributors must
  remember or relearn, the process, workflow, and tools whenever they
  contribute to CPython, without the advantage of leveraging long-term
  memory and familiarity they retain by working with other projects in
  the FLOSS ecosystem.  The knowledge they gain in working with
  CPython is unlikely to be applicable to other projects.

* The burden on the Python/PSF infrastructure team is much greater in
  order to continue to maintain custom tools, improve them over time,
  fix bugs, address security issues, and more generally adapt to new
  standards in online software development with global collaboration.

These limitations act as a barrier to contribution both for highly
engaged contributors (e.g. core Python developers) and especially for
more casual "drive-by" contributors, who care more about getting their
bug fix than learning a new suite of tools and workflows.

By proposing the adoption of both a different version control system
and a modern, well-maintained hosting solution, this PEP addresses
these limitations.  It aims to enable a modern, well-understood
process that will carry CPython development for many years.


Version Control System
----------------------

Currently the CPython and supporting repositories use Mercurial.  As a
modern distributed version control system, it has served us well since
the migration from Subversion.  However, when evaluating the VCS we
must consider the capabilities of the VCS itself as well as the
network effect and mindshare of the community around that VCS.

There are really only two real options for this, Mercurial and Git.
The technical capabilities of the two systems are largely equivalent,
therefore this PEP instead focuses on their social aspects.

It is not possible to get exact numbers for the number of projects or
people which are using a particular VCS, however we can infer this by
looking at several sources of information for what VCS projects are
using.

The Open Hub (previously Ohloh) statistics [#openhub-stats]_ show that
37% of the repositories indexed by The Open Hub are using Git (second
only to Subversion which has 48%) while Mercurial has just 2%, beating
only Bazaar which has 1%.  This has Git being just over 18 times as
popular as Mercurial on The Open Hub.

Another source of information on VCS popularity is PyPI itself. This
source is more targeted at the Python community itself since it
represents projects developed for Python.  Unfortunately PyPI does not
have a standard location for representing this information, so this
requires manual processing.  If we limit our search to the top 100
projects on PyPI (ordered by download counts) we can see that 62% of
them use Git, while 22% of them use Mercurial, and 13% use something
else.  This has Git being just under 3 times as popular as Mercurial
for the top 100 projects on PyPI.

These numbers back up the anecdotal evidence for Git as the far more
popular DVCS for open source projects.  Choosing the more popular VCS
has a number of positive benefits.

For new contributors it increases the likelihood that they will have already
learned the basics of Git as part of working with another project or if they
are just now learning Git, that they'll be able to take that knowledge and
apply it to other projects.  Additionally a larger community means more people
writing how to guides, answering questions, and writing articles about Git
which makes it easier for a new user to find answers and information about the
tool they are trying to learn and use.  Given its popularity, there may also
be more auxiliary tooling written *around* Git.  This increases options for
everything from GUI clients, helper scripts, repository hosting, etc.

Further, the adoption of Git as the proposed back-end repository
format doesn't prohibit the use of Mercurial by fans of that VCS!
Mercurial users have the [#hg-git]_ plugin which allows them to push
and pull from a Git server using the Mercurial front-end.  It's a
well-maintained and highly functional plugin that seems to be
well-liked by Mercurial users.


Repository Hosting
------------------

Where and how the official repositories for CPython are hosted is in
someways determined by the choice of VCS.  With Git there are several
options.  In fact, once the repository is hosted in Git, branches can
be mirrored in many locations, within many free, open, and proprietary
code hosting sites.

It's still important for CPython to adopt a single, official
repository, with a web front-end that allows for many convenient and
common interactions entirely through the web, without always requiring
local VCS manipulations.  These interactions include as a minimum,
code review with inline comments, branch diffing, CI integration, and
auto-merging.

This PEP proposes to adopt a [#GitLab]_ instance, run within the
python.org domain, accessible to and with ultimate control from the
PSF and the Python infrastructure team, but donated, hosted, and
primarily maintained by GitLab, Inc.

Why GitLab?  Because it is a fully functional Git hosting system, that
sports modern web interactions, software workflows, and CI
integration.  GitLab's Community Edition (CE) is open source software,
and thus is closely aligned with the principles of the CPython
community.


Code Review
-----------

Currently CPython uses a custom fork of Rietveld modified to not run
on Google App Engine and which is currently only really maintained by
one person.  It is missing common features present in many modern code
review tools.

This PEP proposes to utilize GitLab's built-in merge requests and
online code review features to facilitate reviews of all proposed
changes.


GitLab merge requests
---------------------

The normal workflow for a GitLab hosted project is to submit a *merge request*
asking that a feature or bug fix branch be merged into a target branch,
usually one or more of the stable maintenance branches or the next-version
master branch for new features.  GitLab's merge requests are similar in form
and function to GitHub's pull requests, so anybody who is already familiar
with the latter should be able to immediately utilize the former.

Once submitted, a conversation about the change can be had between the
submitter and reviewer.  This includes both general comments, and inline
comments attached to a particular line of the diff between the source and
target branches.  Projects can also be configured to automatically run
continuous integration on the submitted branch, the results of which are
readily visible from the merge request page.  Thus both the reviewer and
submitter can immediately see the results of the tests, making it much easier
to only land branches with passing tests.  Each new push to the source branch
(e.g. to respond to a commenter's feedback or to fix a failing test) results
in a new run of the CI, so that the state of the request always reflects the
latest commit.

Merge requests have a fairly major advantage over the older "submit a patch to
a bug tracker" model.  They allow developers to work completely within the VCS
using standard VCS tooling, without requiring the creation of a patch file or
figuring out the right location to upload the patch to.  This lowers the
barrier for sending a change to be reviewed.

Merge requests are far easier to review.  For example, they provide nice
syntax highlighted diffs which can operate in either unified or side by side
views.  They allow commenting inline and on the merge request as a whole and
they present that in a nice unified way which will also hide comments which no
longer apply.  Comments can be hidden and revealed.

Actually merging a merge request is quite simple, if the source branch applies
cleanly to the target branch.  A core reviewer simply needs to press the
"Merge" button for GitLab to automatically perform the merge.  The source
branch can be optionally rebased, and once the merge is completed, the source
branch can be automatically deleted.

GitLab also has a good workflow for submitting pull requests to a project
completely through their web interface.  This would enable the Python
documentation to have "Edit on GitLab" buttons on every page and people who
discover things like typos, inaccuracies, or just want to make improvements to
the docs they are currently reading.  They can simply hit that button and get
an in browser editor that will let them make changes and submit a merge
request all from the comfort of their browser.


Criticism
=========

X is not written in Python
--------------------------

One feature that the current tooling (Mercurial, Rietveld) has is that the
primary language for all of the pieces are written in Python.  This PEP
focuses more on the *best* tools for the job and not necessarily on the *best*
tools that happen to be written in Python.  Volunteer time is the most
precious resource for any open source project and we can best respect and
utilize that time by focusing on the benefits and downsides of the tools
themselves rather than what language their authors happened to write them in.

One concern is the ability to modify tools to work for us, however one of the
Goals here is to *not* modify software to work for us and instead adapt
ourselves to a more standardized workflow.  This standardization pays off in
the ability to re-use tools out of the box freeing up developer time to
actually work on Python itself as well as enabling knowledge sharing between
projects.

However, if we do need to modify the tooling, Git itself is largely written in
C the same as CPython itself.  It can also have commands written for it using
any language, including Python.  GitLab itself is largely written in Ruby and
since it is Open Source software, we would have the ability to submit merge
requests to the upstream Community Edition, albeit in language potentially
unfamiliar to most Python programmers.


Mercurial is better than Git
----------------------------

Whether Mercurial or Git is better on a technical level is a highly subjective
opinion.  This PEP does not state whether the mechanics of Git or Mercurial
are better, and instead focuses on the network effect that is available for
either option.  While this PEP proposes switching to Git, Mercurial users are
not left completely out of the loop.  By using the hg-git extension for
Mercurial, working with server-side Git repositories is fairly easy and
straightforward.


CPython Workflow is too Complicated
-----------------------------------

One sentiment that came out of previous discussions was that the multi-branch
model of CPython was too complicated for GitLab style merge requests.  This
PEP disagrees with that sentiment.

Currently any particular change requires manually creating a patch for 2.7 and
3.x which won't change at all in this regards.

If someone submits a fix for the current stable branch (e.g. 3.5) the merge
request workflow can be used to create a request to merge the current stable
branch into the master branch, assuming there is no merge conflicts.  As
always, merge conflicts must be manually and locally resolved.  Because
developers also have the *option* of performing the merge locally, this
provides an improvement over the current situation where the merge *must*
always happen locally.

For fixes in the current development branch that must also be applied to
stable release branches, it is possible in many situations to locally cherry
pick and apply the change to other branches, with merge requests submitted for
each stable branch.  It is also possible just cherry pick and complete the
merge locally.  These are all accomplished with standard Git commands and
techniques, with the advantage that all such changes can go through the review
and CI test workflows, even for merges to stable branches.  Minor changes may
be easily accomplished in the GitLab web editor.

No system can hide all the complexities involved in maintaining several long
lived branches.  The only thing that the tooling can do is make it as easy as
possible to submit and commit changes.


Open issues
===========

* What level of hosted support will GitLab offer?  The PEP author has been in
  contact with the GitLab CEO, with positive interest on their part.  The
  details of the hosting offer would have to be discussed.

* What happens to Roundup and do we switch to the GitLab issue tracker?
  Currently, this PEP is *not* suggesting we move from Roundup to GitLab
  issues.  We have way too much invested in Roundup right now and migrating
  the data would be a huge effort.  GitLab does support webhooks, so we will
  probably want to use webhooks to integrate merges and other events with
  updates to Roundup (e.g. to include pointers to commits, close issues,
  etc. similar to what is currently done).

* What happens to wiki.python.org?  Nothing!  While GitLab does support wikis
  in repositories, there's no reason for us to migration our Moin wikis.

* What happens to the existing GitHub mirrors?  We'd probably want to
  regenerate them once the official upstream branches are natively hosted in
  Git.  This may change commit ids, but after that, it should be easy to
  mirror the official Git branches and repositories far and wide.

* Where would the GitLab instance live?  Physically, in whatever hosting
  provider GitLab chooses.  We would point gitlab.python.org (or
  git.python.org?) to this host.


References
==========

.. [#openhub-stats] `Open Hub Statistics <https://www.openhub.net/repositories/compare>`
.. [#hg-git] `Hg-Git mercurial plugin <https://hg-git.github.io/>`
.. [#GitLab] `https://about.gitlab.com/`


Copyright
=========

This document has been placed in the public domain.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: