The fate of Distutils – Pycon Summit + Packaging Sprint detailed report

Wed, 03 Mar 2010

The summit

I quickly posted an entry right after the Language Summit we held before Pycon in Atlanta. Basically, all the work I have been doing in Distutils and the PEPs we’ve prepared for the “big refactoring” will not be done in the standard library. Distutils in the stdlib trunk will be reverted to its current 2.6.x state.

I was in despair right after the summit. All the work we did during the past year would not land in the standard library for 2.7, and all the pre-refactoring work I did, like making the test coverage decent, was going to be useless for the stdlib. Having that work included in 2.7 was one of my goals, and I worked hard on making sure most of the important PEPs would be accepted before the feature freeze for 2.7 happened (the first beta, freezing new features, is in 4 weeks).

I was even more depressed because I had started to pull the “sysconfig” module out of Distutils and to simplify the code in distutils, while making sure that backward compatibility was kept.

I had a twenty-minute meeting with Guido after the Summit to clarify the situation. He helped me understand why this was the right path and worked with me on what to do next, on the stdlib front and outside the stdlib.

Basically, a package that comes in the standard library has a foot in the grave (I am paraphrasing Guido here). Its APIs are frozen, and people don’t really expect anything from it except small new features and bug fixes. Refactorings are dangerous, if not impossible.

I have hit that problem in the past, in one of the 2.6 bug fix releases, where I broke Setuptools compatibility because of an internal change I had made in a private method. The breakage was partly because Setuptools overrides a private method and partly because a public method that was not clearly documented was affected.

A few weeks ago the problem happened again: someone complained on python-dev because a declaration (an exception class) was missing from Distutils. An exception class was imported from the errors module into another Distutils module, but was not used there anymore. And the module it was imported into didn’t have an __all__ attribute. A third-party tool was importing the exception from the wrong module, so when I cleaned it up the third-party tool broke.
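To make the failure mode concrete, here is a minimal sketch in present-day Python. The module and class names (errors, helper, PackagingError) are hypothetical stand-ins, not the real Distutils names. It only shows the mechanism: a leftover import makes a name reachable from a module that never meant to export it, and a cleanup then breaks whoever relied on it.

```python
import types


class PackagingError(Exception):
    """Hypothetical exception, standing in for the real Distutils one."""


# The module that really owns the exception.
errors = types.ModuleType("errors")
errors.PackagingError = PackagingError

# Before the cleanup: "helper" still carries the unused import and defines
# no __all__, so the name leaks out as if it were part of helper's API.
helper_before = types.ModuleType("helper")
helper_before.PackagingError = errors.PackagingError

# After the cleanup: the unused import is gone.
helper_after = types.ModuleType("helper")


def third_party_import(module):
    """Mimic `from helper import PackagingError` in a third-party tool."""
    try:
        return module.PackagingError
    except AttributeError:
        return None  # the real import statement would raise ImportError
```

Importing the class from the errors module keeps working before and after the cleanup; only the import through the wrong module breaks.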

So basically, any change I make in Distutils, even a simple cleanup, and worse, even a private method change, potentially breaks third-party tools.

You could argue that they should be careful in how they use Distutils and never patch it or change its internals, and that edge cases like missing imports should just be fixed.

But hey, Python 2.7 is out of the door in five weeks, and the user experience will be that Python has broken third-party libraries.

And the worst part of it: some of these libraries, like Setuptools, are not really maintained anymore and expect Distutils not to evolve. But Setuptools is used nevertheless, since it solves some problems Distutils doesn’t. So the end user is the one who will suffer from those regressions.

In other words, projects like Setuptools slow down the work we want to do in packaging, because the current ecosystem depends on a big, monolithic, messy pile of code that is spread across different projects with different maintainers.

At this point, I understood that the easiest way for Distutils to evolve was to get away from this pile and grow in another namespace called distutils2.

Welcome Distutils2

If you have followed what is going on with packaging since last year, you might think: “distutils, setuptools, distribute and now distutils2 ?, oh no!!!”

But that is going to be for the benefit of everyone. See the roadmap in the image below.

State of packaging

So basically, I have forked Distutils and renamed its package to Distutils2. The project is located at http://hg.python.org/distutils2 and the goal is to put it back into the standard library as soon as it reaches a state where it starts to be used by the community. Distutils will just die slowly, probably pulling Setuptools and Distribute with it.

The Distribute project is still important because it can help us release bug fixes or Python 3 support today.

Distutils2 will be 2.4 to 3.2 compatible, will get back the good bits from Distribute, and will implement the PEPs that were accepted lately: PEP 345 and PEP 386.

And I am happily removing old code we don’t want/need anymore without worrying about backward compatibility. Yeah !

The packaging sprint

After the conferences, we started a packaging sprint and I was surprised because many people showed up and worked on the topic.

Brainstorming on PEP 376

We created a few teams to work on PEP 376, mkpkg, the Hitchhiker’s Guide to Packaging (HHGP), and Distribute. I won’t name each person, I am too scared of forgetting someone :D .

PEP 376

Like last year, people from various distributions (Fedora, Ubuntu, Debian) and I worked on packaging issues. They mainly worked on PEP 386 (the versioning scheme) last year and focused on PEP 376 this year. This PEP is about setting up a standard for installed packages, and an installation index that makes it possible to query which packages are installed and get their metadata. By extension, it provides an uninstall feature. The goal is to have a standard for all package managers, of course.

One part of the PEP is about describing the data files that are installed with the project (like configuration files or documentation) so they can be removed and maybe relocated. The group focused on describing the files a project contains in a static way (in setup.cfg), with variables that can be expanded at installation time (their values are provided by Python, but are globally configurable by the OS packagers).
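As a rough illustration of the idea, here is a small sketch in present-day Python. The variable names (confdir, docdir), the ${...} syntax and the file names are assumptions made up for this example; the real syntax is whatever the proposal on distutils-SIG ends up defining.

```python
from string import Template

# Hypothetical defaults Python could ship; an OS packager would override them.
DEFAULT_PATHS = {"confdir": "/etc", "docdir": "/usr/share/doc"}

# Hypothetical static description, as it might be listed in setup.cfg.
DATA_FILES = {
    "myapp.conf": "${confdir}/myapp",
    "README.txt": "${docdir}/myapp",
}


def expand(data_files, overrides=None):
    """Expand the variables of each target directory at installation time."""
    paths = dict(DEFAULT_PATHS)
    paths.update(overrides or {})
    return {
        name: Template(target).substitute(paths)
        for name, target in data_files.items()
    }
```

Because the description is static, a tool can compute where every file goes (and later remove or relocate it) without running any project code.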

We did quite some work and brainstorming on this, and even focused on removing setup.py ! A fully static description of a project (metadata+file list) is the key to a better packaging tool !

Expect a proposal soon on distutils-SIG, for PEP 376. If you want to have a look, the draft proposal is here: draft.

mkpkg and Distribute

We had two one-member teams at some point, so I can name them without being scared of forgetting someone ;)

Sean worked on a nice add-on for Distutils2, a script that builds a setup.py file after asking you a few questions. He blogged about it, so I don’t need to get into further details :)

Noufal worked on fixing some bugs in Distribute. We should do a release at some point.

The Hitchhiker’s Guide to Packaging

Another group worked on the guide. The goal is to provide some help for people who want to package things today and are in despair over the sparse documentation they can find. Which tool to use? How? When?

The work done was quite amazing, look at it : http://guide.python-distribute.org

I have spoken with Georg Brandl to see how we could move it to docs.python.org and make it grow there.

Distutils2 coding

Besides PEP 376, I worked on making Distutils2 work for 2.4, 2.5 and 2.6, and this is now done. I have also almost fully implemented PEP 345 in there.

There’s now a metadata module with a dict-like DistributionMetadata class that knows how to read and write PKG-INFO files. It also knows how to interpret the micro-language we’ve defined: the environment markers.
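PKG-INFO files use RFC 822-style headers, so their raw content can be read with the standard library alone. The sketch below is not the DistributionMetadata API; it is a stdlib-only illustration in present-day Python, and the project name and field values are made up.

```python
from email.parser import Parser

# Made-up PKG-INFO content for illustration.
PKG_INFO = """\
Metadata-Version: 1.2
Name: example
Version: 1.0
Requires-Dist: pywin32 (>1.0); sys.platform == 'win32'
Requires-Dist: zope.interface
"""


def read_pkg_info(text):
    """Return the fields as a dict; repeated fields become lists."""
    message = Parser().parsestr(text)
    fields = {}
    for key in set(message.keys()):
        values = message.get_all(key)
        fields[key] = values if len(values) > 1 else values[0]
    return fields
```

A dict-like class on top of this can then validate versions and interpret the environment markers found after the semicolon.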

Last, I’ve added the PEP 386 version module : version.py. This one is used now by the metadata class to control versions.

More to come !

Next sprint at Confoo.ca

The next packaging sprint will happen in Montreal, where I am going as a speaker next week. We will continue the work we started, so stay tuned.




Pycon slides + answers to GM questions

Sat, 20 Feb 2010

I did my presentation yesterday, and it seems like people enjoyed it, from what I’ve heard in the halls and seen online. I am very glad about this feedback because packaging is not the sexiest topic at programming conferences in general.

Anyways, here are my slides: http://ziade.org/slides/pycon-2010-state-of-packaging.pdf

And as promised, I’ve answered all the questions people asked in the Google Moderator: http://www.google.com/moderator/#16/e=4395 . Thanks to everyone who participated.

I am now going to enjoy the rest of the Pycon event and I am looking forward to the sprints.




Python Language Summit — Summary of the packaging track

Thu, 18 Feb 2010

Here is a quick wrap-up of what was said during the packaging session at the language summit that is happening today at Pycon.

The four major points are:

  1. The implementation of the recently accepted PEPs will not happen in Distutils but in a new package in the Distribute project (so logically in a “distribute” package). This resolves backward compatibility issues: new features will be under the “distribute” namespace.
  2. Distribute will stay a third-party package and will be integrated in the standard library once it has enough support and feedback from the community. So this could happen in 3.3 (or 2.8 ;) ). Some parts that are useful for the existing distutils package might be added in the stdlib today. But the idea is to stop adding features in distutils and focus on distribute.
  3. The Hitchhiker’s Guide to Packaging is going to be moved into the Python repository, so it becomes part of docs.python.org. It’s not finished yet but it’ll grow there.
  4. Ian floated the idea of having virtualenv as a core feature, but he’s not sure how yet. Some brainstorming on this should happen during Pycon.



Ask questions about packaging

Thu, 11 Feb 2010

I have started a Google Moderator series to gather questions people would like to ask about Python packaging. I will answer the most popular questions at the end of my talk at Pycon.

Here’s the link : http://www.google.com/moderator/#16/e=4395

Please, add some questions, and vote !




PEP 345 and 386 accepted — Summary of changes

Wed, 10 Feb 2010

Several PEPs were accepted this month by Guido, and among them PEP 345 and PEP 386, which are about Python packaging. I am summarizing in this entry the main changes we’ve made.

PEP 345 – Metadata v1.2

This PEP is about the metadata that gets added to your project when you build a distribution using Distutils or a Distutils-based tool. Those are fields like “name” or “version” that you pass as options in your setup.py file. They eventually land in a PKG-INFO file, then at PyPI and in each Python’s site-packages directory where the project is installed. They are also useful when your project gets repackaged by an OS packager.

New fields

PEP 345 adds some new fields :

  • Maintainer : a string containing the maintainer’s name
  • Maintainer-email : a string containing the maintainer’s e-mail
  • Requires-Python : Python version(s) that the distribution is guaranteed to be compatible with.
  • Requires-External : String describing a dependency in the system that the distribution needs in order to be used.
  • Requires-Dist : String naming some other distutils project required by this distribution.
  • Provides-Dist : String naming a Distutils project which is contained within this distribution.
  • Obsoletes-Dist : String describing a distutils project’s distribution which this distribution renders obsolete
  • Project-URL : String containing a browsable URL for the project and a label for it, separated by a comma.

Maintainer and Maintainer-email were added because people were confused about the maintainer and maintainer_email options they have in Distutils. Before PEP 345, if you used the author and the maintainer fields, one was dropped and one was kept to fill the Author metadata field.

Requires-Python was added so people could list the Python versions their project is compatible with. We do have Trove classifiers for that already, but this new field is more than a simple label: the version string it holds supports a syntax that makes it possible to describe any set of Python versions. See the version specifiers section of the PEP.
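Here is a minimal sketch, in present-day Python, of how a tool could check a Requires-Python value. It is a simplification written for this post, not the PEP’s reference behaviour: it only understands comma-separated clauses with an explicit operator and purely numeric versions, and the function names are hypothetical.

```python
import operator
import re

_OPERATORS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "==": operator.eq, "!=": operator.ne,
}
_CLAUSE = re.compile(r"\s*(<=|>=|==|!=|<|>)\s*([\d.]+)\s*$")


def _as_tuple(version):
    """Turn '2.6' into (2, 6) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def python_matches(spec, version):
    """Return True if `version` satisfies every clause of `spec`."""
    candidate = _as_tuple(version)
    for clause in spec.split(","):
        match = _CLAUSE.match(clause)
        if match is None:
            raise ValueError("unsupported clause: %r" % clause)
        op, wanted = match.groups()
        if not _OPERATORS[op](candidate, _as_tuple(wanted)):
            return False
    return True
```

For instance, python_matches(">=2.5,<2.7", "2.6") is true while the same specifier rejects "2.7". The PEP remains the authority on how bare versions and edge cases are interpreted.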

Remember the requires.txt metadata Setuptools introduced together with the install_requires option? Requires-Dist is comparable to this and will let you define dependencies on other Python projects. Distutils had the Requires field, but it was defining dependencies at the module level and this was never really used by the community. So Requires is gone in 1.2. Last, Requires-Dist uses a version comparison scheme described in detail in the version specifiers section.

Provides-Dist gives you the ability among other things to reorganize your project names or to distribute a subproject in several projects. For instance, the project called ZODB used to include the project transaction that is also distributed as a standalone project now. If ZODB is installed, and if you need transaction, you won’t have to install it again.

Obsoletes-Dist will let you make sure a project that is incompatible with your project is not installed at the same time.

Project-URL is useful to provide a list of URLs for your project like a browsable repository or a tracker. The goal will be to add a small box on PyPI project pages for these URLs, so developers can emphasize them. This will hopefully address the complaints PyPI had in these past months when the comment system was added.

Some small changes were made on existing fields. One important change is on the Description field (long_description in setup.py). Before PEP 345, once this value was written in the metadata file, its empty lines and the spaces at the beginning of its lines were stripped. In other words, if a tool was reading back this value, its reStructuredText syntax was broken. This is no longer the case.

Once I’ve finished implementing PEP 345, you will be able to read back a project’s metadata using the DistributionMetadata class. See an example.

Environment Markers

When Pip wants to install all the dependencies a project requires, it has to follow these steps:

  1. download the project from PyPI
  2. execute a Distutils command on setup.py, like egg_info, to get the metadata. And in particular the list of requirements

This is mandatory because the metadata is not statically defined, and the developer might need to run some code in setup.py to know which dependencies are required, depending on the target platform.

For example:

    import sys

    if sys.platform == 'win32':
        install_requires = ['pywin32']
    else:
        install_requires = []

In other words, there’s no way to get the metadata without running third-party code. Environment markers fix this issue in most cases by providing a micro-language that can be used at the field level. In the end, you can write things like this in the metadata:

    Requires-Dist: pywin32 (>1.0); sys.platform == 'win32'

And Distutils will provide a tool to parse and execute this expression, so you know whether your target platform has to use this metadata field. This means that Pip or other tools will be able to read the metadata of a project without running any third-party code, for example to get all the dependencies of a project for a given target platform just by querying PyPI.
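To give an idea of what such a tool does, here is a deliberately tiny sketch in present-day Python. It is not the Distutils implementation: it only handles a single == or != comparison against a quoted string, and the function names are hypothetical. The full micro-language is described in the PEP.

```python
import os
import platform
import re
import sys


def marker_environment():
    """Values a marker can refer to, computed for the running interpreter."""
    return {
        "sys.platform": sys.platform,
        "os.name": os.name,
        "python_version": "%d.%d" % sys.version_info[:2],
        "platform.machine": platform.machine(),
    }


# name, operator, then a quoted value (same quote on both sides)
_MARKER = re.compile(r"""^\s*([\w.]+)\s*(==|!=)\s*(['"])(.*?)\3\s*$""")


def applies(field_value, environment=None):
    """Tell whether a field like Requires-Dist applies to the environment."""
    if environment is None:
        environment = marker_environment()
    _, _, marker = field_value.partition(";")
    if not marker.strip():
        return True  # no marker: the field always applies
    match = _MARKER.match(marker)
    if match is None:
        raise ValueError("unsupported marker: %r" % marker)
    name, op, _, wanted = match.groups()
    actual = environment[name]
    return actual == wanted if op == "==" else actual != wanted
```

Passing an explicit environment dict is what lets a tool answer the question for a target platform other than the one it is running on.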

See all the details in the section in the PEP.

PEP 386

Throughout PEP 345, we had to compare versions. To be able to do this, we need a common standard for version numbers. That’s what PEP 386 was written for.

The idea is not to force people to version their project using this version scheme, but rather to provide a scheme that is good enough for interoperability and that is human readable.

PEP 386 provides this scheme (in pseudo-format) :

  N.N[.N]+[{a|b|c|rc}N[.N]+][.postN][.devN]

The corresponding regular expression is:

    expr = r"""^
    (?P<version>\d+\.\d+)         # minimum 'N.N'
    (?P<extraversion>(?:\.\d+)*)  # any number of extra '.N' segments
    (?:
        (?P<prerel>[abc]|rc)         # 'a' = alpha, 'b' = beta
                                     # 'c' or 'rc' = release candidate
        (?P<prerelversion>\d+(?:\.\d+)*)
    )?
    (?P<postdev>(\.post(?P<post>\d+))?(\.dev(?P<dev>\d+))?)?
    $"""

The scheme handles these cases:

  • pre-releases
  • development versions
  • final versions
  • post-release versions
  • development versions of post-release versions
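The regular expression above needs to be compiled in verbose mode, since it contains comments and layout whitespace. Here is a short sketch, in present-day Python, that exercises it; the parse helper is a name made up for this example and is not part of verlib.

```python
import re

VERSION_RE = re.compile(r"""^
(?P<version>\d+\.\d+)         # minimum 'N.N'
(?P<extraversion>(?:\.\d+)*)  # any number of extra '.N' segments
(?:
    (?P<prerel>[abc]|rc)         # 'a' = alpha, 'b' = beta
                                 # 'c' or 'rc' = release candidate
    (?P<prerelversion>\d+(?:\.\d+)*)
)?
(?P<postdev>(\.post(?P<post>\d+))?(\.dev(?P<dev>\d+))?)?
$""", re.VERBOSE)


def parse(version):
    """Return the named groups for a matching version, or None."""
    match = VERSION_RE.match(version)
    if match is None:
        return None
    return match.groupdict()
```

parse('1.0a2.dev456') reports a pre-release 'a' numbered '2' with a dev marker of '456', while a string like '0.2-grigoropoulos' does not match at all.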

Here are some examples:

>>> from verlib import NormalizedVersion as V
>>> (V('1.0a1')
...  < V('1.0a2.dev456')
...  < V('1.0a2')
...  < V('1.0a2.1.dev456')
...  < V('1.0a2.1')
...  < V('1.0b1.dev456')
...  < V('1.0b2')
...  < V('1.0b2.post345')
...  < V('1.0c1.dev456')
...  < V('1.0c1')
...  < V('1.0.dev456')
...  < V('1.0')
...  < V('1.0.post456.dev34')
...  < V('1.0.post456'))
True

This can look utterly complex to you, but in fact there is a good chance that your version scheme is already compatible with this one. We’ve tested it on PyPI, and 88% of the projects’ distributions were recognized.

The suggest_rational_version function

Let’s face it: there are hundreds of valid versioning schemes that are not compatible with what we’ve done. So we are adding a function called suggest_rational_version that can be used to transform a version that is not “PEP-386” compliant into one that is compliant.

This does a number of simple normalizations to the given string, based on an observation of versions currently in use on PyPI.

Given a dump of those versions on February 10th 2010, the function has given those results out of the 9066 distributions PyPI had:

  • 8058 (88.88%) already match NormalizedVersion without any change
  • 795 (8.77%) match when using this suggestion method
  • 213 (2.35%) don’t match at all.

And here’s an extract of the 2.35% of unrecognized schemes:

  • 0.2-grigoropoulos
  • 0.1-alphadev
  • working proof of concept
  • bzr14
  • 1 (first draft)

In other words, they are unusable anyways. If you want to try this on your own versions, grab the code at http://bitbucket.org/tarek/distutilsversion/. And if your version doesn’t match at all and you think it’s a mistake, let me know so we can work on your case.

Conclusion + my rant on packaging

I started working seriously on packaging issues a year ago. And I kept on hearing complaints about how packaging sucks hard. That was frustrating since some folks and I were working hard to change things (and we still do). I kept seeing the same people ranting about packaging, and most of the time they were not willing to help. I guess that’s just me being naive, but let me say it one more time :)

If you don’t like how packaging works in Python, stop complaining and come to the Distutils mailing list, or to #distutils on freenode, and help us!

I am now really glad that these PEPs were accepted by Guido because they are the first milestone we had to reach to improve things for real in packaging. Python 2.7 is going to make a big jump forward in this area !

Next milestone we are trying to reach for 2.7 is to finish PEP 376 (uninstall feature, querying installed packages, etc)




improving Python’s getpass module

Sat, 06 Feb 2010

UPDATED: see the end.

The Python standard library has a module called getpass you can use to get a password from the prompt:

>>> import getpass
>>> password = getpass.getpass()
Password:          <-- non-echoed typing here
>>> print password
worked

That’s nice, and Distutils uses it to ask for your password when you register or upload a release at PyPI, if it’s not found in your pypirc file. But it is annoying to type your password again and again, so you end up saving it in clear text in pypirc. That sucks. And the getpass module gets pretty useless if you want to store and retrieve passwords from places other than the user’s brain.

But wait… we have the Keyring project now.. what about making getpass use Keyring so you can safely read a password from your favorite keyring (Keychain, KWallet, etc..) ?

I’ve started to write a new getpass module that could do this. But instead of adding a keyring dependency in it and struggling for months (years) to get the addition of Keyring into the stdlib, I have made getpass pluggable.

In my improved version, you can define in a small configuration file (getpass.cfg) an arbitrary function that will be used by getpass for the getpass.getpass API. Here’s such a file:

  [getpass]
  getpass-backend = keyring:get_pass_get_password

Here I am configuring getpass to use the get_pass_get_password function from the keyring package. That’s a function that gets installed in your Python once Keyring is installed.

This function has the same interface as the default getpass.getpass API and calls keyring.
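The lookup itself can be sketched in a few lines. This is not the code of the modified module; it is an illustration in present-day Python (hence configparser and importlib), and load_backend is a hypothetical name. It reads the module:function value and falls back to the stock getpass.getpass when the option, the module or the function is missing.

```python
import configparser
import getpass
import importlib

CONFIG_TEXT = """\
[getpass]
getpass-backend = keyring:get_pass_get_password
"""


def load_backend(config_text, default=getpass.getpass):
    """Return the callable named by getpass-backend, or the default."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    spec = parser.get("getpass", "getpass-backend", fallback=None)
    if spec is None:
        return default
    module_name, _, function_name = spec.partition(":")
    try:
        module = importlib.import_module(module_name)
        return getattr(module, function_name)
    except (ImportError, AttributeError):
        # Backend not installed, or it does not provide that function.
        return default
```

Calling load_backend(CONFIG_TEXT)() would then prompt through the keyring-provided function when it is available, and through the regular prompt otherwise.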

The modified getpass module is here: http://bitbucket.org/tarek/getpass/

And works against the current trunk of Keyring.

What I would like to do now is to propose the small changes I’ve made in Python’s getpass for inclusion in the stdlib. They are backward compatible changes and offer a simple, yet powerful way to extend getpass without adding any other module in the stdlib. And maybe adding a setpass in there too would make sense.

Update from python-ideas

So I brought up the idea in the mailing lists and it turns out (thanks to the folks at Python-ideas) that the way I want to introduce this feature is not good for these reasons :

  1. getpass is just a function that is used to get a password from the prompt. You can consider it as a potential, dummy backend for Keyring, for example. Trying to make it extensible just distorts its original purpose.
  2. the only use case right now in the stdlib is for Distutils, so it doesn’t really make sense to have a keyring in there. People can just use the Keyring project directly.
  3. Now if other parts of the stdlib have the same need, it will be time to think about how it could be included in the stdlib level rather than in Distutils.

So, I’ll work for its inclusion at the Distutils level rather than at the getpass level.




Simple command-line vault : CLVault

Mon, 01 Feb 2010

I am pretty happy with the Keyring project. I use it now everywhere in my Mercurial-based projects, thanks to mercurial_keyring.

There’s one other place I’ve started to use it: I needed a simple command-line based tool to save passwords and read them. The tool I’ve used so far was KeePass, but I need to run it then click on its UI. This is time consuming when I simply want to push a password in the clipboard to use it to unlock something.

So I wrote these two very simple scripts that use Keyring to store and retrieve passwords:

$ clvault-set blog
Set your password:
Password set.

$ clvault-get blog
The password has been copied in your clipboard

The code that copies the passwords to the clipboard was tested under Mac OS with its Keychain, but should work under Windows and Linux as well.

I think these scripts can be useful for people like me who spend most of their time in a bash prompt when they are not in Vim or Emacs. So I created a project called CLVault. You can grab it at PyPI: CLVault PyPI page

or install it like this with Pip:

$ pip install clvault

Let me know if it’s useful to you !




Pycon packaging sprint topics

Mon, 11 Jan 2010

Pycon is coming soon. Here’s a list of possible topics I would like to work on during the sprint:

  1. adding the features in Distutils I’ve mentioned in my earlier post.
  2. work on the standalone release of Distutils, and make sure it works with 2.4, 2.5, etc so it can be distributed at PyPI. There are already installable nightly builds by the way.
  3. Finish the buildbot work so Distutils is tested with more projects from PyPI
  4. Continue the work on Distribute, and specifically the work on 0.7:
    1. finish the develop command that would work with non-eggs formats
    2. finish the configure/build/install command where all options are computed and saved by the configure command
  5. Fix plenty of issues in the tracker
  6. work on a geolocation feature for Pip, so the nearest mirror can be picked

Anyone interested in packaging sprinting at Pycon ? Let me know !




Fixing packaging terminology confusion

Thu, 07 Jan 2010

Edit: the discussion is still going on, so I’ve probably blogged that a little bit early (I was excited about it ;) ). Stay tuned for the final output.

Brad Allen launched a thread in Distutils-SIG about packaging terminology confusion, in particular the usage of the word “package” in our community. Part of the confusion comes from the meaning of this word in Python (that is, a directory containing one or several Python modules, with a special one named __init__.py) and in some systems like Debian (there, a package is a distribution file for a library or an application).

This confusion was present in PEP 345 (which was started years ago, so that explains it) – and is present in Distutils documentation and also in PyPI (That is: Python Package Index).

I really like Tres Seaver’s definitions, because they match reality perfectly:

  • package means a Python package (a directory intended to be on sys.path, with an __init__.py). We *never* mean a distributable or installable archive, except when “impedance matching” with folks who think in terms of operating system distributions.
  • distribution is such a distributable / installable archive: either in source form (an ‘sdist’), or one of the binary forms (egg, etc.). Any distribution may contain multiple packages (or even no packages, in the case of standalone scripts).
  • project is the process / community which produces releases of a given set of software, identified by a name unique within PyPI’s namespace.  PyPI manages metadata about projects (names, owners) and their releases.  Every real project has at least one release.
  • release is a set of one or more distributions of a project, each sharing the same version.  Some PyPI metadata is specific to a release, rather than a project.  Every release has at least one distribution.
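One way to keep these four terms apart is to write them down as a small data model. The sketch below, in present-day Python, is only an illustration of the definitions above; the class layout, the is_consistent helper and the example values are made up for this post and do not describe how PyPI stores its data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Distribution:
    """A distributable / installable archive (sdist, egg, ...)."""
    filename: str
    packages: List[str] = field(default_factory=list)  # may be empty


@dataclass
class Release:
    """One or more distributions sharing the same version."""
    version: str
    distributions: List[Distribution]


@dataclass
class Project:
    """What PyPI indexes: a unique name and its releases."""
    name: str
    releases: List[Release]


def is_consistent(project):
    """Check the 'at least one' rules from the definitions."""
    if not project.releases:
        return False
    return all(release.distributions for release in project.releases)


example = Project(
    "example",
    [Release("1.0", [Distribution("example-1.0.tar.gz", ["example"])])],
)
```

Note that a Python package only shows up at the bottom of the model, as something a distribution may contain.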

And I really like Martin’s proposal in the thread (in Catalog-SIG since it was cross-posted): “PyPI would then be the Python Project Index.”

I’ll fix Distutils documentation on my side accordingly, as well as the guide we are building. Let’s promote these definitions :)




Possible new features for Distutils 2.7

Thu, 07 Jan 2010

While PEP 345 and PEP 386 are waiting for the final approval, I am back at work on the Distutils code, PEP 376, Distribute, and the Hitchhiker’s Guide to Packaging. The latter is growing faster than I expected, thanks to the contributions of John Gabriele. It has quite some content already. I think the guide is an important task, and I’ll try to focus on it in this first trimester.

Distutils 2.7 new features

Python 2.7 first beta version is around the corner, and once it’s reached we can’t add new features. So, besides the code that will be changed if the PEPs we worked on at Distutils-SIG are accepted, here’s a list of small features I’d like to introduce in Distutils:

  • a test command, that just uses the new unittest discovery script to run unittest-compatible tests.
  • a new option for sdist called ‘extra_files’, that will make it possible to list extra files to be included in the distribution. These files will not be installed by ‘install’, they will just be part of the distribution. This will allow including files like CHANGELOG, etc. without having to use a MANIFEST template.
  • a very basic pre/post install hook for the install command. These hooks will be deactivated when any bdist_* command runs install to create the binary tree. Now for bdist_rpm’s own hooks, I guess the best way would be to make install consume the same two options as bdist_rpm (pre-install, post-install) so a project will be able to define a hook that is used by RPM and/or python setup.py install
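For the test command, the heavy lifting is already done by unittest’s discovery support. Here is a minimal sketch, in present-day Python, of what such a command could call under the hood. It is not a Distutils command class; run_discovered_tests and demo are hypothetical names, and the sample test file is made up.

```python
import os
import tempfile
import unittest

SAMPLE_TEST = """\
import unittest


class SampleTest(unittest.TestCase):
    def test_addition(self):
        self.assertEqual(1 + 1, 2)
"""


def run_discovered_tests(start_dir, pattern="test*.py"):
    """Discover unittest-compatible tests under start_dir and run them."""
    suite = unittest.defaultTestLoader.discover(start_dir, pattern=pattern)
    return unittest.TextTestRunner(verbosity=0).run(suite)


def demo():
    """Run discovery against a throwaway project holding one test file."""
    with tempfile.TemporaryDirectory() as project_dir:
        path = os.path.join(project_dir, "test_sketch_sample.py")
        with open(path, "w") as handle:
            handle.write(SAMPLE_TEST)
        result = run_discovered_tests(project_dir)
    return result.testsRun, result.wasSuccessful()
```

A real command would mostly add option handling (start directory, pattern) on top of this call.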

If you think about something that should be added in 2.7, speak up !


