By Ingeniweb. A Django site.
February 23, 2009
» Raising Distutils test coverage : half-way


After the next commit I will make in Distutils (which adds tests for bdist_rpm), the test coverage of this Python standard library package will be at 41%. This means that I have doubled the test coverage over the past few months, from 18% to 41%.

My goal is to double it again, and reach 80% in the next 6 months.

This also means I am just half an idiot now ! (since people who don’t have 100% code coverage are idiots ;) ).

So does it make Distutils more robust ?

It would probably have made the latest Python 3 release look better for this package, since we had an uncovered cmp() call left in Distutils by the time the release was made. In the meantime, as I said before, the “real” Distutils regression test suite is held by all the packages out there in the community, which are built and installed every day.

Python trunk Distutils test coverage : 41%

Name               Stmts   Exec  Cover
--------------------------------------
__init__               3      0     0%
archive_util          77     61    79%
bcppcompiler         185      0     0%
ccompiler            453    211    46%
cmd                  180    134    74%
config                73     59    80%
core                  93     50    53%
cygwinccompiler      161      0     0%
debug                  3      3   100%
dep_util              43     11    25%
dir_util             109     76    69%
dist                 581    386    66%
emxccompiler         118      0     0%
errors                49      0     0%
extension             97     28    28%
fancy_getopt         233    126    54%
file_util            124     77    62%
filelist             161    102    63%
log                   46     21    45%
msvc9compiler        408      0     0%
msvccompiler         370      0     0%
spawn                 93     28    30%
sysconfig            323     51    15%
text_file            112     61    54%
unixccompiler        160     64    40%
util                 255    157    61%
version               68     62    91%
versionpredicate      61     51    83%
__init__               3      3   100%
bdist                 61     35    57%
bdist_dumb            57     47    82%
bdist_msi            322      0     0%
bdist_rpm            252    198    78%
bdist_wininst        170      0     0%
build                 60     54    90%
build_clib            90      0     0%
build_ext            334    160    47%
build_py             213    178    83%
build_scripts         78     65    83%
clean                 35      0     0%
config               185      0     0%
install              251    156    62%
install_data          44      0     0%
install_egg_info      40     32    80%
install_headers       25      0     0%
install_lib           97     50    51%
install_scripts       33     29    87%
register             173     82    47%
sdist                228    180    78%
upload               112     38    33%
--------------------------------------
TOTAL               7502   3126    41%


Python 2.5.4 Distutils test coverage : 18%

Name               Stmts   Exec  Cover
--------------------------------------
__init__               3      0     0%
archive_util          78     11    14%
bcppcompiler         185      0     0%
ccompiler            453      0     0%
cmd                  180     79    43%
core                  93     15    16%
cygwinccompiler      160      0     0%
debug                  3      3   100%
dep_util              43      4     9%
dir_util             106     50    47%
dist                 578    342    59%
emxccompiler         118      0     0%
errors                49      0     0%
extension             97      9     9%
fancy_getopt         233    121    51%
file_util            121     50    41%
filelist             162      0     0%
log                   46     15    32%
msvccompiler         365      0     0%
mwerkscompiler       140      0     0%
spawn                 93      0     0%
sysconfig            296     10     3%
text_file            146      0     0%
unixccompiler        159      0     0%
util                 235     69    29%
version               68     48    70%
versionpredicate      61     51    83%
__init__               3      3   100%
bdist                 59      0     0%
bdist_dumb            57      0     0%
bdist_msi            320      0     0%
bdist_rpm            248      0     0%
bdist_wininst        159      0     0%
build                 52     47    90%
build_clib            90      0     0%
build_ext            304      0     0%
build_py             213    143    67%
build_scripts         78     64    82%
clean                 35      0     0%
config               185      0     0%
install              220    120    54%
install_data          44      0     0%
install_egg_info      40      0     0%
install_headers       26      0     0%
install_lib           96      0     0%
install_scripts       33     29    87%
register             171      0     0%
sdist                204      0     0%
upload               118      0     0%
--------------------------------------
TOTAL               7026   1283    18%

February 15, 2009
» What’s new in Distutils ?


Since Python 3.0.1 was released this week, here’s a quick wrap-up of what is going on in Distutils.

Code work (over the past month)

New features

  • Issue 2563 : the manifest is now embedded in Windows extensions
  • Issue 4394 : storing the password in the .pypirc file is now optional

Fixed bugs

  • Issue 4524 : distutils was failing to build scripts with ‘--with-suffix=3’
  • Issue 5132 : the build_ext command was failing under Solaris with ‘--enable-shared’
  • Issue 5075 : bdist_wininst was depending on the VC runtime

Refactoring

  • Issue 2461 : added test coverage for util.py
  • Issue 3986 : removed string and type usage from distutils.cmd
  • Issue 3987 : removed type usage from distutils.core

Documentation

  • Issue 5158 : added documentation for depends option for extensions
  • Issue 4987: updated README info
  • Issue 4137 : SIG web pages were updated

Design work

The main topics that are being discussed are:

  • Improving the console script story : thread starts here
  • Publishing a survey on Distutils before the Language Summit : thread starts here
  • Adding an uninstall command in the stdlib to uninstall packages : thread starts here
  • Adding a get_metadata API in pkgutil to get the metadata of an arbitrary package. The same code can also be used for various lookups (uninstall, console script) : thread starts here
  • Having a new PEP to make egg.info a directory and clearly define PKG-INFO and how it maps to the metadata : thread starts here

February 8, 2009
» A Distutils Regression Test System ?


I am making some progress in Distutils. I closed something like 10 bugs last week, and I am reaching issues that were added 8 months ago. Not that everything is entirely cleaned up in the newest issues, but they’re almost all being processed. Every commit comes with at least one test, to get the code base back into a state where it is easier to make things evolve without the risk of breaking them.

It comes through tiny little changes, with tests and an eye on the coverage.

Now I am facing an unpleasant situation : since the test coverage is still low, I am always scared of breaking something in Distutils when I am fixing a bug or making a change. Buildbots are watching, and I run some of my own packaging work with the current trunk.

But still, this is an unpleasant situation, and I don’t want to cause the package to be broken in the next Python version…

But the regression tests exist ! They are there, hidden, in the community. They are everyone’s packages.

  1. Joe adds an issue in the Python bug tracker, because Distutils didn’t work as expected on his package because of a bug
  2. At some point the bug is (was) fixed.
  3. The test to make sure the bug is fixed is “Joe is running Distutils over his package again, and makes sure it is properly installed, compiled, etc”.
  4. The bug is closed.

So how can I get back this test to make sure Joe’s package is still working properly, so he doesn’t hate us at the next major Python release ?

A Distutils Regression Test Server

If Joe’s package is on PyPI, we can set something up. A dedicated server that watches the PyPI changelog and triggers a buildbot when:

  • a new release of Joe’s Package comes out
  • we change something in Distutils code
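A rough sketch of that trigger decision, assuming the changelog is available as a list of (package, version, action) tuples. The function name, the tuple layout and the `WATCHED` set are hypothetical and only illustrate the idea; nothing here talks to PyPI or to a buildbot.

```python
# Hypothetical sketch: decide which watched packages need a regression run.
WATCHED = {"joes-package"}  # packages tied to a bug we once fixed (assumed name)


def packages_to_test(changelog, distutils_changed, watched=WATCHED):
    """Return the sorted list of watched packages whose build should be triggered.

    changelog: iterable of (package_name, version, action) tuples.
    distutils_changed: True when the Distutils code itself was modified.
    """
    if distutils_changed:
        # A change in Distutils means every watched package is re-tested.
        return sorted(watched)
    released = {
        name for name, _version, action in changelog
        if action == "new release" and name in watched
    }
    return sorted(released)
```

Keeping the decision in a pure function like this would make it easy to test without any server running.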

The precise test to be run is still unclear to me, but I am thinking about some generic strategies and I think it’s possible. Let’s call this test a distutils regression test. (If you have a better name, I’ll buy it)

Of course it doesn’t have to be run on all the packages that are uploaded out there at PyPI. Just Joe’s, because he came up with a problem we fixed. And we would be ashamed if the bug came back on Joe’s package.

This requires of course a server, and probably a vmware-like system if Joe runs Windows or Solaris, to make buildbot slaves etc. It also requires that Joe uses the right metadata in his package so we know if it works under Python 2, Python 3, etc. MvL added enough classifiers lately for this.

A Distributed Distutils Regression Test

But some packages are not on PyPI, for privacy or for the convenience of the person in charge of the packaging process. So, what if the distutils regression test is provided as a Distutils command ? It can run the same test the server runs, and come up with a report that can be sent by mail to a special mailing list or so.

This assumes that the developer is cooperative. So maybe it could even be triggered automatically whenever a Distutils command fails, and ask the user whether he would like to send a report ?

The good thing here is that it doesn’t require CPU power on the test server, and that anyone can run that test.

So what ?

Well, I am just throwing an idea here, because I am really concerned about the potential regression problems. Even if Distutils is 100% covered with tests, it’s not possible to test all combinations. The real world environment is the only test that can be trusted in the end in the packaging area.

I’ll throw this idea at the Language Summit in March, and if it catches people’s interest, maybe a Google Summer of Code task could be done for that topic ? I can’t implement it myself, I am overwhelmed already in Distutils maintenance :D

Just out of curiosity, how do *you* test your packages to make sure they get installed correctly ?


January 28, 2009
» Building a survey for Distutils (Pycon’s Python Language Summit)


I have the opportunity to lead a session at the Python Language Summit at Pycon.

To prepare for this event, I am currently building a survey because we need to know how people use Distutils, what is wrong with it, and so on…

The draft is here : http://wiki.python.org/moin/Packaging%20Survey

When it’s ready (hopefully soon), I’ll post it online in a SurveyMonkey-like system for people to take it, and synthesize the results, so that what I say at the summit hopefully reflects reality.

If you are a Python developer or an OS packager, come to the Distutils-SIG mailing list and help us build the best survey possible !

(while the draft is in a wiki, it’s better to discuss changes on the mailing list before they are applied)

January 23, 2009
» Singletons (and Borg) are unpythonic (well.. imvho)


Alex Martelli wrote a review about my latest book (can’t find a permanent link to it, just look at Amazon.com and you’ll find it).

Amongst the negative parts there’s one noticeable part I’d like to discuss in my blog, because I disagree with Alex’s analysis.

Alex says:

This also holds for the chapter on design patterns, with such egregious claims as “Singletons should not have several levels of inheritance” — they should have as few as practical and feasible, *exactly like any other class*; the desire to limit the number of distinct instances (which is mostly about STATE) is quite orthogonal to the issues with subclassing (which is mostly about BEHAVIOR). From this original “totally missing the point” follows a classic howler (which I’ve seen repeated in a review above): “why not use a module?”. I have news for you, Tarek: a module supports *ZERO* inheritance — which is quite a bit stricter than even the unjustified “should not have several level” claim above. Having to completely give up the usefulness of inheritance just because you want to limit instantiation would be a very limiting engineering tradeoff! If there’s no need for inheritance then *of course* you want to use a module - DOH! - but if there IS (or if special methods can really help you) then it’s not an option.

I think that the Singleton (and Borg) pattern is totally useless in fact. That’s not the philosophy of Python in my humble opinion. And I think my book is right to advise people not to use this pattern.

I don’t see the point of bending a class so it only has one instance, when you can simply create an instance of that class in a module, add a “_” prefix to that class, and tell the world that this instance is your singleton. I don’t see why a class should deal with that kind of STATE.

Frankly, I doubt that this singleton/borg pattern is really used in the community.

The only place where I really had to use singleton classes was in Zope. But that was more like a marker than anything else, and the class was registered under a “single” name in a global mapping (eg. its id in its container, in the ZODB tree). And in that case, we were creating one mixin class that used a singleton class, and we called it a “tool”, with all the desired BEHAVIOR inside of it.

And well, if we had had several instances of it, nothing bad could really have happened, because the real uniqueness was provided by the id of the object. And nowadays those “tools” are going away and are now called “utilities”, and I don’t think any singleton class is still really present or used, simply because a class is not the right place to enforce this.

I’d also say that there’s an architectural problem when you enforce things like this in Python. If a programmer tells me that he wants to use a Singleton on his class because it holds a DB connector he wants to be instantiated once in his application, I ask him right away to review the way the program is structured.

So what is the closest element in Python that will let you mark an object as unique ? What is the most convenient way to mark an object with an id in a container ?

A simple variable in a module. (or a simple declaration in a zcml file if you are a zopish guy)
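A minimal sketch of that approach, with made-up names (`_Connector`, `connector` and the module name `mydb` are only there for the example). The class stays an ordinary class and the module holds the single shared instance:

```python
# mydb.py -- hypothetical module; the names are illustrative only.

class _Connector:
    """Plain class; nothing in it tries to limit the number of instances."""

    def __init__(self, dsn):
        self.dsn = dsn
        self.queries = []

    def execute(self, query):
        # Record the query instead of talking to a real database.
        self.queries.append(query)
        return len(self.queries)


# The one shared instance: callers would do "from mydb import connector".
connector = _Connector("sqlite:///app.db")
```

Since Python imports a module only once per process, every importer gets the same `connector` object, and `_Connector` can still be subclassed or instantiated separately in tests.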

I love Python for this because it’s multiparadigm unlike Java : you don’t have to set up over-engineered OOP stuff for this kind of need (and it is surely not a tradeoff to use well engineered OOP besides).

Last, when I claim, in Frenglish, that “Singletons should not have several levels of inheritance”, it is just to warn people that, since these patterns are trying to break the way classes work, you might get screwed at some point when singletons are subclassed. A descriptor or a metaclass or whatever can just break your singleton stuff because Python was not meant to be used like that. It’s not robust.

January 22, 2009
» Blog title changed - Fetchez le Python mes amis


I have too many search hits about Carpet Pythons on my blog, and too many book reviews on how bad my English is.

Welcome to “Fetchez le Python”, the technical blog on the Python programming language, in a pure Frenglish style !

January 20, 2009
» Python standard lib : give me more withs !


I used to write to files like this :

open('somefile', 'w').write(content)

It’s ugly for sure, and a more proper way is :

f = open('somefile', 'w')
try:
    f.write(content)
finally:
    f.close()

But since Python 2.6, the with statement is superior for this code pattern:

with open('somefile', 'w') as f:
    f.write(content)

This is so natural in fact that I am always thinking about with when I work with classes that have a start/stop or open/close behavior.

So, what about adding this behavior into imaplib.IMAP4, ftplib.FTP and smtplib.SMTP ?

So we can write things like this :

    >>> from ftplib import FTP
    >>> with FTP('ftp.somewhere.com') as ftp:
    ...     ftp.login('someone', 'pass')
    ...     (some code)
    ...
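Until the classes themselves grow that behavior, a small wrapper can give a similar result. The following is only a sketch: `quitting` is a hypothetical helper (modelled on the idea of `contextlib.closing`) and `FakeSession` is a stand-in object so the example runs without any network access.

```python
from contextlib import contextmanager


@contextmanager
def quitting(session):
    """Yield `session`, then call its quit() method, even on error."""
    try:
        yield session
    finally:
        session.quit()


class FakeSession:
    """Stand-in for an FTP/SMTP-like object, so the sketch runs offline."""

    def __init__(self):
        self.closed = False

    def quit(self):
        self.closed = True


with quitting(FakeSession()) as session:
    assert not session.closed
# quit() has been called once the block is left.
```

With a real connection, the same helper would be used as `with quitting(FTP(host)) as ftp:`, assuming the object exposes a `quit()` method.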

I am working on a series of patches for this, and wondering if some other classes in the standard library could benefit from this as well.

January 9, 2009
» Distutils : improved .pypirc for Python 2.7 and 3.1


When you launch a command like this:

$ python setup.py register sdist upload

There’s no way to give Distutils your PyPI password at the prompt so that your distribution is uploaded to the server. You have to store your password in the .pypirc file:

    [distutils]
    index-servers =
        pypi

    [pypi]
    username: <username>
    password: <password>

The password is stored in clear text, so it can be used by Distutils to authenticate. This is rather insecure, since anyone who has read access to your home directory can get your password.

I spotted this problem this summer while listing the possible enhancements in Distutils. Nathan Van Gheem sent me a mail a month ago to ask for that same feature in collective.dist, which is a port of the latest Distutils features to Python 2.4 so Zope can use them. So before having it in collective.dist, the first step was to introduce it into Python itself.

The idea is to be able to remove the password from .pypirc so it’s asked at the prompt. Nothing fancy here : the Distribution object that is created before you launch any command is the place where you can share a context between commands.

So when you launch:

$ python setup.py register sdist upload

Here’s what is happening:

  1. register looks into .pypirc; if no password is found, it asks the user for it using getpass
  2. register uses it, then stores it in the Distribution instance
  3. upload looks into the Distribution instance to see if the password was stored, and uses it
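The three steps can be sketched as follows. This is not the actual Distutils code: `Context` stands in for the Distribution instance, and `get_password` and its `prompt` parameter are hypothetical names chosen for the example (the injectable prompt makes the sketch testable without a terminal).

```python
import getpass


class Context:
    """Stand-in for the Distribution instance shared between commands."""
    password = None


def get_password(context, config, prompt=getpass.getpass):
    """Return the password from the config, the shared context, or the prompt."""
    password = config.get("password") or context.password
    if not password:
        # Nothing in .pypirc and nothing stored yet: ask the user once.
        password = prompt("Password: ")
    # Store it so the next command does not have to ask again.
    context.password = password
    return password
```

Here `register` would call `get_password` first, and `upload`, receiving the same context, would get the stored value back without prompting.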

This is now available in Python 2.7 and 3.1, and heavily tested.

I’d like to go further and think about an ssh-agent-like system, so there’s no need to enter the password every time you work with PyPI in the same session.

Does anyone know what would be the proper way to do it ? I think an ssh-agent-like mechanism in Python’s getpass would be a great feature in itself.

January 4, 2009
» 2009 plans, part #1 : Distutils


Happy New Year all !

I am going to make a few posts on the things I would like to achieve in 2009. Each entry will focus on a topic. This one is about Distutils.

I was granted commit privileges in Python, specifically to work on Distutils maintenance. This is a huge privilege, and I will try to do my best in this job. I have worked on a few tickets already and closed some. I learnt the Python development process, which requires backporting and forward-porting changesets across various Python versions. While this can be taken care of by someone else if you don’t do it, it’s better that every committer takes the time to merge his own work.

So what’s next ?

  • There are 132 tickets that are open in the Python tracker, that match the word distutils, and some of them are 5 years old !
  • There’s a Python language summit to be held in Chicago right before Pycon, and I volunteered to champion the task about Distutils, PyPI and packaging matters.

I am planning to :

  • review and classify all the tickets in the tracker;
  • fix as many of them as possible before the summit;
  • make Distutils a first class citizen in test coverage;
  • make Distutils code more modern.

Besides, I will try to build a roadmap for Distutils that I will present in Chicago.

To build this roadmap, I will ask for input on the distutils-SIG mailing list in the coming days, and see what people will come up with. There’s no crowd on this list these days but sometimes some threads are hot when it comes to the future of packaging in Python.

The roadmap I am planning to build will not address all the issues people have when it comes to distributing a Python application, since there is no consensus yet on the best practices. It will rather try to see if the current version of Distutils can be enhanced to address some problems, and at least be the bridge to something new in the future. Maybe by including some best practices from third-party tools (the pre-condition for all of this imho is to make the Distutils code base healthier).

Anyway, I hope that the lead developers of zc.buildout, pip, setuptools, paver (and those projects I forget about right now) will participate in this discussion, and that we will be able to find pragmatic enhancements.

December 16, 2008
» Pycon 2009 talks


I have 2 accepted talks at Pycon, which is great. I would like to say that the Pycon review system is awesome because you can see what the reviewers have said, and understand why your talk was accepted or declined.

I was a bit frustrated that my Atomisator talk was declined, but I think it makes sense : this is a new tool, and besides my user group and a few people, it is not really used yet.

One reviewer said that it had to be picked, and another one answered :

I agree that PyCon should not restrict itself to well-known projects, but it should definitely restrict itself to projects that are (a) in production use, (b) under active development, and (c) likely to still be so in a year. There are so many projects meeting these criteria that for me, the bar is very high indeed to spend a talk slot on one that does not.

Ok, fair enough : I will present this talk at Pycon 2010 and they won’t have any argument to decline it ;)

The talks that made it:

  • How AlterWay releases web applications using zc.buildout
  • On the importance of PyPI in delivering and building Python softwares - mirroring, fail-over and third-party package indexes

I will go into greater detail later on.

December 15, 2008
» Python Isolated Environment (PIE)


Here’s a proposal I will send to python-dev. What do you think ?

(Disclaimer : this proposal is heavily inspired by the work done by people on various tools, it does not reinvent anything)

The problem

Python developers distribute and deploy their packages using myriads of dependencies. Some of them are not yet available as official OS Python packages. Sometimes a package even conflicts with the official version of a package installed in a given OS.

In any case, the development cycle of most Python applications is shorter than the release cycle of Linux distributions, so it is impossible for application Foo to wait until Bar 5.6 is officially available in Debian 4.x.

Therefore, there’s a need to provide or describe a specific list of dependencies for their application to work.

And this list of dependencies might conflict with the existing list of packages installed in Python. In other words, even if this is not a wanted behavior from an OS packager’s point of view, an application might need to provide its own execution context.

Right now, when Python is loaded, it uses the site module to browse the site-packages directory and populate the path with the packages it finds there. .pth files are also parsed to provide extra paths.

Python 2.6 has introduced a per-user site-packages directory, where you can define an extra directory, which is added in the path like the central one.

But both will append new paths to the environment without any rule of exclusion or version checking.

The workarounds

A few workarounds exist to be able to express what packages (and version) an application needs to run, or to set up an isolated environment for it:

  • setuptools provides the install_requires mechanism where you can define dependencies directly inside the package, as new metadata. It also provides a way to install two different versions of one package and lets you pick, in code or when the program starts, which one you want to activate.
  • virtualenv will let you create an isolated Python environment, where you can define your own site-packages. This allows you to make sure you are not conflicting with an incompatible version of a given package.
  • zc.buildout relies on setuptools and provides an isolated environment a bit similar in some aspects to virtualenv.
  • pip provides a way to describe requirements in a file, which can be used to define bundles, which are very similar to what zc.buildout provides.

But they all aim at the same goal : define a specific execution context for a specific application, and declare dependencies without regard to other applications or to the OS environment.

This proposal describes a solution that can be added to Python to provide that feature.

The solution

An isolated environment file that describes dependencies is added. This file can be tweaked by the application packager, or later by the OS packager if something goes wrong.

The isolated environment file

A new file called a Python Isolated Environment file (PIE file) can be provided by any application to define the list of dependencies and their versions.

It is a simple text file that provides :

  • a list of paths, separated by ‘:’, on line 1
  • then one package per line, starting at line 2. Each package can be prefixed by a `!` to exclude it from the path

For example:

/var/myapp/myenv
lxml
sqlite
sqlalchemy
!sqlobject

This list of packages might or might not be installed in Python.

Versions can be provided as well in this file :

/var/myapp/myenv:/var/myapp/myenv2
lxml >= 0.9
sqlite > 1.8
sqlalchemy == 0.7
!sqlobject == 0.6

The file is saved with the pie extension.
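To make the format concrete, here is a small parsing sketch. `parse_pie` is a hypothetical helper, not part of the proposal; it assumes the operators shown in the example above (plus `<` and `<=`), and it inherits the limitation that a ‘:’ inside a path (as on Windows) would be read as a separator.

```python
import re

# Optional "!", a package name, then an optional operator and version.
_LINE = re.compile(r"^(!?)\s*([A-Za-z0-9_.\-]+)\s*(?:(>=|<=|==|>|<)\s*(\S+))?$")


def parse_pie(text):
    """Parse PIE content into (paths, requirements).

    Each requirement is (name, operator, version, excluded); operator and
    version are None when the line gives no version.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    paths = lines[0].split(":")
    requirements = []
    for line in lines[1:]:
        match = _LINE.match(line)
        if match is None:
            raise ValueError("invalid PIE line: %r" % line)
        bang, name, operator, version = match.groups()
        requirements.append((name, operator, version, bang == "!"))
    return paths, requirements
```

Fed with the second example file, it returns the two paths and four requirement tuples, the last one flagged as excluded.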

Loading an isolated environment file

A new function called load_isolated_environment is added in site.py, that let you load a PIE file.

Loading a PIE file means:

  • for each package defined, starting at line 2, load_isolated_environment will check in the environment whether the package with the particular version exists. The version is given by the package.__version__ value or the PKG-INFO one when available. If the package exists but the version is not available, the version 0.0 is used.
  • for packages without the ! prefix:
    • if the package is not found, it will scan each path provided on line 1 of the file, using the site-packages method, looking for that package.
    • if the package is found, it is added in the path.
    • if the package is not found, a PackageMissing error is raised.
  • for packages starting with the ! prefix:
    • if the package is found, it is removed from the path
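The decision rules above can be sketched as a pure function. This is a simplified illustration, not an implementation of load_isolated_environment: `resolve` and `_key` are hypothetical names, the scanning of the environment and of the extra paths is assumed to have already produced the `found` mapping, and versions are compared naively as tuples of integers.

```python
import operator

_OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
        ">": operator.gt, "<": operator.lt}


class PackageMissing(Exception):
    """Raised when a required package cannot be found."""


def _key(version):
    # Naive comparison key: "0.10" -> (0, 10). Assumes purely numeric parts.
    return tuple(int(part) for part in version.split("."))


def resolve(requirements, found):
    """Return (paths_to_add, paths_to_remove).

    requirements: (name, operator, version, excluded) tuples.
    found: maps a package name to (version, path) for what was located.
    """
    to_add, to_remove = [], []
    for name, op, version, excluded in requirements:
        entry = found.get(name)
        matches = entry is not None and (
            op is None or _OPS[op](_key(entry[0]), _key(version)))
        if excluded:
            if matches:
                to_remove.append(entry[1])
        elif matches:
            to_add.append(entry[1])
        else:
            raise PackageMissing(name)
    return to_add, to_remove
```

A package whose version could not be determined would appear in `found` with the version "0.0", as described above.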

load_isolated_environment can be called by code like this:

>>> from site import load_isolated_environment
>>> load_isolated_environment('/path/to/context.pie')

From there, sys.path meets the requirements and the code that is executed after this call will benefit from this context.
Another context can be loaded in the same process :

>>> load_isolated_environment('/path/to/another_context.pie')

Limitations:

  • if the new context breaks other programs in the process, it’s up to the application packager to fix the context file.
  • it’s not the job of load_isolated_environment to resolve dependency issues : if the foo package needs the bar package, it won’t complain.
  • it is not the job of load_isolated_environment to get missing dependencies.

Using an isolated environment file

Typically, an isolated environment file can be used in high-level Python scripts. For example, in any script an application provides as a launcher :

# this script runs zope
from site import load_isolated_environment
load_isolated_environment('zope-3.4.pie')

import zope

if __name__ == '__main__':
    zope.run()

December 14, 2008
» Looking for beta testers for Atomisator


I am looking for beta testers interested in experimenting with customized RSS feeds or email alerts.

Here’s a list of services Atomisator can provide :

  • You run a project and you would like to receive a daily summary in your mailbox on what is being said about it in blogs, tweets, etc
  • You have a list of feeds you want to aggregate with specific filters, and you can’t manage to do it with Yahoo Pipes or any tool out there, because it is too specific.
  • You want to annotate entries in a feed with extra information
  • etc..

What you get as a beta-tester:

  • a custom Atomisator configuration that fills your needs
  • I am hosting the service, and you get
    • either a URL on my server to an XML file you can read in your aggregator
    • or one mail per day

What you are not getting as a beta tester:

  • you don’t get any guarantee on the output or the reliability, these are just experiments.
  • if it’s down I can’t promise when it will be up again

Let me know by mail if you are interested.

December 12, 2008
» Pycon 2009 proposals


The proposal acceptance date is in a few days.

Here are the four proposals I have made:

  • The state of packaging in Python. This discussion summarizes the current options when it comes to distributing your packages. It also explains the pitfalls and the gap between the Python developers and the OS vendors and packagers. I think this talk will not be picked because the topic is wide and vague. So I proposed to transform it into a panel where lead developers from various frameworks could explain their usage of distutils and what is missing to make them happy. No feedback yet on this.
  • Atomisator, the agile data processing framework. This tool is starting to be useful, and I think it can be useful to others. Check http://atomisator.ziade.org for a quick overview.
  • How AlterWay releases web applications using zc.buildout. That is the same talk I gave at the Plone conf, but I present it in a way that makes people understand zc.buildout is not tied to Zope and Plone and can be used with any other application. As a matter of fact, it has become a standard here, and we use it for Pylons, etc.
  • On the importance of PyPI in delivering and building Python softwares - mirroring, fail-over and third-party package indexes. That’s a long title. It presents my work on PyPI.

Last, I will go to the Python Language Summit the day before Pycon. I volunteered to be a “champion” on distutils matters.

December 9, 2008
» How to make binary distributions of buildouts


The Problem

I need to distribute pre-compiled buildouts because some projects don’t allow us to have gcc installed on the production system for security reasons.

Fair enough, we need to provide a pre-compiled buildout.

If you want to distribute your buildout-based Plone application in a binary form, so it can be installed without requiring any compiler on the platform, you need to compile all .c modules before you provide a tarball of your buildout folder.

This is easy : just run your buildout and all .so files (.pyd under Windows) will be created in the Zope 2 installation.

But this will work only if you compile in a directory that is located within the same path on the target machine, because zc.buildout uses absolute paths when it builds scripts.

Furthermore, if the python interpreter is not located in the same place, your buildout script itself is screwed.

Last but not least, plone.recipe.zope2install is not clever enough. It will remove your zope2 installation when it detects that the path has changed. This is pretty annoying even if you have gcc : what is the point of compiling the C extensions again since they are statically compiled in-place ?

The solution

I have changed plone.recipe.zope2install and added a new option called `smart-recompile` (in trunk right now, not released).

If you use it, the recipe will check for .so or .pyd files before trying to ditch your zope 2 installation and recompile it. Even if you don’t use it to build binary distributions, it will make your buildout build faster if you already have zope compiled in there.
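The kind of check involved can be sketched like this. It is not the recipe’s actual code: `has_compiled_extensions` is a hypothetical helper that only shows the idea of looking for compiled files before deciding to rebuild.

```python
import os


def has_compiled_extensions(root):
    """Return True if any .so or .pyd file exists under `root`."""
    for _dirpath, _dirnames, filenames in os.walk(root):
        if any(name.endswith((".so", ".pyd")) for name in filenames):
            return True
    return False
```

A recipe could skip the removal and recompilation step whenever this returns True for the existing installation directory.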

Next, I have created a special bootstrap.py, which is clever enough to rebuild the buildout script with the right path to the interpreter in use, and with offline-mode capabilities. To make it short : bootstrap.py works whether or not you have an internet connection. Grab it here : http://ziade.org/bootstrap.py

So now, basically you can compile your buildout and deploy it on any system, on any path, without any internet connection, like this:

$ python bootstrap.py    # will rebuild the buildout script   
$ bin/buildout
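To give an idea of what "rebuilding the script with the right interpreter" involves, here is a minimal sketch that only rewrites the first line of a script so it points at the interpreter currently running. It is an illustration, not the code of the bootstrap.py linked above, and `rewrite_shebang` is a hypothetical helper; a generated buildout script also embeds other absolute paths, which this sketch does not touch.

```python
import sys


def rewrite_shebang(script_text, interpreter=sys.executable):
    """Return `script_text` with its first line pointing at `interpreter`."""
    lines = script_text.splitlines(True)  # keep line endings
    shebang = "#!" + interpreter + "\n"
    if lines and lines[0].startswith("#!"):
        lines[0] = shebang          # replace the stale interpreter path
    else:
        lines.insert(0, shebang)    # no shebang yet: add one
    return "".join(lines)
```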

Of course this doesn’t work if you have dynamically compiled extensions like python-ldap. For these, the best pick is to rely on the system ones.

      

December 7, 2008
» A PostRank plugin for Atomisator


Yesterday, I bumped into PostRank. This system collects data from various social systems like Twitter and provides a service where you can type in the URL of a blog post or of an entire blog. You get a PostRank depending on the popularity of the URL.

I wrote a plugin for Atomisator and ran it on my own blog. Here’s the result:  http://ziade.org/afpy/

And the Atomisator configuration for this is :

[atomisator]
sources =
    rss http://tarekziade.wordpress.com/feed/atom/

database = sqlite:///carpet.db

outputs =
    rss  public/rss.xml "http://tarekziade.wordpress.com/feed/atom/" "Carpet Python with PR" "Powered by Atomisator"

enhancers =
    postrank

How PostRank works

PostRank works with URLs you provide, on their web interface or through their web services.

As long as these URLs are present in their big cloud-computing based system, they provide a rank that is calculated from the number of comments related to the blog, the number of tweets that refer to it, and so on. The complete algorithm they use is secret but this is not the point. I have secret algorithms too ;) .

The point is that they are trying to categorize blog entries using social networks as indicators, and that they have a huge database.

Social indicators in Atomisator

This is one of the approaches I have with Atomisator, when it is used to build a planet. For instance I have a Digg plugin that will inject in each entry the comments found on Digg if the entry was digged. It also presents the number of Diggs. Of course this is done live because I don’t have a cloud-computing based system where I store data. I use the Digg web service on the fly. (On the fly here doesn’t mean Atomisator makes the calls to Digg from the Planet application of course. It means Atomisator calls them when it creates the merged feed on the system.)

The benefit of this approach is that I can provide a social indicator on a post immediately. Systems like PostRank will not work on entries that are too recent because their spiders have a lag of one week or so.

The pitfall of my approach is that I am unable to calculate trends because I don’t store the indicators as they vary.

But if someone wanted to build a BtoC application using Atomisator, they could implement a set of plugins based on Amazon tools to make them store data in a more scalable way and in time.

Next steps

So I have this new PostRank plugin, and this is awesome because I have added a threshold parameter in it. Basically if a post has a high PostRank value, it will appear in the Planet. If it’s low, it can be automatically removed. The fact that PostRanks are lagging for new entries is not a problem: interesting posts will eventually pop up in the Planet after a few days.
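The threshold idea boils down to something like the sketch below. The entry structure (a dict with a `postrank` key) and the function name are assumptions made for the illustration; they are not taken from the plugin. Entries without a rank yet are simply left out until one shows up.

```python
def filter_by_rank(entries, threshold):
    """Keep entries whose rank is known and at least `threshold`."""
    kept = []
    for entry in entries:
        rank = entry.get("postrank")  # None when no rank is available yet
        if rank is not None and rank >= threshold:
            kept.append(entry)
    return kept


entries = [
    {"title": "popular post", "postrank": 8.2},
    {"title": "quiet post", "postrank": 1.5},
    {"title": "brand new post", "postrank": None},
]
print([e["title"] for e in filter_by_rank(entries, 5.0)])  # ['popular post']
```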

This is perfect to reduce the number of entries in an aggregator.

But I do want to write my own PostRank that works live, with no storage at all. Because the whole point of Atomisator is to provide a framework where anyone can try out various filtering combinations.

So to be able to provide this power, it needs to work just by collecting data directly from the social services, like the PostRank plugin does with this PostRank “meta-service”. The next step is therefore to see if I can query services like Twitter to list the tweets related to a URL, without having to store the Twitter feed myself.

In any case, if my talk on Atomisator at Pycon 2009 is selected, the PostRank plugin will be shown besides the Digg plugin.

      

November 27, 2008
» Expert Python Programming Book : typo sprint tonight !


I love Packt. As soon as I told them that some people liked the book but complained about the typos, they proposed to go ahead and launch a new print cycle.

Basically it means that the next buyers will have a typo-free book. At least for all the typos that were reported on my Trac here, or at Packt’s.

I am currently processing all the typos reported at Packt so I have a full list on my wiki, and will provide them the final list tomorrow.

Then, they will re-print it.

So if you already own the book, and you see a typo that is not listed, please let me know.

      

November 26, 2008
» Python package distribution - my current work


I found a bit of time to work on distribution matters. Here’s a status of what I am doing there.

There are two topics I am focusing on right now.

  • clean up and enhance Python’s distutils package
  • implement the mirroring infrastructure at PyPI

distutils work

Nathan Van Gheem proposed a cool patch for collective.dist (this package is a port of the new features I have added in distutils so they are available in 2.4 and 2.5).

Nathan’s patch makes it possible to avoid storing the password in the .pypirc file: when it is missing, the user is prompted for it. This is something that was in my pile for a long time.
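The behavior can be sketched like this, in present-day Python rather than the 2.x code of the time. It is not the patch itself: the `read_credentials` helper, the `pypi` section name and the injectable `prompt` argument are assumptions made to keep the example short and testable.

```python
import configparser
import getpass


def read_credentials(pypirc_text, section="pypi", prompt=getpass.getpass):
    """Return (username, password), prompting when no password is stored."""
    parser = configparser.RawConfigParser()
    parser.read_string(pypirc_text)
    username = parser.get(section, "username")
    if parser.has_option(section, "password"):
        password = parser.get(section, "password")
    else:
        # Nothing stored in the file: ask the user interactively.
        password = prompt("Password for %s: " % username)
    return username, password
```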

I have added a few things to Nathan’s patch, and a test, and proposed it to Python. I am now waiting for its integration in 2.7 trunk: http://bugs.python.org/issue4394. If it’s accepted, I will backport it to collective.dist.

There are some other tickets I am waiting to be accepted:

I am not sure when those will be integrated. The average time for the integration of tickets in distutils in Python is between 6 months and 8 months. hihihi. :D

PyPI mirroring

The job I am doing in PyPI will be in three phases :

  • Phase 1: implement the mirroring infrastructure in PyPI
  • Phase 2: promote it, and propose patches for the mirroring tools out there so they use the protocol
  • Phase 3: promote and propose patches for pip so it can use the mirrors efficiently (fail-over and nearest mirror infrastructure).

Phase 1: so far, so good.

With some insights from Richard Jones and Martin von Löwis, I am currently implementing the mirroring infrastructure for PyPI we have defined during the D.C. sprint (I still owe a blog entry about this sprint). The code lives in a branch on the python svn folder dedicated to PyPI.

The idea of the mirroring infrastructure is to be able to get a list of official mirrors for PyPI, that can be used as alternative sources. (It is described here: http://wiki.python.org/moin/PEP_374). A great behavior could be that the client application interacts with the nearest mirror location automatically, and switches to another one if it goes down.
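On the client side, the fail-over part could look like the sketch below. Everything in it is hypothetical: the function name, the assumption that the mirror list is already sorted from nearest to farthest, and the `fetch` callable that stands in for whatever HTTP call the client really uses.

```python
def fetch_with_failover(path, mirrors, fetch):
    """Try each mirror in order; return (mirror, content) from the first that works."""
    errors = []
    for mirror in mirrors:
        try:
            return mirror, fetch(mirror + path)
        except OSError as exc:  # network failures are OSError subclasses
            errors.append((mirror, exc))
    raise OSError("all mirrors failed: %r" % (errors,))
```

Passing `fetch` in as an argument keeps the fail-over logic independent from the network library, so it can be exercised with a fake function.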

So, a list of mirrors will be made available at /mirrors, and the client applications will be able from there to use an alternative location for every package. The hardest part concerns the stats : we want to display in PyPI the download counts for each package by summing downloads from every mirror.

So every mirror will have to provide its “local stats” that can be visited by PyPI. That’s the biggest part of the work I am doing. It will build the stats for PyPI by parsing its Apache log file. And hopefully, this code should be reusable by the mirrors themselves so they can build their stats the same way.

Of course this infrastructure could be used for any PyPI-compatible server even if it is not a mirror of PyPI (like a private PyPI server).

Phase 2 will consist in promoting the infrastructure to the mirroring software out there. Maybe Pycon will be a good place for that.
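As a rough illustration of the stats idea, the sketch below counts successful downloads in Apache access-log lines and sums the counts of several mirrors. It is not the code from the PyPI branch: the `/packages/` URL prefix, the regular expression and both function names are assumptions, and the counts are kept per file name rather than per package to stay short.

```python
import re
from collections import Counter

# Matches the request part of an Apache access-log line, e.g.
# "GET /packages/source/f/foo/foo-1.0.tar.gz HTTP/1.1" 200
REQUEST = re.compile(r'"GET (?P<path>/packages/\S+) HTTP/[\d.]+" (?P<status>\d{3})')


def count_downloads(log_lines):
    """Count successful downloads per file name found in `log_lines`."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST.search(line)
        if match and match.group("status") == "200":
            filename = match.group("path").rsplit("/", 1)[-1]
            counts[filename] += 1
    return counts


def merge_stats(*per_mirror_counts):
    """Sum the local stats of several mirrors into one total."""
    total = Counter()
    for counts in per_mirror_counts:
        total.update(counts)  # Counter.update adds counts together
    return total
```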

Phase 3 is the most interesting one : make sure the client applications use the mirrors ! I think Ian Bicking’s pip project could be the right place for these innovations.

Next topics in the pile:

  • index-merging: describe in a PEP-like document the index-merging feature that would allow clients to merge several indexes whose content differs. For example: PyPI + a private PyPI server. I have written a first draft of such a patch in setuptools in the past (http://bugs.python.org/setuptools/issue32) but I have lost all my hopes to see this project moving forward lately.
  • Brainstorming: try to understand the Python Packaging Paradox. That is = how come the community, which is composed of many brilliant people, is unable to move forward in packaging matters.
  • Distribute the return :D
      

November 22, 2008
» How to be disappointed with the “printed” in “printed book”


I feel really bad about this comment on my book : How To Be Dissappointed in Something You Recommend.

Just a quick word about the try, return finally code pattern, since I had some feedback about it. I would like to mention that this code pattern is perfectly right:

def function():
    try:
        return something
    finally:
        do_something()

I should have explained it better, because this pattern is not used a lot by people, so you might think that “do something” is called after the function has returned, which is not the case: the finally block runs before the caller gets the returned value.
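A small runnable version makes the order visible. The `events` list is only there for the demonstration: it records what happens, in the order it happens.

```python
events = []


def function():
    try:
        events.append("try")
        return "result"
    finally:
        # Runs before the caller receives the returned value.
        events.append("finally")


value = function()
events.append("caller got " + value)
print(events)  # ['try', 'finally', 'caller got result']
```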

For the typos now:

The first thing I did wrong: when I started the book, I wanted, as I did in my previous book, to run unit tests on the book itself to avoid those mistakes. That said, the previous one was in LaTeX, which is quite simple to interact with, and this one is in OpenOffice, because that is how the editor works. I would have had to write a script to extract the Python code from the OOo file, to unit test it. I didn’t. I simply ran out of time, as usual when you have deadlines on books.

The second thing I did wrong: I should have told the editor to wait a bit, I didn’t.

But Packt does Print On Demand, so I know that the Errata page I am maintaining here : http://atomisator.ziade.org/wiki/Errata, is being processed by the editor, and that the typos will be removed from the book at some point, without having to wait for a second edition.

I’ll update this blog entry as soon as I know the status on this.

I am really sorry Calvin, and all the people that are suffering from these typos.

      

November 9, 2008
» How to receive email alerts when someone talks about something - 6 steps tutorial using Atomisator


I like Google Alerts: the idea of receiving a mail every day that summarizes all articles related to a given topic is really helpful when you need to focus on a specific subject for a while.

But this is not enough. I want to receive a mail that also points me to any mailing list, planet feed or blog out there that talks about the topic.

You can’t do it with Google Alerts as far as I know.

Let’s take an example:

I want to receive a daily mail that points me to any mail thread or blog entry, that is related to the word “buildout” or to the word “pycon”.

Basically, to do it manually, I need to read Planet Python, Planet Zope, then take a look at the Python, Zope and Plone mailing lists. It takes at least 10 minutes, and more if you want to read all entries to make sure you won’t miss anything.

Since online systems like Nabble provide RSS feeds for mailing lists (don’t find yours ? just add it there !), it is easy to read them as if they were regular feeds.

From there, a script that reads all the selected feeds and sends a mail pointing to the entries that match the selected words is simple to write as well, and fills the need.

But don’t code it : Atomisator will let you do this with a few lines of configuration.

Here’s a step-by-step tutorial.

Step 1 - install easy_install

Step 2 - install Atomisator and SQLite

Step 3 - create an “atomisator.cfg” file

The content of the file has to be:

[atomisator]
store-entries = false

sources =
  rss http://www.nabble.com/Python---python-list-f2962.xml
  rss http://n2.nabble.com/Plone-f293351.xml
  rss http://www.nabble.com/Zope---General-f6715.xml
  rss http://planet.python.org/rss10.xml
  rss http://www.zope.org/Planet/planet_rss10.xml
filters =
  buzzwords words.txt
outputs =
  email email.cfg

This file will look into Planet Python, Planet Zope and various mailing lists (Python, Plone, Zope). Of course you can add or remove feeds in the sources option.

Step 4 - Create the words.txt file

This file contains regular expressions, one per line, that will be used to match the entries. The file has to be saved beside atomisator.cfg.

For our example:

buildout
pycon

You can put any expression you want in this file, as long as you have one matching expression per line.
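To make the matching rule concrete, here is a minimal sketch of such a filter. It is not Atomisator's buzzwords plugin: the function names are hypothetical, and matching case-insensitively is an assumption made for the example.

```python
import re


def load_patterns(text):
    """Compile one case-insensitive regular expression per non-empty line."""
    return [re.compile(line.strip(), re.IGNORECASE)
            for line in text.splitlines() if line.strip()]


def matches_any(patterns, entry_text):
    """Return True if at least one pattern is found in `entry_text`."""
    return any(pattern.search(entry_text) for pattern in patterns)


patterns = load_patterns("buildout\npycon\n")
print(matches_any(patterns, "Slides from my PyCon talk"))    # True
print(matches_any(patterns, "A post about something else"))  # False
```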

Step 5 - add an email.cfg configuration file.

This is where you define the target emails that will receive the alerts (tos option). You can also specify the from email, or the SMTP server location. The file has to be saved beside atomisator.cfg.

In our case it can be:

[email]
tos = tarek@ziade.org
from = tarek@ziade.org
smtp_server = smtp.neuf.fr

Step 6 - Run it !

The command to be called is atomisator (installed by easy_install) followed by the configuration file:

$ atomisator atomisator.cfg
Reading data.
Launching worker for rss - ('http://www.nabble.com/Python---python-list-f2962.xml',)
Launching worker for rss - ('http://n2.nabble.com/Plone-f293351.xml',)
Launching worker for rss - ('http://www.nabble.com/Zope---General-f6715.xml',)
Launching worker for rss - ('http://planet.python.org/rss10.xml',)
Launching worker for rss - ('http://www.zope.org/Planet/planet_rss10.xml',)
Retrieving from rss - ('http://www.nabble.com/Python---python-list-f2962.xml',)
Retrieving from rss - ('http://www.nabble.com/Zope---General-f6715.xml',)
Retrieving from rss - ('http://n2.nabble.com/Plone-f293351.xml',)
Retrieving from rss - ('http://planet.python.org/rss10.xml',)
Retrieving from rss - ('http://www.zope.org/Planet/planet_rss10.xml',)
.................................................................................................................................................
Writing outputs.
Data ready.

Check your mails. This call can be put in a daily cron.

Tested under Mac OS X and Linux.

      

November 6, 2008
» Plone Conference 2008 in Washington D.C. - summary


I am back from the Plone Conference in D.C., and the jetlag is gone. It has been gone for weeks now but it’s hard to find the time to blog these days :/

On the talks I have seen and topics I have chatted about

There were a lot of great talks in D.C., and it was hard to decide which one to look at. In any case it was easy to meet the speaker if I had missed the talk, because the Plone Conference, unlike big conferences like OSCON, is a place where everyone hangs around the same spot after a talk is over.

Here’s a list of some topics I have seen or I have talked about with some people.

Deliverance - Ian Bicking

If you look at what Ian has produced in the past 5 years, he is one of the most prolific contributors of tools that become standards in the Python web development community. Think about Python Paste or virtualenv, and many others. Deliverance might be the next big one.

Take a bunch of micro web applications you want to join to build a full web system, for historical reasons or just because you believe a particular feature just won’t fit in Plone but will do great in Pylons.

Now ask a designer to glue everything together under the same look. He (or the guy that integrates his design) will probably hate you: he will have to learn how to integrate in heterogeneous environments. This is easy under some systems that let you stick a layout and a CSS in a simple way. This is not easy under Plone, unless you learn how to do it (but this will be improved in the future).

Deliverance is a proxy that lets you skin any application that spits HTML content, by running some XPath rules on the content and applying some changes to produce a new output. Basically, you have a simple HTML page that just provides the layout you want to have, without any content, and an XML file that explains how to extract some content from the page produced by the third-party application and where to inject it in your empty HTML page. The great thing is that you can call different third-party servers given the path you are in, and even call several servers to build one single page. This opens a lot of perspectives.

The first caveat of this approach is that you have to provide a Single Sign-On feature to avoid people having to connect several times. This can be a problem sometimes with some applications if they are not open enough to let you do it. But most of the time, it is not a problem : if the users are all located in an LDAP directory it is easy.

Furthermore, if you use only Python-based applications, you can use a WSGI environment and a middleware like repoze.who to glue together let’s say, a Plone app and a Pylons app. Products.oopas is the PAS plugin that can be used for that on Plone side to grab the authentication context and use it.

The second problem I can see is about response headers. One example: if a page is composed of elements that come from several pages, and if the page has a Last-Modified header, I don’t think Deliverance handles this correctly yet, to make sure to present the newest Last-Modified header from all third-party servers that were called to build that page. But this is more likely to be a detail compared to the single authentication problem.
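The extract-and-inject principle can be illustrated with a toy sketch based on the standard library. It does not use Deliverance or its rule format: the `apply_theme` function, the two sample pages and the XPath expressions are made up for the example, and it assumes both pages are well-formed XML, which real-world HTML often is not.

```python
import xml.etree.ElementTree as ET

# An empty layout page and a page produced by some third-party application.
THEME = "<html><body><h1>My site</h1><div id='content'/></body></html>"
PAGE = "<html><body><div id='main'><p>Hello from the app</p></div></body></html>"


def apply_theme(theme_xml, page_xml, source_xpath, target_xpath):
    """Copy the content of the source element into the theme's target element."""
    theme = ET.fromstring(theme_xml)
    page = ET.fromstring(page_xml)
    source = page.find(source_xpath)
    target = theme.find(target_xpath)
    if source is None or target is None:
        raise ValueError("rule did not match")
    target.text = source.text
    target.extend(list(source))
    return ET.tostring(theme, encoding="unicode")


print(apply_theme(THEME, PAGE, ".//div[@id='main']", ".//div[@id='content']"))
```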
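Picking the newest value is simple once the headers are parsed as dates. The sketch below only shows that step, with a hypothetical `newest_last_modified` helper and sample header values; it says nothing about how Deliverance itself handles headers.

```python
from email.utils import format_datetime, parsedate_to_datetime


def newest_last_modified(header_values):
    """Return the most recent of several Last-Modified header values."""
    dates = [parsedate_to_datetime(value) for value in header_values]
    return format_datetime(max(dates), usegmt=True)


print(newest_last_modified([
    "Mon, 03 Nov 2008 10:00:00 GMT",
    "Thu, 06 Nov 2008 08:30:00 GMT",
    "Tue, 04 Nov 2008 23:59:59 GMT",
]))  # Thu, 06 Nov 2008 08:30:00 GMT
```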

In any case this is a very promising tool !

Content Mirror - Kapil Thangavelu

I didn’t see that talk, but I have talked about this tool with a few people. The idea is to serialize the content of a Plone instance into a relational database (e.g. PostgreSQL), as changes happen, using events.

I need to give it a try and look at it more deeply, to see how the overhead is dealt with, and how the aggregator I have read about is doing (it collects the mirroring operations to perform in a transaction, and optimizes the calls at the end of the transaction to avoid redundant calls, if I understood correctly). I don’t know yet, for example, if there’s a pool of jobs for the mirroring tasks to avoid a point of failure. But I am pretty sure this is taken care of. The other point I need to check is whether there’s a round trip, e.g. a way to apply a relational database change back into Plone.

But in any case I can already see various use cases for my customers. For instance, having a plone instance as a back office, with complex workflows for editors and contributors, and a lightweight Pylons application as the front application, that concentrates into displaying the relational database as fast as possible, makes a lot of sense in big environments. It just scales better.

So this is an interesting tool as well.

repoze.bfg - Chris McDonough

Chris gave a talk about repoze.bfg, which is a new web framework that takes the good bits from Zope and pushes them into a WSGI world, using the Pylons approach I would say. That is : “here’s the template engine you can use in repoze, but really, use the one you like”.

Frankly, I really see this new effort as one of the most promising ones in the Zope community. Already, repoze.auth is a major middleware in WSGI : Zope’s Pluggable Authentication Service outside Zope, usable with any WSGI application. This is a blast !

And people are starting to contribute a lot of interesting middlewares under the repoze namespace.

Now I didn’t really try repoze.bfg itself yet, but given the people that are behind it, I am pretty sure this framework will meet success in the future. Having a MVC framework à la Pylons that lets you use Zope packages with a “this zope package is repoze/wsgi compliant” label on each one of them is very cool.

collective.indexing - Andreas Zeidler et al.

At the snow sprint, we worked with the Enfold crew that did great work in integrating the Solr/Lucene system so it can be used from Plone. We replaced a few fields like the searchable text and indexed them on the Solr side, just to give it a try. The snow work was really focusing on providing a buildout, a few recipes and a bench to say : “Hey, Plone community, this is a blast ! let’s do more of it”

Later Andreas Zeidler and a few other guys continued the work on indexing matters and they delivered collective.indexing, which provides two things:

  • a queue that collects all indexing to be done, and optimizes the calls to the catalog
  • a bridge to use collective.solr
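The queue idea can be sketched as follows: collect every operation requested during a transaction, then collapse them so that each object triggers at most one call. This is only an illustration of the principle, not collective.indexing's code; the operation names, the `optimize` function and the collapsing rules are assumptions.

```python
INDEX, REINDEX, UNINDEX = "index", "reindex", "unindex"


def optimize(operations):
    """Collapse a list of (operation, uid) pairs to at most one per object."""
    final = {}
    for op, uid in operations:
        previous = final.get(uid)
        if previous == INDEX and op == UNINDEX:
            del final[uid]          # created and removed: nothing to do
        elif previous == INDEX and op == REINDEX:
            pass                    # still a plain first-time index
        elif previous == UNINDEX and op == INDEX:
            final[uid] = REINDEX    # removed then re-added: refresh it
        else:
            final[uid] = op
    return sorted((uid, op) for uid, op in final.items())


queue = [(INDEX, "a"), (REINDEX, "a"), (REINDEX, "b"),
         (REINDEX, "b"), (INDEX, "c"), (UNINDEX, "c")]
print(optimize(queue))  # [('a', 'index'), ('b', 'reindex')]
```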

I didn’t follow the latest development and I didn’t know how far the guys went, but I had the chance to hang around with Andreas and Tom Lazar in D.C., so now I know that this package is production ready :D

So in other words : I’ll probably use it as a mandatory package for all the big plones out there.

The queuing part, imho, should go into the catalog itself because there’s no other way to make sure a third-party product is not calling the catalog during the transaction while another product does the same.

Server-Side Include (SSI)

Tom Lazar worked during the Snow Sprint on lovely.remoteinclude to make Plone portlets accessible via unique URLs. From there, it is possible to push a page that contains a list of URLs rather than the calculated page to a front server that knows how to read SSI directives and builds the page.

This is great for performance, and is a lot like ESI (Edge Side Include) we used to have in CPSSkins.
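What the front server does with such a page can be sketched in a few lines: find each include directive and replace it with the fragment served at that URL. The sketch is not lovely.remoteinclude; the portlet URL is invented and the `fetch` callable stands in for a real HTTP request.

```python
import re

# Matches directives such as <!--#include virtual="/portlets/news" -->
INCLUDE = re.compile(r'<!--#include\s+virtual="([^"]+)"\s*-->')


def expand_includes(page, fetch):
    """Replace every SSI include directive with the fragment `fetch` returns."""
    return INCLUDE.sub(lambda match: fetch(match.group(1)), page)


fragments = {"/portlets/news": "<ul><li>News item</li></ul>"}
page = '<div><!--#include virtual="/portlets/news" --></div>'
print(expand_includes(page, fragments.__getitem__))
# <div><ul><li>News item</li></ul></div>
```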

I am wondering if both could be implemented in the same tool in fact.

Tom told me that he will try to continue this work at the performance sprint in Bristol in December, so let’s keep an eye on this !

I have seen many other talks and topics, but these few were the ones I really needed to talk about.

On the conference organization

I have been helping with the organization of Pycon FR in Paris for 2 years now. I know what it means to organize such events : it is a LOT OF WORK.

You know an event is well organized when you don’t feel it is organized.

That was the case in D.C. Bravo Alex, Amy and all the others !

The only problem (wifi) was not the organizers’ fault, and I have never been to any event where it is not chaotic at some point (besides OSCON) so… :)

On the community

I love you all guys. It is an amazing community.