A simple self-upgrade build pattern

Bootstrapping is useful.

For example, we use it to build in-place our applications at Mozilla Services, to develop them. When the developer runs "make build", a Python script is invoked, and makes sure all the Python packages we are developing are checked out locally, and tied to the execution environment.

In the Python community you have similar tools like zc.buildout combined with mrdeveloper, or Pip's requirements. Although, our script does Mozilla-specific things, like allowing to update the dependencies at specific tags via command-line options, or at the "latest stable tag".

Anyways, back to bootstrapping.

Having such scripts is great, but maintaining them across all projects is painful. Since the script is the first thing the build runs, it cannot have an external dependency so it has to be completely independent. So you have to copy it around, in all your projects, and that leads to maintenance hell.

One solution is to add a self-upgrade feature to the script, so it tries to get the latest version online before it is run. zc.buildout has such a self-update feature, that will make sure it's at the latest version. Although, the implementation of this feature is a bit odd. 1/ you cannot run the bootstrap script if you're offline. 2/ on upgrades, zc.buildout restarts itself. I don't think 2/ a robust approach and I recall that it had some bugs under Windows in the past (updating files that your own program is using might fail).

So, since I was working in the tools-for-Mozilla-Services area, I added a self-upgrade feature and tried to isolate a simple mechanism for this. A pattern that would let us publish fresh versions of some scripts in our server and make it easy for our tools to use them.

An efficient bootstrap script should:
- be able to run offline, so the app can be built even if it's unable to check for the latest online version - be able to upgrade itself with a newer version - upgrade itself without having to restart itself - be completely independent -- besides a vanilla Python

The pattern I have used is as follows:
- The bootstrap script is an empty shell that is just responsible to update a second script that contains the real bootstrapping code, then dynamically import it via an __import__, and run it. - The server that publishes the real bootstrap script returns an ETag header with it, and supports the If-None-Match header. - The bootstrap script is able to ask the server if the online version is newer, by using the ETag value it has from the previous update in the If-None-Match header. If the online version is unchanged, the client receives a 412 code (Precondition failed).

The code has no dependencies besides Python.

Extract from the client-side code:
import os

import sys

import urllib2



def main():

    # getting the file age

    if os.path.exists('._build.etag'):

        with open('._build.etag') as f:

            current_etag = f.read().strip()

        headers = {'If-None-Match': current_etag}

    else:

        headers = {}

        current_etag = None



    request = urllib2.Request('http://moz.ziade.org/_build.py',

                              headers=headers)



    # checking the last version on our server

    try:

        url = urllib2.urlopen(request, timeout=5)

        etag = url.headers.get('ETag')

    except urllib2.HTTPError, e:

        if e.getcode() != 412:

            raise

        # we're up-to-date (precondition failed)

        etag = current_etag

    except urllib2.URLError:

        # timeout error

        etag = None



    if etag is not None and current_etag != etag:

        # we need to update our version

        _rename('_build.py')

        content = url.read()

        with open('_build.py', 'w') as f:

            f.write(content)



        with open('._build.etag', 'w') as f:

            f.write(etag)



    # we're good, let's import the file and run it

    mod = __import__('_build')

    project_name = sys.argv[1]

    deps = [dep.strip() for dep in sys.argv[2].split(',')]

    mod.main(project_name, deps)



if __name__ == '__main__':

    main()

Extract from the server-side code:
import os

import mimetypes

from hashlib import md5

from wsgiref.simple_server import make_server



# where the served files are located ?

_DIR = '.'



def application(environ, start_response):

    if_none_match = environ.get('HTTP_IF_NONE_MATCH', '')

    path = environ['PATH_INFO']

    path = path.lstrip('/')



    if path.startswith('.') or '/' in path:

        # looks like a bad path -- or a potential security thread

        status = '400 Bad Request'

        headers = [('Content-type', 'text/plain')]

        start_response(status, headers)

        return "Invalid Request"



    path = os.path.join(_DIR, path)



    if not os.path.exists(path):

        # unknown file

        status = '404 Not Found'

        headers = [('Content-type', 'text/plain')]

        start_response(status, headers)

        return "File not found"



    # we're good, let's look at the file age

    etag = '"%s"' % md5(str(os.stat(path).st_mtime)).hexdigest()

    if etag == if_none_match:

        # same file

        status = '412 Precondition failed'

        headers = [('Content-type', 'text/plain')]

        start_response(status, headers)

        return ''



    content_type = mimetypes.guess_type(path)

    status = '200 OK'

    headers = [('Content-type', content_type[0]),

               ('ETag', etag)]



    start_response(status, headers)



    with open(path) as f:

        content = f.read()



    return content



if __name__ == '__main__':

    httpd = make_server('', 5000, application)

    print "Listening on port 5000...."

    httpd.serve_forever()

The server is a plain Python wsgi script that is easy to hook in Apache or Nginx, to serve a directory of files. It should be safe enough as long as the served directory is isolated from the rest of the system. I did not try that hard to secure it, but it makes sure you can't leave the directory.

EDIT: as noted in the comments, Nginx and Apache provides such ETag features, so you can just configure it to serve static files this way. I did a wsgi script as an initial prototype and to demonstrate what the server does, and I kept it because it's easy to run or hook into any web server. And I will probably add more features next there.

Now the bootstrap script we use can live in its own repository, and get bug fixes and features, that will be automatically propagated to all our applications.

Do you solve this issue differently ? I am curious to get some feedback on this.