I have been working for a bit more than a month now on the next
generation of the Firefox Sync server in Python and while the project is
still in its early stages and subject to a lot of changes, I think it's
a good idea to share now about what we are building here at Mozilla.
Maybe that'll attract contributors!
Firefox Sync (formerly Weave) lets you synchronize your Firefox
bookmarks, history, passwords, open tabs, etc. so you can have them on
any computer, or even use them from your iPhone by using Firefox Home.
Clients that are syncing work with our servers at Mozilla by using the
Sync and the User APIs defined in these documents:
- https://wiki.mozilla.org/Labs/Weave/User/1.0/API
- https://wiki.mozilla.org/Labs/Weave/Sync/1.0/API
The User APIs manage the user accounts and tell the client which
server holds the data of a given user. In other words, each user is
tightly coupled to a single server when reading or writing data. This
natural sharding is great for scaling Sync, and is possible because
users don't share data (yet... ;))
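
To picture that one-user-one-server coupling, here is a minimal sketch. The `USER_NODES` table, the node URLs and the URL layout built by `storage_url` are all hypothetical; the real lookup goes through the User API and the real paths are the ones defined in the Sync API document.

```python
# Hypothetical user -> storage node table; names and URLs are illustrative only.
USER_NODES = {
    "alice": "https://node1.example.com",
    "bob": "https://node2.example.com",
}


def get_node(username):
    """Return the single storage node that holds this user's data."""
    try:
        return USER_NODES[username]
    except KeyError:
        raise LookupError("unknown user: %s" % username)


def storage_url(username, collection):
    """Build the (assumed) URL a client reads from or writes to."""
    return "%s/%s/storage/%s" % (get_node(username), username, collection)
```

Since every read and write for a user goes to the same node, adding capacity is mostly a matter of adding nodes and assigning new users to them.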
Another important point is that the data is encrypted on the client side
before it is sent over. That's because one of the key concepts of Sync
is that your data should not be known by our servers, to protect your
privacy. Well, we could probably still know how many bookmarks you
have by counting the number of entries in the DB, or how often you use
your browser. But as soon as you use a service like that you have to
give away this kind of information, most of the time just because it
is useful to make the service faster or understand any potential
And the good news is that you can set up your own Sync server and
even implement it yourself if you want.
So, a Sync server is a pretty passive storage server that is quite easy
to scale while keeping data consistent across clients.
About the code
The current implementation uses Apache, PHP, LDAP, MySQL and Memcached.
For various reasons I won't detail in this post (that might be another
post), it has been decided to switch the Sync server to Python.
The Sync server is composed of web services and a few screens used for
the password reset process, so using a web framework would have been
overkill. However, writing a WSGI-enabled server made a lot of sense
since it allows people to run our implementation on their laptop, or on
any WSGI-compatible web server they wish to use.
So, I've picked:
- Routes, to dispatch requests to a few classes (controllers)
- WebOb, to process incoming requests and build responses
- Paste, PasteScript and PasteDeploy, to group the configuration in an ini file and make it easy to run the application with a built-in server
There are alternative routing systems, but Routes really fits my
brain and makes the dispatching quite simple. I really like the fact that
you can optionally use regular expressions to validate URLs.
WebOb is quite a standard library and makes our life simple when reading
requests and writing responses. The code in our controllers stays KISS
with WebOb when you have to read incoming data: it's all available in
simple mappings. The response is also built by WebOb and you can forget
about all the WSGI protocol details. We mainly return JSON dumps that
WebOb wraps into responses.
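
As a rough illustration of that dispatch-then-JSON flow, here is a minimal sketch that uses only the standard library rather than Routes and WebOb: a small regex route table stands in for Routes, and a bare WSGI callable stands in for WebOb's request and response objects. The `route` decorator, the `get_collections` controller, the URL pattern and the payload are all hypothetical, not the actual Sync code.

```python
import json
import re

# Route table: (HTTP method, compiled URL pattern, controller function).
ROUTES = []


def route(method, pattern):
    """Register a controller for a method and a regex URL pattern."""
    def decorator(func):
        ROUTES.append((method, re.compile("^%s$" % pattern), func))
        return func
    return decorator


@route("GET", r"/(?P<username>[a-z0-9_]+)/info/collections")
def get_collections(environ, username):
    # Hypothetical payload: collection name -> last-modified timestamp.
    return {"bookmarks": 1278109839.96, "history": 1278109840.12}


def application(environ, start_response):
    """WSGI entry point: dispatch the request and return a JSON body."""
    method = environ["REQUEST_METHOD"]
    path = environ.get("PATH_INFO", "")
    for route_method, pattern, controller in ROUTES:
        match = pattern.match(path)
        if match and route_method == method:
            # Named groups of the URL pattern become controller arguments.
            body = json.dumps(controller(environ, **match.groupdict()))
            status = "200 OK"
            break
    else:
        body, status = json.dumps("Not Found"), "404 Not Found"
    data = body.encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(data)))])
    return [data]
```

To try it locally, something like `wsgiref.simple_server.make_server("", 8000, application).serve_forever()` should be enough. The regex in the pattern doubles as URL validation: a user name with unexpected characters simply does not match and falls through to the 404.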
Lastly, Paste is very handy to run the server locally, to initialize
data, and to handle multiple configurations. I should also say that my
colleague Ian Bicking is behind the Paste and WebOb libs, and is
involved in the Sync project. So those were quite natural choices.
The authentication process is a custom function that reads a basic
authentication header and checks it using an authentication plugin (more
on plugins later in this post).
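
Here is a minimal sketch of what such a function can look like, using only the standard library. The `DummyAuth` class and its `authenticate_user` method are hypothetical stand-ins for a real plugin, and the plain-text password comparison is only there to keep the example short.

```python
import base64


class DummyAuth(object):
    """Hypothetical authentication plugin backed by an in-memory mapping."""

    def __init__(self, users):
        self.users = users

    def authenticate_user(self, username, password):
        return self.users.get(username) == password


def authenticate(environ, auth_plugin):
    """Return the user name if the Basic credentials are valid, else None."""
    header = environ.get("HTTP_AUTHORIZATION", "")
    scheme, _, token = header.partition(" ")
    if scheme.lower() != "basic" or not token:
        return None
    try:
        decoded = base64.b64decode(token.strip()).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        # Malformed base64 or bytes that are not valid UTF-8.
        return None
    # Only split on the first colon: passwords may contain colons.
    username, sep, password = decoded.partition(":")
    if not sep:
        return None
    if auth_plugin.authenticate_user(username, password):
        return username
    return None
```

The function only knows how to parse the header; deciding whether the credentials are good is left entirely to the plugin, so swapping LDAP for SQL does not touch this code.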
For the storage, I've picked SQLAlchemy and python-ldap. I
don't really use the ORM part of SQLAlchemy and write pretty raw SQL
queries to avoid any extra overhead. The ORM would not bring any benefit
here anyway, since all storage I/O is contained in a storage class
that outputs simple mappings. I have created the mappers though, as they
are useful to initialize a DB on a first run.
But when the server runs, SQLAlchemy is mainly used for:
- its connection pooling abilities
- the nice parameter binding
- the ability to switch to any DB system via configuration (as long as the SQL is compatible of course)
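
To show what raw SQL with bound parameters returning simple mappings looks like, here is a minimal sketch. It uses the standard library's `sqlite3` module as a stand-in for SQLAlchemy and MySQL so that it runs anywhere; the `wbo` table layout and the `set_item` / `get_items` helpers are hypothetical. SQLAlchemy's `text()` construct accepts the same `:name` placeholder style.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the table layout is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wbo (username TEXT, collection TEXT, id TEXT, payload TEXT)"
)


def set_item(conn, username, collection, item_id, payload):
    """Insert one item; values are bound, never interpolated into the SQL."""
    conn.execute(
        "INSERT INTO wbo (username, collection, id, payload) "
        "VALUES (:username, :collection, :id, :payload)",
        {"username": username, "collection": collection,
         "id": item_id, "payload": payload},
    )


def get_items(conn, username, collection):
    """Return the items of a collection as simple mappings."""
    rows = conn.execute(
        "SELECT id, payload FROM wbo "
        "WHERE username = :username AND collection = :collection "
        "ORDER BY id",
        {"username": username, "collection": collection},
    )
    return [{"id": row[0], "payload": row[1]} for row in rows]
```

Because the values travel separately from the SQL string, quoting problems and injection attempts in item ids or payloads are handled by the driver rather than by the application code.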
As for python-ldap (I haven't implemented the LDAP part yet), it's the
standard connector I have always used with various flavors of LDAP
servers (OpenLDAP, Active Directory, etc.). I don't think there is any
competitor for this anyway.
The caching is currently done using Memcached. For instance, when
clients often ask for specific collection items, those items end up in
Memcached to lower the number of queries made to MySQL. For the Python
implementation though, I've decided to use Redis instead.
In terms of speed, Redis and Memcached are quite similar. Redis though
has interesting extras:
- The data is saved to disk, so you don't lose your cache. The speed stays almost the same as Memcached since the disk syncs are done asynchronously from time to time. Since a Sync user is tightly coupled to a storage server, that's an interesting feature to have. And, hey, you can move data from one Redis DB to another, so migrating the cache to another server is even possible.
- Redis provides built-in APIs to work with sets and lists, which allows more complex caching without extra code. This will let us do more caching in the future.
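
The caching pattern itself is the same whichever backend is used. Here is a minimal sketch of it, with an in-memory `DictCache` class standing in for a Memcached or Redis client; the class, the key layout and the `storage` callable are all hypothetical, and the only thing assumed about the cache is that it exposes `get()` and `set()`.

```python
import json


class DictCache(object):
    """In-memory stand-in for a cache client exposing get() and set()."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def get_item(cache, storage, username, collection, item_id):
    """Return an item from the cache, loading it from storage on a miss."""
    key = "%s:%s:%s" % (username, collection, item_id)
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    # Cache miss: ask the slower storage, then remember the answer.
    item = storage(username, collection, item_id)
    if item is not None:
        cache.set(key, json.dumps(item))
    return item
```

Items are serialized to JSON before being cached, so the same function works with any cache that only stores strings. Invalidation on write is left out to keep the sketch short.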
The storage itself will stay on MySQL but we will probably explore
alternative storage systems in the future. One requirement of Sync is
to be able to write data as fast as possible so all clients can have
access to it as soon as possible. Right now, Sync provides immediate
consistency, since all writes are done synchronously on a single
The PHP application was built with extensibility in mind: the way
Mozilla stores the data and authenticates users (a mix of LDAP and
MySQL) might not work if the code is used by someone else. That's why
the code was built using abstractions for the storage and the
authentication parts, and the Python version keeps this good idea.
Basically, you can write a new authentication or storage class, and
configure Sync to use it. See the documentation I am building on this:
http://sync.ziade.org/doc/storage.html (temporary location)
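
One common way to wire up that kind of plugin is to put a dotted class name in the configuration and import it at startup. Here is a minimal sketch of that idea; the `MemoryStorage` class, its method names and the `load_backend` helper are hypothetical and are not the interface described in the documentation above.

```python
import importlib


class MemoryStorage(object):
    """Hypothetical storage backend that keeps everything in a dict."""

    def __init__(self, **options):
        self.options = options
        self.items = {}

    def set_item(self, username, collection, item_id, payload):
        self.items[(username, collection, item_id)] = payload

    def get_item(self, username, collection, item_id):
        return self.items.get((username, collection, item_id))


def load_backend(dotted_name, **options):
    """Import 'package.module.ClassName' and return an instance of it."""
    module_name, _, class_name = dotted_name.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)(**options)
```

With something like this, a line such as `storage = mypackage.storage.MemoryStorage` in the ini file (a made-up name) is all it takes to switch backends, as long as the class implements the methods the server calls.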
The web server that runs the Python application will stay Apache (with
mod_wsgi) since it has proven to work very well with the current
implementation. I might benchmark other servers in the future though, like
Gunicorn + nginx or uWSGI + nginx. We now have a nice
Grinder script that realistically mimics Sync users, so...
Doc and Code
I've started writing documentation; the temporary location is at
http://sync.ziade.org/doc and you can grab the code we are building
You can already use the server with your Firefox / Firefox Home, but
this is still in development, so use it at your own risk.
I would love to get some feedback on that work!