Thursday, March 19, 2009

MySQL-python-1.2.3 beta 2 released

I released the second beta of MySQLdb-1.2.3 over the weekend. So far I've gotten a fair number of downloads but not a lot of feedback. I did find out, though, what small tweaks are required to build on Windows. It's also in the Python Package Index, so you can install it using easy_install MySQL-python. Once I make the final release of 1.2.3, I'll put up more eggs for fringe operating systems (Mac OS X, Windows).

Saturday, February 21, 2009

Sprinting at PyCon 2009

I've got a sprint scheduled now for PyCon 2009. I can only be there for the first day of sprints. If there's enough interest, we can probably find a way to sprint earlier during some of the open space sessions, or it can continue after I'm gone.

PyCon 2010 will be in Atlanta, GA, which is a lot closer to home, but not close enough that I can avoid lodging expenses.

Sunday, February 1, 2009

Project Status: Community Participation Needed

John Eikenberry has taken over work on ZMySQLDA for some time now and has released ZMySQLDA 3.1. I don't have any ongoing Zope deployment except some Zenoss, and use MySQLdb directly, so I'm not a good test candidate. Superficially, there don't seem to be any outstanding issues.

Since Python 2.6 and 3.0 were released, there has been a lot of demand for prepackaged MySQLdb for those versions. MySQLdb-1.2.2 seems to throw some warnings on Python 2.6, due to its use of the old sets module (deprecated now that set is a built-in type) and some old-style exception usage. These problems should be fixed in the SVN version (MySQLdb-1.2 branch). MySQLdb was originally developed for Python 1.5, so some old crufty stuff is still hiding out in there. There are also some build fixes for Mac and Windows.
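A minimal sketch of the kind of compatibility fix this implies. It is not the actual change in SVN: it assumes the warnings come from importing the sets module and from the comma form of raise, and ExampleError and check_positive are made-up names for illustration.

```python
# Prefer the built-in set type (available since Python 2.4) and fall back to
# the deprecated sets module only on interpreters that lack it.
try:
    set
except NameError:
    from sets import Set as set


class ExampleError(Exception):
    """Hypothetical exception, used only for this illustration."""


def check_positive(value):
    # Old comma form, a syntax error on Python 3:
    #     raise ExampleError, "value must be positive"
    # The call form below is accepted by every Python version.
    if value <= 0:
        raise ExampleError("value must be positive")
    return value
```

The call form of raise costs nothing on old interpreters, so it is a safe change even while Python 2.3 is not deliberately broken.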

MySQLdb-1.2.3 is not too far off. There are probably a couple more fixes that need to go into the 1.2 branch. This will almost certainly be the last release in the 1.2 series. One outstanding question is which versions of Python should be supported in 1.2.3. I've settled on no longer supporting Python 2.3, but I won't go out of my way to break it yet (e.g. by using decorators or generator expressions). Python 2.4 seems reasonable to support. My current development platform (Ubuntu) has Python 2.5.2, so that's what I'm going to test against. I also plan to test against Python 2.6.

Python 3.0 is a different beast. I'm not sure if it will be possible to have a single version of MySQLdb that works on both Python 2.x and 3.x yet. I'm assuming not. I don't think I'll have Python 3.x support until Python 3.1 at least, though that seems to be not too far off. Guido van Rossum has indicated that Python 3.x will have a faster release cycle than Python 2.x; see PEP-3000. We might even see some early 3.1 releases by PyCon 2009.

MySQLdb-1.3, the precursor to 2.0 and the SVN trunk, is on hold for right now. Some of the fixes that are in the 1.2 branch need to be ported to it. Currently it does not do any type conversion; this has been removed from the C module and will be done in Python. I don't expect this to adversely affect performance. A lot more will be done using generators and more modern Python-isms. The question here is whether to support Python 2.x or only Python 3.x. I'm inclined to say I'll develop it now for Python 3.x, and save backporting for later.

MySQL-5.1 was finally released. There are no big C API changes that I've seen (maybe no small ones either), so you should be able to build against it. For that matter, if you have MySQLdb built against the 5.0 client, I expect it to work fine with a 5.1 server, so save yourself some trouble and don't upgrade if you don't have to.

Similarly, if your OS vendor supplies packages for MySQLdb, use them. MySQLdb is known to be in a lot of Linux distributions, including Fedora, RHEL, Debian, Ubuntu, Gentoo, and others. It's also in the various *BSD distributions.

If you are running Windows or Mac OS X, I still don't use those platforms so I won't be building my own packages for them. If you want to contribute your own packages, I'll host them on SourceForge. Make sure you specify what version of Python and what version of MySQL it is built against.

I am planning a short sprint at PyCon 2009; it will not go any longer than Tuesday, unless there is a lot of unanticipated demand. In light of this, this is the approximate release schedule:

Late February 2009: MySQLdb-1.2.3 beta 1
Late March 2009: Sprint
Early April 2009: MySQLdb-1.2.3 release candidate 1
Mid-April 2009: MySQLdb-1.2.3 final

In order for a sprint to be effective, it can't be just me. I need people to help find, report, test, and fix bugs. I don't think we need more than two days to do this. I suspect we can get a lot done before the official sprint starts. We can probably make the trunk usable. If I could get one or two committed people at the sprint (not counting myself), I'd be happy.

Lastly, MySQLdb development is entirely funded by donations. Nobody pays me to work on it. I've been working on it for the past 10 years, 8 years on SourceForge.  I have to plan development around my work schedule to avoid any conflict of interest or intellectual property issues. For the same reason, I am attending PyCon out of my own pocket. Donations are appreciated, though I could also use some interested developers. One or two have come forward lately; the important thing is that you submit patches through the bug/patch tracker. If your patches are good quality, I'll probably give you write access to SVN.

Feedback is solicited and welcomed.


Wednesday, May 14, 2008

Zenoss Deathmatch

I've been working for the last week on implementing Zenoss to replace Nagios and Cacti. Individually Nagios and Cacti are pretty good at what they do, but they don't integrate well.

Nagios is primarily an availability monitor, so it's good for notifying you when something goes down, or a disk is filling up, or the load average is too high, etc., but it's not so great for monitoring performance. Nagios 1.4 uses text configuration files. There is a templating system which can be helpful if you have a lot of identical systems.

Cacti, on the other hand, is pretty good at monitoring performance, as in how much bandwidth are you using, resource utilization, and so on with nice long-term graphs using RRDtool, but it's not so great for notifying if something is down. Cacti is almost exclusively SNMP-based, and as a result, you can usually just point it at a device through the web interface and it will auto-discover everything interesting. If you have more than a few hundred items to measure, you need to use cactid, which is a very fast threaded poller written in C.

I've been using both for about 3-4 years separately, but because they don't integrate easily (even though both use MySQL as their backend storage), there's a lot of duplication of effort in getting both of them configured.

And then there's Zenoss. Zenoss does both availability and performance monitoring, with long-term graphing using RRDtool, log analysis, and network-based auto-discovery. Zenoss is written in Python using the Zope-2 framework. Most of the device metadata is stored in ZODB, Zope's native object database. Long-term performance data is stored in RRDtool. Event logs are stored in MySQL.

Everything in Zenoss integrates together very well. The data is faceted in the sense that you can browse devices by location, by class, by group, or by system. It has a built-in syslog server, it can use WMI for monitoring Windows systems, and it has very flexible event handling.

There are still some rough edges in 2.1.92, which is a beta for 2.2. First, it's a bit of a memory hog and I'm inclined to believe there are some memory leaks. After a day or two the main process will start to use over 200 MB; restarting tends to knock it back down to around 100 MB or so.

Syslog support has some issues. When I first started feeding it some syslog data, all the events were being classified as "/Unknown". This is normal. Once you have some log entries, you can then tell it to map that entry to an event. The problem was, the events had components (the process name when parsing syslog data), but they had no event classes set. Looking at the code, it seemed like it should have been setting the event class ID to whatever the component/process name was. It just wasn't. After some Googling, I found out the code to build the event class key was just plain broken. After making the suggested changes, I could start mapping events.

Another syslog problem was in parsing the hostname. I have a satellite syslog-ng server in a remote location that logs to my central syslog-ng server. Because of this, the hostname has the relay information in it. Zenoss' syslog support has an option to parse this though, so no problem, right? Despite turning this on, I was still getting entries like IP/IP, so back into the syslog code. It turns out, Zenoss expects the separator between the two hostnames to be "@", and syslog-ng uses "/". Easy fix in the code, but I suspect "@" may work for the standard syslog, so the separator needs to be a configuration option.
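A rough sketch of what making the separator configurable could look like. This is not the Zenoss code; split_chained_hostname and its separator parameter are hypothetical names, and the sketch makes no claim about which of the two names is the origin and which is the relay.

```python
def split_chained_hostname(raw, separator="@"):
    """Split a relayed syslog hostname field into its component hostnames.

    The separator is a parameter rather than a hard-coded "@", so a site
    fed by syslog-ng could pass "/" instead. The names are returned in the
    order they appear in the field.
    """
    return [part for part in raw.split(separator) if part]
```

With the default "@", a field such as 10.0.0.1/10.0.0.2 comes back as a single unsplit name, which matches the IP/IP entries described above; passing separator="/" yields the two addresses separately.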

Despite all of this, I like Zenoss a lot. I am running it parallel with Nagios until I get all the event handling nailed down. I might need 2 GB RAM on the monitoring server though, and I have already moved the MySQL database onto a different server.

Tuesday, April 8, 2008

CrunchyFrog: A database navigator and query tool for GNOME

CrunchyFrog is a database navigator and query tool for GNOME.

Currently PostgreSQL, MySQL, Oracle, SQLite3 databases and LDAP servers are supported for browsing and querying.

I gave it a quick try and it looks really promising. Gotta love a Monty Python reference in any case.

Monday, March 24, 2008

MySQL Cluster support for Django

Ivan Sagalaev has created a mysql_cluster database backend for Django, which allows you to configure master and slave servers, and then specify which should be used on a given view with Python decorators. Found via del.icio.us and Simon Willison's blog.

Friday, March 14, 2008

I am not dead

It's been nearly a year since the last post, so you might naturally wonder if I am dead or stopping development of MySQLdb. Actually I've been sick for the last year and a half or so, and hadn't really been motivated enough to do anything. Here's what's been going on:

John Eikenberry has taken over development of ZMySQLDA, since I pretty much don't do anything with Zope these days. He's made a couple of releases on the way to a 3.0 release. As far as I know, ZMySQLDA is still only useful with Zope 2 as Zope 3 has a different architecture and comes with MySQL support directly.

Monty Taylor from MySQL AB has volunteered to help out on MySQLdb. I believe the way this is going to work out is he's going to be doing maintenance on the 1.2 branch, and get some minor bug fixes out there. In addition, he has a good start on a native (i.e. written in Python) MySQL driver.

I have a lot of work done towards the first 1.3 version, which will be the development branch (SVN trunk) for 2.0. To a large extent, this is a refactoring project, which means there are a lot of internal changes that don't affect most users.

When I first started working on MySQLdb (way back in the last millennium), there was only one option for talking to the MySQL server: Use the C API via libmysqlclient. Then the re-entrant/thread-safe libmysqlclient_r was added. Then the embedded server libmysqld. And now there is the prospect of a native Python version. This makes building more complicated because you currently have to build against one library at a time. It's also more complicated for users: Suppose you want to have both a regular client version and an embedded version on the same system?

1.3/2.0 is going to fix this by building all the possible options as separate drivers, and then you can specify which one to use at connect-time, or you can use the default, which will probably go in the order libmysqlclient_r, libmysqlclient, and native. (The embedded version requires special initialization so it should never be a default.)
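A minimal sketch of that selection logic, under the assumption that drivers end up in some name-to-driver mapping. select_driver, DEFAULT_ORDER, and the mapping itself are hypothetical; none of this is the trunk code.

```python
# Default preference order taken from the paragraph above. The embedded
# driver is deliberately absent: it needs special initialization, so it is
# only ever used when asked for by name.
DEFAULT_ORDER = ("libmysqlclient_r", "libmysqlclient", "native")


def select_driver(available, requested=None, order=DEFAULT_ORDER):
    """Pick a driver from `available`, a dict of name -> driver object."""
    if requested is not None:
        if requested not in available:
            raise LookupError("driver %r is not available" % requested)
        return available[requested]
    for name in order:
        if name in available:
            return available[name]
    raise LookupError("no default driver is available")
```

A connect function could call select_driver(available) when no driver is named, and select_driver(available, "embedded") only when the caller asks for the embedded server explicitly.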

1.2 and earlier use a type conversion dictionary to map MySQL column types to functions which convert a string result to a Python type. Additionally, this same dictionary is used to convert Python types into SQL literals. I think in earlier (pre-1.0) versions, these were separate dictionaries, and they were later combined because their sets of keys are disjoint. I'm not sure now if this was a good idea or not.
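To make the shared-dictionary idea concrete, here is a toy version. It is a sketch only, not MySQLdb's real converter table: the integer codes stand in for column-type constants, and parse_date, from_mysql, and to_literal are names invented for the example. String quoting is left out on purpose because it needs the connection's escaping rules.

```python
import datetime

# Illustrative integer codes standing in for MySQL column-type constants.
FIELD_TYPE_LONG = 3
FIELD_TYPE_DOUBLE = 5
FIELD_TYPE_DATE = 10


def parse_date(text):
    year, month, day = text.split("-")
    return datetime.date(int(year), int(month), int(day))


# One dictionary, two directions. The keys cannot collide because column
# type codes are integers and the Python-side keys are type objects.
conversions = {
    # column type code -> function turning the result string into a value
    FIELD_TYPE_LONG: int,
    FIELD_TYPE_DOUBLE: float,
    FIELD_TYPE_DATE: parse_date,
    # Python type -> function turning a value into SQL literal text
    int: str,
    float: repr,
    type(None): lambda value: "NULL",
    datetime.date: lambda value: "'%s'" % value.isoformat(),
}


def from_mysql(type_code, text):
    """Convert one result string; SQL NULL arrives as None and stays None."""
    if text is None:
        return None
    return conversions.get(type_code, str)(text)


def to_literal(value):
    """Render one Python value as SQL literal text."""
    return conversions[type(value)](value)
```

The combined table is compact, but a reader has to know that the integer keys and the type keys serve opposite directions, which may be part of why the merge is questionable.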

One of the other complications of this approach is TEXT columns. To the MySQL C API, TEXT columns have the same column type as BLOB columns. The difference is the presence of a flag. This took some kludgy stuff to get to work.

Then unicode came along, not just in Python but in MySQL. (The original target versions were MySQL-3.23 and Python-1.5.) This complicated the type conversion because now it was dependent on the connection's character set, which could be changed, so the converter dictionary had to be tinkered with on the fly. Additionally, there were reference count problems (and maybe still are to an extent) with this approach, due in part to the dictionary being able to be overridden by the user.

I haven't decided entirely how this is going to be fixed, but I will have some method for users to override the type conversion at runtime. I will probably have some hooks that will allow you to use a specific conversion for a column based on column type, column name, table name, database name, or any combination thereof. For example, you could have a rule that said that any column name ending with "_ip" with a column type of UNSIGNED INTEGER could be returned as a user-defined IP object, but stored in the database as a four-byte integer.
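One possible shape for such hooks, sketched under heavy assumptions since the design is explicitly undecided. Column, ip_rule, integer_rule, and choose_converter are all hypothetical names, and the standard library's ipaddress.IPv4Address stands in for the user-defined IP object.

```python
import ipaddress
from collections import namedtuple

# Hypothetical description of a result column, carrying the attributes a
# rule might want to inspect.
Column = namedtuple("Column", "database table name type_name unsigned")


def ip_rule(column):
    """Return an IP converter for unsigned integer columns named *_ip."""
    if (column.name.endswith("_ip")
            and column.type_name == "INTEGER"
            and column.unsigned):
        return lambda text: ipaddress.IPv4Address(int(text))
    return None  # rule does not apply to this column


def integer_rule(column):
    """Fallback rule: any other integer column becomes a Python int."""
    if column.type_name == "INTEGER":
        return int
    return None


def choose_converter(column, rules):
    """Return the converter from the first rule that applies."""
    for rule in rules:
        converter = rule(column)
        if converter is not None:
            return converter
    return str  # no rule matched: leave the value as text
```

Rules are tried in order, so a specific rule such as ip_rule has to be listed ahead of a general one such as integer_rule. Storing the value would need the reverse mapping, from the IP object back to its four-byte integer, which this sketch does not cover.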

The type conversion from MySQL to Python currently takes place in the low level driver (_mysql). Since there are going to be multiple drivers, this is going to move up into the Python layer. I don't believe this will adversely affect performance. Looking up the right converter is only a dictionary lookup anyway, and only has to be done once per column per query. Once you have a list/tuple of converters for the result set, these can be applied quickly with a list/generator comprehension.
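The performance argument can be sketched as follows. This is an illustration rather than the planned implementation; build_row_converter and convert_rows are invented names, and the conversions argument is assumed to be a plain dict keyed by column type code.

```python
def build_row_converter(column_types, conversions):
    """Resolve one converter per column, once per query."""
    converters = [conversions.get(type_code, str) for type_code in column_types]

    def convert_row(raw_row):
        # No dictionary lookups here: each row only pays for the
        # conversion calls themselves. SQL NULL (None) passes through.
        return tuple(
            None if value is None else converter(value)
            for converter, value in zip(converters, raw_row)
        )

    return convert_row


def convert_rows(raw_rows, column_types, conversions):
    """Lazily convert every row of a result set."""
    convert_row = build_row_converter(column_types, conversions)
    return (convert_row(raw_row) for raw_row in raw_rows)
```

Because convert_rows returns a generator, rows are converted only as they are consumed, which fits an unbuffered result set.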

MySQLdb-1.2 and earlier have several cursor classes which are built with mixin classes. The mixins control things like whether the rows are returned as tuples or dictionaries, or whether a buffered or unbuffered result set is used (i.e. mysql_store_result() vs. mysql_use_result()). This is pretty messy and is going away.

The format of the row will probably be controlled by a hook of some sort. I'm inclined to use unbuffered cursors, i.e. SSCursor or mysql_use_result(), by default. The tricky part is that the entire result set must be fetched before another query can be issued, so if there are multiple cursors, there needs to be a mechanism so that only one can use the connection at a time. Rather than locking the connection, there will need to be a way for one cursor to tell the others that they need to buffer the rest of the result set.
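A toy model of that hand-off, with no real database involved. Connection and Cursor here are stand-ins invented for the sketch, and execute() takes an iterable of rows in place of a query so that the streaming behaviour can be shown on its own.

```python
from collections import deque


class Connection:
    """Tracks which cursor is currently streaming an unbuffered result."""

    def __init__(self):
        self.active = None

    def claim(self, cursor):
        # Before another cursor may use the connection, the one that is
        # still streaming must pull its remaining rows into memory.
        if self.active is not None and self.active is not cursor:
            self.active.buffer_remaining()
        self.active = cursor

    def release(self, cursor):
        if self.active is cursor:
            self.active = None


class Cursor:
    def __init__(self, connection):
        self.connection = connection
        self._stream = iter(())
        self._buffer = deque()

    def execute(self, rows):
        # `rows` stands in for the server-side result of a query. Re-running
        # execute on the same cursor simply drops its old stream; a real
        # driver would have to drain it first.
        self.connection.claim(self)
        self._buffer.clear()
        self._stream = iter(rows)

    def buffer_remaining(self):
        self._buffer.extend(self._stream)
        self._stream = iter(())

    def fetchone(self):
        if self._buffer:
            return self._buffer.popleft()
        row = next(self._stream, None)
        if row is None:
            self.connection.release(self)
        return row
```

A cursor that is never interrupted stays fully unbuffered; only a cursor that loses the connection part-way through pays the memory cost of holding the rest of its result set.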

Some of this is already done for the trunk, but needs to be committed. In particular, there is no type conversion at all, and the driver selection is not done yet, but I'll see if I have time to work on it more this week at PyCon 2008.