MySQL-Python

Friday, November 16, 2012

Is MySQLdb hard to install?

The short answer is: No. The longer answer is: It depends. With the latest version, most of the time, you can run:

pip install MySQL-python

and you're done... if you have the prerequisites. That's where it gets sticky.

The dreaded prerequisites

Since MySQLdb uses a C module to link against MySQL's client library, you will need:

A C compiler and associated toolchain
Python development headers and libraries
MySQL development headers and libraries
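Before running pip, you can check which of these prerequisites you're missing with a small standard-library script. This is a sketch under a few assumptions: that your compiler is called cc, gcc, clang, or cl, and that your MySQL development files include mysql_config (true for the usual POSIX packages, not for Connector/C on Windows). Finding these files doesn't guarantee the build will succeed.

```python
import shutil
import sysconfig
from pathlib import Path


def check_build_prerequisites():
    """Report which MySQLdb build prerequisites appear to be installed.

    A rough heuristic only: it looks for the obvious pieces, nothing more.
    """
    python_include = Path(sysconfig.get_paths()["include"])
    return {
        # Any common C compiler driver on PATH (assumed names).
        "c_compiler": any(shutil.which(cc) for cc in ("cc", "gcc", "clang", "cl")),
        # Python development headers: Python.h in the interpreter's include dir.
        "python_headers": (python_include / "Python.h").exists(),
        # MySQL development files: mysql_config ships with them on POSIX systems.
        "mysql_config": shutil.which("mysql_config") is not None,
    }


if __name__ == "__main__":
    for name, found in check_build_prerequisites().items():
        print(f"{name:15} {'found' if found else 'MISSING'}")
```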

Linux

MySQLdb was originally developed specifically for use on Linux, as it was the only platform I cared about at the time. Over time, several Linux distributions (Red Hat, Debian, Ubuntu, etc.) picked it up, created their own packages, and distributed them. If you are a Linux user, I recommend you use your distribution's packages. But what if you can't, and you really do need to build your own? Install using pip. Each Linux distribution has packages for all the prerequisites, but the names of these packages vary. For Ubuntu (and probably Debian):

apt-get install build-essential python-dev libmysqlclient-dev

Other UNIX-ish

MySQLdb has been a FreeBSD port for 12 years. Use it. Hey, if you're using FreeBSD or another *BSD, or Solaris, you probably don't need my help to install it...

Almost UNIX: Mac OS X

Mac OS X is very POSIX-like inside, so MySQLdb's setup treats it as POSIX, but it can get complicated. The simple part is: You need Xcode, which is Apple's compiler toolchain.

The easy way

By far the easiest way to get MySQLdb on the Mac is to use one of the two major UNIX-ish package distributions: MacPorts or Homebrew. Of the two, I have only used MacPorts. Once upon a time, everything in MacPorts had to be built from sources, like a *BSD or Gentoo-style port system, but there are now binary packages for many things. Some people swear by Homebrew. Either one will make your life easier. MacPorts has a port for MySQLdb (py-mysql and variants). If Homebrew has a "formula", I can't find it, but a pip install should work once you have compatible versions of Python and MySQL.

The hard way

The hard part about the Mac is, there are three architectures you could be building for: PowerPC, Intel 32-bit, and Intel 64-bit. We can safely ignore Motorola since Apple hasn't used them in a very long time, and PowerPC can be mostly ignored, but you still see some support for it. For example, the Python Apple distributes in Snow Leopard is a fat binary that includes code for PPC, i386, and x86_64. You might think this makes things easier, but it does not. If you're on any sort of modern Mac, it's going to use the x86_64 support. That means the version of MySQL you use also has to be built for x86_64. You can't mix and match.
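To find out which architecture your Python is actually running as (and therefore which MySQL build you need), ask the interpreter for its pointer size. A minimal standard-library sketch; platform.machine() is included only for comparison, because it reports what the OS says about the hardware, which can differ from the mode a fat binary is running in.

```python
import platform
import struct


def python_pointer_bits():
    """Return 32 or 64: the pointer width of the running interpreter."""
    return struct.calcsize("P") * 8


def describe_build_target():
    # platform.machine() comes from uname and may say 'i386' or 'x86_64'
    # regardless of which slice of a fat Python binary is executing, so the
    # pointer size is the value to match your MySQL libraries against.
    return {"machine": platform.machine(), "python_bits": python_pointer_bits()}


if __name__ == "__main__":
    print(describe_build_target())
```

If python_bits is 64, you want the 64-bit MySQL packages; if it is 32, the 32-bit ones.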

Most people who are going to use Xcode instead of MacPorts or Homebrew are probably going to download binary packages of MySQL from mysql.com. They have separate 32-bit and 64-bit packaging, i.e. no fat binaries. You almost certainly want 64-bit packages. Another option is to download their Connector/C package, and here you will find 32-bit and 64-bit Intel and PowerPC as well. You could also download the source and build it yourself.

Once you have MySQL on your Mac, make sure you can run mysql_config from a shell. If it's not on your executable PATH, you'll have to hack on site.cfg to tell setup.py where it is.
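The lookup order is roughly: mysql_config on your PATH first, then whatever site.cfg says. Here is a sketch of that order. It assumes site.cfg has an [options] section with an optional mysql_config = /path/to/mysql_config line, as in the MySQL-python 1.2.x site.cfg as I understand it; check your copy. The function names are hypothetical; --cflags and --libs are standard mysql_config options.

```python
import configparser
import shutil
import subprocess


def locate_mysql_config(site_cfg_text="", search_path=None):
    """Find mysql_config on PATH, else fall back to site.cfg's setting.

    search_path overrides PATH (None means use the real PATH).
    Returns None if neither source names a mysql_config.
    """
    found = shutil.which("mysql_config", path=search_path)
    if found:
        return found
    parser = configparser.ConfigParser()
    parser.read_string(site_cfg_text)
    # fallback=None also covers a missing [options] section.
    return parser.get("options", "mysql_config", fallback=None)


def mysql_build_flags(mysql_config):
    """Ask mysql_config for the compiler and linker flags a build needs."""
    def run(option):
        out = subprocess.run([mysql_config, option], capture_output=True,
                             text=True, check=True)
        return out.stdout.split()
    return {"cflags": run("--cflags"), "libs": run("--libs")}
```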

One issue I have seen recently was ultimately an architecture mismatch problem, and even once that was fixed, the loader path had to be fixed in the environment, which is not ideal.

Windows sucks

Even though Windows sucks, current versions of MySQLdb build on Windows without a lot of drama.

For the build toolchain, you must have Visual Studio 2008, and specifically you need vcsetup.exe for the C compiler. It has to be the 2008 version because that's the version the python.org Python packaging uses. (Of course, if you're building your own Python, you're on your own here.) Like with the Mac folks, you have 32-bit and 64-bit options. Choose the same one for Connector/C. If you are building a 32-bit package, it should work, period. 64-bit I'm not so sure about. But I've been building 32-bit MSI packages, and pip install should grab one of those if you're using a 32-bit Python, and that is really all you need.
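On Windows, the python.org interpreter states its compiler and bitness in sys.version, e.g. [MSC v.1500 32 bit (Intel)]; MSC v.1500 is the Visual Studio 2008 compiler. A sketch that parses that banner, assuming the python.org format; the mapping table deliberately covers only the one compiler version discussed here.

```python
import re
import sys

# MSC v.1500 is the compiler shipped with Visual Studio 2008 (VC++ 9.0).
VS_FOR_MSC = {1500: "Visual Studio 2008"}


def python_compiler_info(version_string=sys.version):
    """Parse MSC version and bitness from a python.org Windows version banner."""
    match = re.search(r"\[MSC v\.(\d+) (32|64) bit", version_string)
    if not match:
        return None  # not an MSVC-built Python (Linux, Mac, MinGW, ...)
    msc, bits = int(match.group(1)), int(match.group(2))
    return {"msc": msc, "bits": bits, "visual_studio": VS_FOR_MSC.get(msc, "unknown")}


if __name__ == "__main__":
    print(python_compiler_info())
```

Pick the Connector/C package whose bitness matches the bits value.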

Conclusion

Is it simple? Yes, for most people. Most of the hard work has been done for you. So if it's so simple, why don't I have more binary packages? Mainly because there are several different versions of both Python and MySQL that you might choose to use at any given time, depending on your needs, and you might still choose 32-bit or 64-bit architectures. It's just not practical for me to build all of them. Maybe with some kind of build-bot. This could change in the future, but for now, there are plenty of other open source package distributors out there, and if at all possible, you should just use a pre-built package.

Friday, November 2, 2012

MySQLdb-1.2.4 release candidate 1

I've just about finished up 1.2.4. Please give 1.2.4c1 a try. I plan to make the final release sometime next week.

1.2.4 will support Python 2.4, but it's not tested, and won't be supported in 1.3.0. Python 2.5, 2.6, 2.7 and PyPy are all tested and supported.

I still need to make some documentation updates. It turns out I can't easily make it work with Read The Docs since it has a C module, but I'm planning to have them online at packages.python.org.

Monday, September 24, 2012

A brief history of MySQLdb

It was recently pointed out to me what a confusing mess the source repositories were for MySQLdb, so I spent a good bit of time cleaning them up. To understand how things got into the state they were in, and where we are going, a brief history lesson is required.

I first started working on MySQLdb in 1998. At the time, the best option for source control was CVS, and I had a local CVS repository for the project, and occasionally put up some tarballs. In 2001, I got the OK to make it an open source project, and development moved to SourceForge. Whatever you might think of SourceForge nowadays, it was a safe place to host your code and distribute files, and you got a couple of nice extras like a bug tracker and forums. SourceForge had CVS, of course, but nothing else at the time. Subversion existed in early form but was not yet available at SourceForge.

SourceForge was a bit slow to pick up on Subversion. Even though Subversion 1.0 was released in 2004, it looks like they didn't make it available until 2006. I do remember that MySQL and Python both used SourceForge for CVS repositories, and both left SourceForge not long before Subversion became available there, because they just couldn't stand to wait any longer. In any case, once Subversion became available, I switched to using it, because it sucked a lot less than CVS. But one side effect of the switch was that CVS's branches and tags were copied over too. I could rename branches and re-tag things, but the old CVS branches and tags were still in there for good.

In 2008, sometime after 1.2.2, I was ready to make a clean start on MySQLdb, and re-engineer most of it. By this time, Mercurial was available and I decided to port just the SVN trunk over to a new Hg repository. I really had hoped to stop using the SVN repository altogether, but it was still being used for ZMySQLDA, and it also turned out a 1.2.3 bugfix release was needed.

Work on MySQLdb-2.0 has been sporadic over the last few years. Part of it was not having a lot of time, and part of it was developer burnout. But there is stuff that needs to be done. In particular, Python 3 compatibility is needed now more than ever. I thought I could do this in 1.2.4, but I would have had to sacrifice compatibility for Python < 2.7. So MySQLdb-1.2.4 will be a bugfix release and fully Python 2.7 compatible (and should be Python 2.8 compatible), and very soon thereafter there will be a 1.3.0 which will require Python 2.7 or newer and be compatible with Python 3.

One parallel development over the life of MySQLdb has been the life of MySQL itself. First they were sold to Sun, which I was not very happy about, but life went on. Then Sun was bought by Oracle, which I'm not happy about at all. Even so, you can still get MySQL for free, so it hasn't had a huge impact on me. The Oracle acquisition led to two new MySQL forks: MariaDB, led by original MySQL creator Michael "Monty" Widenius; and Drizzle, led by MySQL guru Brian Aker. MariaDB is a straight-up fork of MySQL-5.1 and is supposed to be a binary replacement for MySQL, so the current MySQLdb should work with it just fine. Drizzle, on the other hand, has been heavily refactored and converted from C to C++, and has a different API. There is a Drizzle Python project (which borrows heavily from MySQLdb in places) already.

Anyway... My plan is to be able to support MySQL, MariaDB, and probably Drizzle, which is going to require having different driver backends. Not so much for MySQL and MariaDB, but definitely for Drizzle. One of the other drawbacks of MySQLdb-1.x is that, although there's some support for using the embedded server library, it's hard to have both the embedded server and the regular client library in the same installation. Additionally, building the C module on Windows has historically been painful, and there have been requests for a pure Python connector. I've also considered doing a ctypes-based driver; at the time, ctypes was not ready for prime time, but it has been in the Python core for quite a while now.
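One way separate backends could coexist in a single installation is a registry keyed by driver name, so that picking the client library or the embedded server library becomes a run-time choice rather than a build-time one. A sketch only: every name below is hypothetical and not the actual MySQLdb-2.0 API, and the "connections" are placeholder tuples.

```python
from typing import Callable, Dict, Protocol


class Driver(Protocol):
    """Hypothetical minimal interface each backend would implement."""
    def connect(self, **params) -> object: ...


_DRIVERS: Dict[str, Callable[[], Driver]] = {}


def register_driver(name):
    """Class decorator that records a backend under a short name."""
    def decorator(cls):
        _DRIVERS[name] = cls
        return cls
    return decorator


def get_driver(name):
    try:
        return _DRIVERS[name]()
    except KeyError:
        raise ValueError(f"no driver named {name!r}; have {sorted(_DRIVERS)}") from None


@register_driver("libmysqlclient")
class ClientLibraryDriver:
    def connect(self, **params):
        return ("client-library connection", params)  # placeholder


@register_driver("libmysqld")
class EmbeddedServerDriver:
    def connect(self, **params):
        return ("embedded-server connection", params)  # placeholder
```

A Drizzle or pure Python backend would just be one more registered class.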

With all these various back-end drivers possible, and support no longer being strictly limited to MySQL, I decided it would be best to rename MySQLdb-2.0, and for better or worse, the name I came up with was... Moist. Why, oh why "moist", you ask? Well, MySQL has the dolphin, MariaDB has the seal, and Drizzle has... the drizzle... all things that are wet, or perhaps moist. I thought about "Soggy" but there was already a project by that name on GitHub.

So: The SVN repository on SourceForge lives on, but only for ZMySQLDA. The MySQLdb-1.x code has been migrated to a git repository (MySQLdb1) that is synced between SourceForge and GitHub, and this is where MySQLdb-1.x will live. The Hg repository on SourceForge that was for MySQLdb-2.0 has now been cloned to become moist on GitHub. The Hg repository will probably be disappearing in the near future, once I'm convinced everything is OK with the moist repository. I've also added some READMEs to all the repositories with a map of where all the code is buried.

Any questions?

Thursday, September 20, 2012

MySQLdb book and other news

Did you know there is a book specifically addressing how to use MySQLdb? MySQL for Python was published by Packt Publishing in September 2010. If you're worried about two years being an eternity on the internet, don't: There hasn't been a new release of MySQLdb since the book was published.

Speaking of new releases, I built a Windows installer of MySQLdb-1.2.3 for Python-2.7 using MySQL Connector/C 6.0. This should be able to connect to any modern version of MySQL (4.0 through 5.5 and newer), and as I understand it, it should also work with MariaDB, but this is not yet tested.

Additionally, there is a 1.2.4 release coming out in the near future which will be a bug-fix release only. It will support Python-2.4 through 2.7 (and should be compatible with Python-2.8 when it arrives). There is some work towards Python-3 compatibility (specifically, 3.2.3), but unfortunately, I can't easily support Python < 2.7 and Python >= 3 at the same time. So after 1.2.4 is out, there will be a 1.3.0 release shortly thereafter which will be for Python-2.7 and 3.2 or newer (and probably earlier).

Longer term: I've also been working on a MySQLdb-2.0 version which is almost completely rewritten from scratch on the Python side, and modularized (and some code removed) on the C side. It's going to have pluggable drivers, with the intention of having the option of using libmysqlclient (the standard C library), libmysqld (the embedded server library), some kind of pure Python driver, and eventually a driver for Drizzle. MySQLdb-2.0 is actually going to be renamed Moist, and no, I don't have the code on GitHub yet, but it will be what is currently the MySQLdb2 repository on SourceForge.

Is the project leaving SourceForge? It remains to be seen. Every time I get fed up and I'm sure I'm about to move, they have some big update which makes things better. The current location is pretty well-known, and the project has been there since 2001. I do think I will be mirroring a git repository between SF and github at the very least; all the cool kids are on github these days.

Monday, February 22, 2010

Asynchronous programming and MySQLdb

It was asked in the previous post whether or not there would be better async support in MySQLdb-2.0. The answer is a qualified yes.

libmysqlclient (the MySQL C client library) has blocking calls, and doesn't have a true async interface. In a nutshell, the three blocking calls that are most important are mysql_query(), mysql_store_result() or mysql_use_result(), and mysql_fetch_row() (sometimes).

The original design of MySQLdb uses mysql_store_result(), which stores the entire result set in a MYSQL_RES structure; in this case, mysql_fetch_row() does not block. To save memory, the result is immediately converted to Python types and then freed.

An alternative cursor implementation (SSCursor) uses mysql_use_result(), which does not internally buffer the result set; this does cause mysql_fetch_row() to block, however. A further complication is that no new queries can be issued on the connection until the entire result set has been fetched. This is the primary reason why mysql_store_result() is used by default, because overall it causes fewer problems.
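To make the trade-off concrete, here is a toy model in plain Python (not MySQLdb or libmysqlclient code). store_result reads everything at once and frees the connection; use_result hands rows out on demand and keeps the connection busy, so a second query raises an exception standing in for the client's "Commands out of sync" error. A list passed to query() plays the part of the rows the server sends.

```python
class CommandsOutOfSync(Exception):
    """Stands in for client error 2014, "Commands out of sync"."""


class ToyConnection:
    """Toy model of libmysqlclient's two result modes; not a real driver."""

    def __init__(self):
        self._pending = None  # rows the server has sent but we have not read

    def query(self, rows):
        if self._pending is not None:
            raise CommandsOutOfSync("previous result set has not been read")
        self._pending = iter(rows)

    def store_result(self):
        # Like mysql_store_result(): read every row now, so the connection
        # is free for the next query straight away.
        rows, self._pending = list(self._pending), None
        return rows

    def use_result(self):
        # Like mysql_use_result(): rows are read on demand, and the
        # connection stays busy until the end of the result has been seen.
        def fetch():
            yield from self._pending
            self._pending = None
        return fetch()
```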

The main issue with mysql_store_result() is that it can consume a lot of memory if the result set is large. There's still the option of using a LIMIT clause, but it's inconvenient. But mysql_use_result() has its own problems, in that you have to be careful to cycle through the entire result set before issuing another query. Otherwise you will get the "Commands out of sync" error.
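If you do walk through a large result, PEP 249's fetchmany() lets you process it in fixed-size batches. Batching only bounds memory with an unbuffered cursor such as MySQLdb's SSCursor; with the default cursor the whole result has already been buffered. The sketch runs against sqlite3 purely so it works without a MySQL server, and iter_in_batches is a hypothetical helper name.

```python
import sqlite3


def iter_in_batches(cursor, size=1000):
    """Yield rows from an executed DB-API cursor, fetching `size` at a time."""
    while True:
        batch = cursor.fetchmany(size)
        if not batch:
            break
        yield from batch


# sqlite3 stands in for MySQLdb here only so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(2500)])
cur = conn.cursor()
cur.execute("SELECT n FROM t ORDER BY n")
total = sum(n for (n,) in iter_in_batches(cur, size=1000))
```

With MySQLdb you would connect with cursorclass=MySQLdb.cursors.SSCursor and use %s placeholders instead of ?, and, per the caveat above, read the batches to the end before issuing another query.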

I am pretty sure that inside the C client library, the wire protocol is exactly the same, and mysql_store_result() is just pre-buffering everything.

So in the end, anything using the C API is going to have to deal with some blocking calls. However, there are some design changes that will make some of these limitations a bit easier to deal with.

First, all the various Cursor classes are going away, and there will be only one True Cursor.

By default, cursors will still use mysql_store_result() on most queries, because there is a lot that it works very well for. This includes INSERT, UPDATE, and DELETE, which do not return any rows, but also some meta-queries such as SHOW WARNINGS, SHOW TABLES, etc. which do return rows but always a relatively small number, and don't take a long time to execute.

SELECT statements, on the other hand, will be detected, and those queries will use mysql_use_result() instead so that rows are not buffered in the C client. They will only be fetched upon demand.
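A rough sketch of how SELECT detection might work: skip leading whitespace and comments, then look at the first keyword. This is a hypothetical heuristic for illustration, not the actual MySQLdb-2.0 code, and it will misclassify some statements.

```python
import re

# Leading whitespace, /* ... */ comments, and "-- " or "#" line comments.
_LEADING_NOISE = re.compile(r"(?:\s|/\*.*?\*/|--[ \t][^\n]*\n?|#[^\n]*\n?)*", re.S)


def wants_unbuffered_result(sql):
    """Hypothetical: True for SELECTs (use_result), False for the rest (store_result)."""
    body = sql[_LEADING_NOISE.match(sql).end():]  # the pattern always matches
    word = re.match(r"\(*\s*(\w+)", body)         # allow "(SELECT ...) UNION ..."
    return bool(word) and word.group(1).upper() == "SELECT"
```

INSERT, UPDATE, DELETE, and SHOW statements all come back False and keep the buffered behaviour.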

The irony of using mysql_use_result() on SELECTs is, you can't scroll (mysql_data_seek()) on the result set, and this is the primary use case for scrolling. However, I still expect to make scroll work, possibly in a limited way, because these cursors will still be somewhat buffered. There will be a user-configurable maximum row limit that will be buffered, and once that buffer is filled, the oldest rows will be discarded. This limit will probably be 1 row by default.
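The bounded buffer can be sketched with a collections.deque whose maxlen drops the oldest row automatically. ScrollBuffer and its private attributes are hypothetical names; fetchone(), scroll(value, mode), and rownumber follow PEP 249's cursor conventions, including IndexError for out-of-range scrolls. An iterator stands in for mysql_fetch_row().

```python
from collections import deque

_END = object()  # sentinel: the underlying result set is exhausted


class ScrollBuffer:
    """Hypothetical: keep the last `maxrows` fetched rows so short scrolls work."""

    def __init__(self, rows, maxrows=1):
        self._source = iter(rows)              # stands in for mysql_fetch_row()
        self._buffer = deque(maxlen=maxrows)   # oldest rows fall off the left
        self._fetched = 0                      # rows read from the source so far
        self.rownumber = 0                     # absolute index of the next row

    def fetchone(self):
        if self.rownumber < self._fetched:
            # After a backward scroll, re-serve a row still held in the buffer.
            row = self._buffer[len(self._buffer) - (self._fetched - self.rownumber)]
        else:
            row = next(self._source, _END)
            if row is _END:
                return None
            self._buffer.append(row)
            self._fetched += 1
        self.rownumber += 1
        return row

    def scroll(self, value, mode="relative"):
        if mode == "relative":
            target = self.rownumber + value
        elif mode == "absolute":
            target = value
        else:
            raise ValueError("mode must be 'relative' or 'absolute'")
        oldest = self._fetched - len(self._buffer)
        if not oldest <= target <= self._fetched:
            raise IndexError("row %d is outside the buffered window" % target)
        self.rownumber = target
```

With the default maxrows=1 you can step back exactly one row; anything further back has already been discarded.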

How will the "Commands out of sync" error be avoided? If another cursor is created, the first cursor will be flushed, i.e. any remaining rows in the result set, and any additional pending result sets, will be read so that the new query can be issued. There is also a clear method for the cursor, which also reads all the rows, but instead of buffering them, it discards them. It also avoids doing any Python type conversion.
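A toy model of the flush-versus-clear distinction, in plain Python with hypothetical names (not the MySQLdb-2.0 code). Opening a new cursor flushes the previous one, which reads and converts its remaining rows; clear() reads the remaining rows without converting them. The raw_rows argument stands in for the result of the query the cursor runs.

```python
from collections import deque


class ToyCursor:
    """Hypothetical model of flush/clear; not a real driver cursor."""

    def __init__(self, raw_rows, convert):
        self._raw = iter(raw_rows)   # rows not yet read "off the wire"
        self._convert = convert      # raw row -> Python values
        self._buffered = deque()     # converted rows saved by flush()

    def fetchone(self):
        if self._buffered:
            return self._buffered.popleft()
        raw = next(self._raw, None)
        return None if raw is None else self._convert(raw)

    def flush(self):
        # Read and convert everything left, keeping it for later fetches.
        self._buffered.extend(self._convert(raw) for raw in self._raw)

    def clear(self):
        # Read everything left but discard it, skipping type conversion.
        for _ in self._raw:
            pass


class ToyFlushingConnection:
    def __init__(self):
        self._active = None

    def cursor(self, raw_rows, convert=tuple):
        if self._active is not None:
            self._active.flush()  # drain the old result so the connection is free
        self._active = ToyCursor(raw_rows, convert)
        return self._active
```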

So if you've gotten this far, you're probably wondering: How does any of this affect asynchronous use? Well... it doesn't. The C client library just isn't designed for that sort of thing. But there is still hope.

I've wanted a ctypes implementation that would still use the libmysqlclient library but wouldn't require compiler tools to build (and hopefully shut up Windows users). At the time, ctypes was still very early, and had not made its way into the Python standard library yet (included as of 2.5). Fortunately, Jason Coombs has a patch against MySQLdb (jaraco.mysql) which implements this. The reason I haven't integrated this patch is that it would replace the original driver, instead of being an option. MySQLdb was never designed for multiple drivers. That has put multiple-driver support pretty close to the top of my TODO list for MySQLdb, because there's another case where it would be useful: There currently is no way to have _mysql (the current C driver module) built against libmysqlclient and libmysqld (the embedded server library, which uses the same API) at the same time without using virtualenv or something like that.
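For a sense of what a ctypes driver involves, the sketch below loads libmysqlclient at run time and calls mysql_get_client_info(), which takes no arguments and returns a C string. No compiler is needed. It assumes the library can be found under the name mysqlclient (on Windows the DLL is usually libmysql.dll, so you would pass that name instead); if the library is missing it returns None rather than failing.

```python
import ctypes
import ctypes.util


def client_library_version(name="mysqlclient"):
    """Return libmysqlclient's version string via ctypes, or None if absent."""
    path = ctypes.util.find_library(name)
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
    except OSError:  # e.g. a 32-bit/64-bit mismatch
        return None
    # C prototype: const char *mysql_get_client_info(void)
    lib.mysql_get_client_info.restype = ctypes.c_char_p
    lib.mysql_get_client_info.argtypes = []
    return lib.mysql_get_client_info().decode("ascii")


if __name__ == "__main__":
    print(client_library_version())
```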

OK, so now the async people are asking: So how does this help async programming again? Well... it still doesn't. To truly have a non-blocking async driver probably means there will need to be a new implementation that is designed with that in mind. A pure Python implementation could do this. I have a start on one which I got from Monty Taylor of MySQL some time ago. It's not asynchronous either, but could be made so. At least it looks like a good starting point.

But here's the thing: As far as I know, there is no standard database API for Python which supports asynchronous operation. And it seems like there should be one. Maybe it's time for PEP-249 to be extended for an asynchronous API. Otherwise every database implementor is going to end up doing their own thing, and it sounds like there is a need for this sort of thing.
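Until such an API exists, a common stopgap is to run blocking DB-API calls on worker threads so an asyncio event loop isn't stalled. This is not true asynchronous I/O, only blocking hidden behind threads. The sketch uses asyncio.to_thread (Python 3.9+) with sqlite3 standing in for MySQLdb; blocking_query and run_query are hypothetical names.

```python
import asyncio
import sqlite3


def blocking_query(sql):
    """An ordinary blocking DB-API call (sqlite3 stands in for MySQLdb)."""
    conn = sqlite3.connect(":memory:")  # opened and used in the worker thread
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


async def run_query(sql):
    # Push the blocking call onto a worker thread so the event loop keeps
    # running; this hides the blocking, it does not make the driver async.
    return await asyncio.to_thread(blocking_query, sql)


async def main():
    # Both queries wait on threads concurrently; the loop itself never blocks.
    return await asyncio.gather(run_query("SELECT 1"), run_query("SELECT 2 + 2"))


results = asyncio.run(main())
```

A driver built for async from the start would instead yield to the event loop while waiting on the socket, which is what a pure Python implementation could do.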

New releases coming soon

Kyle Vanderbeek is going to take over as release manager for MySQLdb-1.2. We should have one more release candidate of 1.2.3 first, followed quickly by the final release.

Development on MySQLdb-2.0 has been progressing, and has recently moved to a Mercurial repository on SourceForge. This was imported from the SVN trunk. If you pull from the SVN trunk in the future, you may be disappointed.

2.0 is turning into a very major rewrite. I should have an alpha release soon. For now, the hg repository builds and passes all tests, but there are probably a few things that aren't thoroughly tested yet, particularly scrolling on cursors. I'll post more detail along with the alpha release.

Python3 support is not immediately in the works, and I probably won't work on it until I am close to a beta. At this point, I would target Python-3.1, maybe 3.2. 3.0 would probably work too.

Thursday, March 19, 2009

MySQL-python-1.2.3 beta 2 released

I released the second beta of MySQLdb-1.2.3 over the weekend. So far I've gotten a fair number of downloads but not a lot of feedback. I did find out, though, what small tweaking is required to build on Windows. It's also in the Python Package Index, so you can also install it using easy_install MySQL-python. Once I make the final release of 1.2.3, I'll put up more eggs for fringe operating systems (Mac OS X, Windows).