Recently I have been researching embedded java relational databases (RDBMS). There's a project I want to startup soon and for the persistent storage side of things a database with a SQL interface would make life a lot easier in many regards. So began my search for a robust, free, embeddable java rdbms. My requirements for the project are not unreasonable (at least in my opinion):
- Fast selects of large sets of data or large subsets of data, it's OK if inserts are significantly slower.
- Database stored in a single file or the ability to zip or tar archive into a single file for the user to interact with. This is so it simple for the user to make backups or send to another computer.
- Small footprint, the smaller the better as the footprint becomes larger it becomes more unlikely the application will be able to be effectively distributed via java webstart.
- Standard SQL interface and JDBC driver so application can easily connect to a remote RDBMS like mysql or oracle at some point in the future.
- Programmed in native Java to increase the chance it can run anywhere.
The following features would be nice but are not required:
- Ability to encrypt data with AES or some other standard encryption method.
- Builtin compression to reduce disk footprint.
- ACID compliant to ensure no dataloss.
- Non restrictive open source license that allows commerical distribution (this is just a hobby project right now but who knows what may happen in the future, it's always best to leave all doors open).
I found a handful of databases out there but for this article I wish to focus on just a few. There were 4 systems available that caught my attention: Apache Derby, Daffodil One$DB, HSQL, and SQLite.
Apache Derby is a subproject of the Apache DB project. It was originally developed by Informix before IBM bought them out in the first part of the century. It then became IBM Cloudscape and was donated in 2003 to the Apache Foundation as an open source project. Since then it Apache has maintained as an incubator project and just recently (August 2005) has graduated from the incubator program. Overall I liked derby it was easy to work with and boasted a robust feature set. I particularly liked that encryption was available natively and that all major SQL-92 and ANSI SQL-99 features were available including transactions and triggers. Among the things I did not like were the speed of the system, selects were tolerable however it was the worst performer for inserts of the databases I tested. It took over 30 seconds to do 10,000 simple inserts. Selects were reasonably fast. My other complaint is in the footprint of the jar file, it's slightly over 2MB which seems excessive compared to other systems. Finally, there was no obvious way to make single file databases. You could point the system to a zip or jar file if it was a read-only database you wanted but that doesn't work for using the system for an applications primary file format. It feels like derby is trying to be all things to all people which makes it not a great choice for anyone. Hopefully in the near future the developers will settle on a specialized track and do that one thing really well.
Daffodil One$DB was my favorite database of the 4 I tested. It is a derivative of the Daffodil DB which is a closed source solution. Recently Daffodil decided to donate the sources to the open source community and start a project up on sourceforge. One$DB is now released under the LGPL, which is a nice open source license for libraries. As of this writing it was the only open source database that I could find which was purely java and allowed the database to be stored in a single file. Most SQL-92 and 99 features are supported and in general the system was fast and responsive. On my machine selects were quick and it only took a few second to insert 10,000 simple inserts. The embedded jar is just under 200K which is nice but there is a Common jar also required that clocks in just over 3MB. I was not a huge fan of this but I think it may be possible to trim a lot of fat out of this and hit close to somewhere around 300K which would be a very nice footprint. There is no encryption available as of current; however, Daffodil has announced they will be adding this in the coming months along with several other useful features.
HSQL was the first database I tried. From everything I read this was supposed to be the fastest database but after playing with it the inserts were definately the fastest but selects were among the slowest not to mention high overhead for startup and shutdown. I wasn't real impressed with this system in general. All data is stored in memory unless you use a cached table which is really annoying and the data format for non cached tables is just a list of inserts in the .script file which is the reason for high overhead on startup because it must recreate the in memory database everytime the system startsup. I was also unimpressed with the file size limitations that were present. While I don't expect users to be storing even gigabytes of data in the application it's nice to have the possibility too. The footprint of HSQL was the smallest of all tested clocking in at around 100K for the most optimized version and around 600K for the generic install. HSQL did not appear to have any ability to use a single file for data storage. I saw some mention of this possibility in several forums however I was never able to find the documentation on how to do this and the links provided never worked so all I can assume is that this feature is not available. All in all the HSQL system appeared to have the most hype and deliver the least bang for the buck.
Last but not least is the venearable SQLite. SQLite is actually written in C not Java and try as I might there does not appear to be a java port available which is really too bad because this embedded database has a lot of really nice properties. It natively only uses one file for storage, is very fast for both select and insert, and is simple so it does not add a lot of bloat to an application. In short it's almost exactly what I wanted. There is a JDBC driver available which uses JNI to interface with the c libraries; however, in the end the lack of simple portability for this solution killed it. I really wanted something that was natively java so that I would have the best chance at compile once run anywhere. If anyone out there knows of a java port for the SQLite libraries I would be very interested. Perhaps, one day when I am bored and get the hankering to do something cool I'll port it over and write a JDBC driver for it. But until that day comes we must write it off as a no go.
For my project it looks like right now I will be investing more time into using One$DB because it provides the closest set of features to my requirements. I would still be interested if anybody out there can suggest alternatives though. I would also be intersested if anyone knows of an interface that makes the java app think it's writing to individual files and directories but in fact it's not, it's just going to a single file on disk, a virtual file system in short.