A few months ago, I had the great pleasure of meeting and discussing big data with Michael Stonebraker, a legendary computer scientist at MIT who specializes in database systems and is considered to be the forefather of big data. Stonebraker developed INGRES, which helped pioneer the use of relational databases, and has formed nine companies related to database technologies.
Until recently, the choice of a database architecture was largely a non-issue. Relational databases were the de facto standard and the main choices were Oracle, SQL Server or an open source database like MySQL. But with the advent of big data, scalability and performance issues with relational databases became commonplace. For online processing, NoSQL databases have emerged as a solution to these problems. NoSQL is a catch-all for different kinds of database architectures: key-value stores, document databases, column family databases and graph databases. Each has its own relative advantages and disadvantages. However, in order to get scalability and performance, NoSQL databases give up “queryability” (i.e. the ability to use SQL) and ACID transactions.
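To make that trade-off concrete, here is a minimal sketch of what "queryability" and ACID transactions provide. It is only an illustration: a plain Python dict stands in for a key-value store and the standard-library sqlite3 module stands in for a SQL database. Neither is VoltDB or any particular NoSQL product, and the accounts table and all function names are made up for the example.

```python
import sqlite3

# Stand-in for a key-value store (assumption: a plain dict, not a real NoSQL product).
kv_store = {
    "acct:1": {"owner": "alice", "balance": 100},
    "acct:2": {"owner": "bob", "balance": 50},
}

def kv_owners_with_balance_over(store, threshold):
    # No query language: the application has to scan every value itself.
    return sorted(v["owner"] for v in store.values() if v["balance"] > threshold)

def make_db():
    # Stand-in for a SQL database (assumption: in-memory SQLite).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "id INTEGER PRIMARY KEY, owner TEXT, "
        "balance INTEGER CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?, ?)",
        [(1, "alice", 100), (2, "bob", 50)],
    )
    conn.commit()
    return conn

def sql_owners_with_balance_over(conn, threshold):
    # The same question, stated declaratively in SQL.
    rows = conn.execute(
        "SELECT owner FROM accounts WHERE balance > ? ORDER BY owner",
        (threshold,),
    )
    return [row[0] for row in rows]

def transfer(conn, src, dst, amount):
    # Both updates succeed together or not at all (atomicity).
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was rolled back too.
        return False

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts"))
```

Calling transfer(conn, 2, 1, 500) on a fresh database fails the balance check on the second update, and the credit already applied by the first update is rolled back with it. With the dict stand-in, the application would have to undo that first write itself.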
More recently a new type of database has emerged that offers high performance and scalability without giving up SQL and ACID transactions. This class of database is called NewSQL, a term coined by Stonebraker. He provides an excellent overview of OldSQL vs NoSQL vs NewSQL in this video.
Some key points from the video:
- SQL is good.
- Traditional databases are slow not because SQL is slow, but because of their architecture and the fact that they are running code that is 30 years old.
- NewSQL provides performance and scalability while preserving SQL and ACID transactions by using a new architecture that drastically reduces overhead.
In the video, Stonebraker talks about VoltDB, an open source NewSQL database that comes from a company of the same name founded by him. Some of the performance figures of VoltDB are pretty amazing:
- 3 million transactions per second on a “couple hundred cores”
- 45x the performance of “a SQL vendor whose name has more than three letters and less than nine”
- 5-6 times faster than Cassandra and the same speed as Memcached on key-value operations
VoltDB sounds like an extremely compelling alternative to NoSQL databases, and certainly warrants a look if you want to move from a traditional “OldSQL” database to one that is highly scalable and performant without losing SQL and ACID.
There are NoSQL DBs that support ACID, like RavenDB (http://ravendb.net/), which uses alternatives to SQL such as Lucene queries. VoltDB seems to give up on partition tolerance in exchange for availability and consistency (http://codahale.com/you-cant-sacrifice-partition-tolerance/).
Thanks for sharing. I hadn’t heard Michael Stonebraker being called the forefather of Big Data yet. This certainly didn’t come out of Gartner, I suspect? 🙂
I recently saw him talk at EPFL in Lausanne, Switzerland, about the traditional RDBMS wisdom being all wrong. Quite interesting. Yet, I do believe that Oracle and Microsoft (the old elephants) will catch up with him.
Anyway, I like the way he talks positively about SQL and ACID. The whole NoSQL vs. SQL discussion is kind of distracting from what is really desirable (SQL the language) and what is really a problem (some aspects of relational algebra, some aspects of ACID). On the other hand, few people really need something like VoltDB or Cassandra, or whatever.