Thursday 23 June 2011

Some MySQL projects I think are cool - HandlerSocket Plugin

The HandlerSocket project is described in Yoshinori Matsunobu's blog entry under the title 'Using MySQL as a NoSQL - A story for exceeding 750,000 qps on a commodity server'. It's a great headline and has generated a lot of buzz. Quite a few early commentators were a little confused about what it was - a new NoSQL system using InnoDB? A cache? In memory only? Where does Memcached come in? Does it support the Memcached protocol? If not, why not? Why is it called HandlerSocket?

Inspirations from Memcache may include the focus on simplicity, performance and a simple human readable protocol. As Yoshinori says, Kazuho Oku has already implemented a MySQLD-embedded Memcached server, no need to do it again. What's more, the Memcache protocol offers key-value functionality, whereas implementing a new protocol allows more functionality to be exposed.

The choice of name has come in for some flak. I believe the etymology is that HandlerSocket exposes the existing MySQL Handler interface directly over a separate socket. Perhaps a more exciting name will appear at some point, but looking at the MySQL Handler documentation gives a good background on the basis of the HandlerSocket Api.

HandlerSocket implements more than a Key-Value Api. It supports indexed data access in general including :
  • Equality search on any index prefix (returning 0, 1 or more rows)
  • Inequality search on any index prefix (returning 0, 1 or more rows)
This allows far more general use than a simple key-value API. Composite keys can be used and Secondary indexes searched and maintained. This exposes much more of the value of a storage engine like InnoDB.

SQL and non-SQL access to the same data


Yoshinori mentions inspiration from NdbApi for HandlerSocket - they wanted to get fast indexed access performance without excluding the possibility of performing ad-hoc SQL for reports etc. This has been one of the unique benefits of Ndb for some time - the ability to operate on the same underlying data via multiple Apis. Extending this to other MySQL engines (especially InnoDB) is a great idea. What is surprising is the difference a different access layer implementation can make to throughput. Who would have thought that so much performance could be consumed by parsing etc? I suspect that HandlerSocket may create a new benchmark for the MySQL team to optimise parsing towards in future.

As a MySQL daemon plugin, the HandlerSocket plugin gets to create threads running within a MySQLD server instance. These threads can then listen on network sockets and use the Storage Engine Api inside the server to perform primitive operations on Storage engines. From the point of view of the Storage engine, they are just client request handling threads in the Server accessing data concurrently with 'normal' SQL client threads.

Using the Storage Engine Api, HandlerSocket gets :
  • Storage engine independence
  • Concurrency control as implemented by the engine
  • Index maintenance as implemented by the engine
  • Constraint enforcement as implemented by the engine
  • Engine features such as online backup, crash recovery, compression, encryption etc.
Not going via the 'SQL layer' means that HandlerSocket misses out on :
  • SQL functionality (queries, joins, aggregation, UDFs etc.)
  • Stored procedures
  • Trigger activation
  • Query cache
  • ACL checks
  • Views
As with NdbApi, this is often a very good trade-off as many applications don't need these features for their heavy-lifting. However they can be very useful for less frequent reporting and administration tasks. Supporting consistent access via SQL or some simpler Api to the same data can avoid the need to split caching from an OLTP database and also potentially the need to split an OLTP and analytic database. Ndb gives more of a 'firewall' between different usage types by physically separating the storage from more complex query processing, but MySQLD could be extended to have more 'workload management' features internally if this were a problem for HandlerSocket.

The impressive published benchmarks use data which fits entirely in memory buffers, so that InnoDb need only write logs and checkpoints. HandlerSocket does not require that all data is in-cache, but the best performance will be achieved if this is the case.

Operation batching/pipelining

As the protocol description states, requests can be pipelined. There is no need for individual clients to make blocking synchronous DB requests. This can vastly increase throughput without excessively straining connection resources.

Often SQL DB access protocols do not make much use of the potential for batching requests on a single connection. JDBC supports batched updates and inserts, and appears to have support (not often talked about) for batched queries, but not every driver implements these and they're often not implemented in the same way. Often the SQL approach seems to be 'If you want different sets of data in one round trip, you need to find a way to get them all in one SQL statement'. This creates a false tension between reducing client-server round trips and avoiding complex joins and unnecessary unions. Decoupling query boundaries from thread blocking / flow of control changes removes this artificial tension and can simplify applications. A recent 'Facebook at Scale' talk describes how they use the MySQL Client's CLIENT_MULTI_STATEMENTS flag to decouple query boundaries and request latency. This pattern is one of the keys to implementing efficient NdbApi clients as well.

Commit grouping

Another HandlerSocket feature possibly inspired by Ndb is the commit grouping. Multiple user writes (it's not clear if/how transaction boundaries can be specified), are combined and committed to the engine together. This amortizes the commit cost across multiple operations. Where the engine performs expensive durability operations (e.g fsync) this can improve write throughput. Writes are group-committed to the Binlog as well, again in a similar way to Ndb. Another shared advantage of merging multiple client 'transactions' into fewer Binlog 'transactions' is that the Slave also gets to benefit from fewer, larger transactions for a given data change rate.

Impact

It's not clear yet what the impact of HandlerSocket will be. It enters a crowded market of SQL, NoSQL, NewSQL, DataGrid and other technologies. Competitors use JSON over HTTP, or Memcached protocols, support richer or simpler Apis, offer transparent sharding, eventually consistent replication or web-scale Erlang distributed Map Reduce. Perhaps HandlerSocket is too old-school?

I think it's a great technology, deserving of success. Coupled with an external sharding layer, it seems to offer a great way to improve the efficiency of MySQL scale out, without losing the ability to perform ad-hoc SQL etc.
Time will tell.