Why Percona Acquired Tokutek: by Peter Zaitsev

It is my pleasure to announce that Percona has acquired Tokutek and will take over development and support for TokuDB® and TokuMX™ as well as the revolutionary Fractal Tree® indexing technology that enables those products to deliver improved performance, reliability and compression for modern Big Data applications.

At Percona we have been working with the Tokutek team since 2009, helping to improve performance and scalability. The TokuDB storage engine has been available for Percona Server for about a year, so joining forces is quite a natural step for us.

Fractal Tree indexing technology—developed by years of data science research at MIT, Stony Brook University and Rutgers University—is the new generation data structure which, for many workloads, leapfrogs traditional B-tree technology which was invented in 1972 (over 40 years ago!). It is also often superior to LSM indexing, especially for mixed workloads.

But as we all know in software engineering, an idea alone is not enough. There are hundreds of databases which have data structures based on essentially the same B-Tree idea, but their performance and scalability differs dramatically. The Tokutek engineering team has spent more than 50 man years designing, implementing and polishing this technology, which resulted (in my opinion) in the only production–ready Open Source transactional alternative to the InnoDB storage engine in the MySQL space – TokuDB; and the only viable alternative distribution of MongoDB – TokuMX.

Designed for Modern World – TokuDB and TokuMX were designed keeping in mind modern database workloads, modern hardware and modern operating system properties which allowed for much more clean and scalable architecture, leading to great performance and scalability.

Compression at Speed – As part of it, compression was an early part of design, so a very high level of compression can be achieved with low performance overhead. In fact, chances are with fast compression you will get better performance with compression enabled.

Great Read/Write Balance – You find databases (or storage engines) are often classified into read optimized and write optimized, and even though you most likely heard about much better insert speed with Fractal Tree indexing, both for MySQL and MongoDB you may not know that this is achieved with Read performance being in the same ballpark or better for many workloads. The difference is just not so drastic.

Multiple Clustered Keys – This is a great feature, which together with compression and low cost index maintenance, allows TokuDB and TokuMX to reach much better performance for performance critical queries by clustering the data needed by such query together.

Messages – When we’re speaking about conventional data structure such as B-trees or Hash tables, it is essentially a way data is stored and operations are being performed in it. Fractal Tree indexing operates with a different paradigm which is focused around “Messages” being delivered towards the data to perform operations in questions. This allows it to do a lot of clever stuff, such as implement more complex operations with the same message, merge multiple messages together to optimize performance and use messages for internal purposes such as low overhead online optimization, table structure changes etc.

Low Overhead Maintenance – One of obvious uses of such Messages is Low Overhead Maintenance. The InnoDB storage engine allows you to add column “online,” which internally requires a full table rebuild, requiring a lot of time and resources for copy of the table. TokuDB however, can use “broadcast message” to add the column which will become available almost immediately and will gradually physically propagate when data is modified. It is quite a difference!

Smart No-Read Updates – Messages allow you to do smart complex updates without reading the data, dramatically improving performance. For example this is used to implement “Read Free Replication”

Optimized In Memory Data Structures – You may have heard a lot about in-memory databases, which are faster because they are using data structure optimized for properties on memory rather just caching the pages from disk, as, for example, MyISAM and InnoDB do. TokuDB and TokuMX offer you the best of both worlds by using memory optimized data structures for resident data and disk optimized data structures when data is pushed to disk.

Optimized IO – Whether you’re using legacy spinning media or Solid State Storage you will appreciate TokuDB having optimized IO – doing less and more sequential IO which helps spinning media performance, as well as dramatically reducing wear on flash, so you can improve longevity for your media or use lower cost storage.

Between the Tokutek engineering team and Percona we have a lot of ideas on how to take this technology even further, so it is the technology of choice for large portions of modern database workloads in the MySQL and MongoDB space. We are committed to working together to advance the limits of Open Source databases (relational or not)!

Interested to check out whether TokuDB or TokuMX is right for your application? Please contact us at tokutek@percona.com.

The post Why Percona Acquired Tokutek: by Peter Zaitsev appeared first on Tokutek.