In this post I want to share some key takeaways from the recent Apache Cassandra Summit in Santa Clara.
In general, I was pleased to see that Cassandra is gaining more and more acceptance in the industry. I believe it can now be considered a mainstream technology, and large-scale deployments of Cassandra clusters (more than 75,000 nodes) are proof of that.
In this post I will not cover the particular talks I found interesting (I hope to dedicate a separate post to that). Instead, I want to write about the interesting technologies mentioned during the summit:
- One of the most exciting technologies I learned about during the summit is ScyllaDB, an open-source C++14 implementation of Cassandra. It uses the latest advancements in C++, OS kernels and hardware drivers to deliver very impressive performance. ScyllaDB developers claim that they can achieve 10x the performance of Cassandra in some benchmarks and, since there is no JVM involved, there are no more issues with JVM performance and garbage collection. It is especially beneficial for large servers with lots of memory and CPU that regular Cassandra does not use efficiently. I plan to write a dedicated post about ScyllaDB with some benchmarks in the near future. Link to GitHub for those who are eager to check it out.
- Almost all talks mentioned deploying Cassandra together with Apache Spark. It looks like this is becoming a standard deployment now. Many groups are also adding Apache Kafka to this list.
- I was surprised to learn that DateTieredCompactionStrategy (DTCS) has many issues, even though we encountered some of them ourselves recently. You can find more information about the issues and proposed solutions in this Jira ticket.
- Stratio – full-text search in Cassandra based on Apache Lucene.
- Stargate – another search solution based on Lucene. My understanding, though, is that Stratio is the more favored option by now.
- Apache Zeppelin – not really related to Cassandra, but a very useful tool.
- Presto – a distributed SQL query engine that can run on top of Cassandra or Hive.