- Cassandra incorporates a number of architectural best practices that affect performance. None are unique to Cassandra, but Cassandra is the only NoSQL system that incorporates all of them.
- Fully distributed: Every Cassandra machine handles a proportionate share of every activity in the system. There are no special cases like the HDFS name node, MongoDB mongos, or the MySQL Fabric process that require special treatment. And with every node the same, Cassandra is far simpler to install and operate, which has long-term implications for troubleshooting. Even when everything works perfectly, master/slave designs have a bottleneck at the master. Cassandra leverage’s masterless design to deliver lower latency as well as uninterrupted uptime.
- Log-structured storage engine: A log-structured engine that avoids overwrites to turn updates into sequential i/o is essential both on hard disks (HDD) and solid-state disks (SSD). On HDD, because the seek penalty is so high; on SSD, to avoid write amplification and disk failure. This is why you see MongoDB performance go through the floor as the dataset size exceeds RAM. Couchbase’s append-only b-trees avoid overwrites but requires several seeks when updating or inserting new documents and does not support durable writes without a large performance penalty.
- Locally-managed storage: HBase has an integrated, log-structured storage engine, but relies on HDFS for replication instead of managing storage locally. This means HBase is architecturally incapable of supporting Cassandra-style optimizations like putting the commit log on a separate disk, mixing SSD and HDD in a single cluster with appropriate data pinned to each, or incrementally pulling compacted stables into the page cache.
- Prepared statements: Five years ago, NoSQL systems were characterized by only allowing primary key lookups, and there was no query planning to speak of. Today, Cassandra and most other systems2 support indexes and increasingly complex queries. The Cassandra Query Language allows Cassandra to pre-parse and re-use query plans, reducing overhead. Others remain stuck with primitive JSON APIs or even raw Java Scanner objects. CQL also allows Cassandra to express more sophisticated operations like lightweight transactions with a minimal impact on clients, resulting in wide support across many languages. The closest alternative is Apache Phoenix, a Java-only SQL layer for HBase.
For in-depth knowledge on Cassandra, click on below
- Cassandra NoSQL
- How To Install Cassandra Ubuntu 14.04
- CQL Datatypes in Cassandra
- Cassandra Data Types
- CQL Collections in Cassandra