NoSQL databases are a relatively new genre of databases. NoSQL does not mean "no SQL" at all; it stands for "Not Only SQL", and many NoSQL databases support an SQL-like query language. NoSQL databases differ from relational databases (RDBMS) in that they are based on a flexible-schema (or schema-free) data model. A relational database such as Oracle Database or MySQL has a fixed table structure with a fixed number of columns and pre-specified column types. Data added to an RDBMS table must conform to the table definition. In contrast, a NoSQL data store can store data of variable structure: one row or document could have a different data structure from another.
Different NoSQL data stores support different data models. The main data models are the document store, the key-value store, and the wide column store. Document stores are document-oriented database systems, most of which are based on the JSON document model. Key-value stores are based on a data model in which data is stored as key/value pairs. Wide column stores are somewhat similar to the table structure of a relational database in that data is stored in rows and columns, but the columns (column names and column types) are not fixed.
We shall compare three commonly used NoSQL databases: MongoDB, Apache Cassandra, and Couchbase.
NoSQL provides new data management technologies designed to handle the increasing volume, velocity, and variety of data. It can store and retrieve data that is modeled by means other than the tabular relations used in relational databases. NoSQL systems are also called “Not Only SQL” to emphasize that they may also support SQL-like query languages.
Relational databases face the following challenges:
- Not well suited to large volumes (petabytes) of data with a variety of data types (e.g. images, videos, text)
- Cannot scale for large data volume
- Cannot scale up indefinitely, limited by memory and CPU capabilities
- Cannot scale out easily, limited by cache-dependent read and write operations
- Sharding (breaking the database into pieces and storing them on different nodes) causes operational problems (e.g. managing a shard failure)
- Complex RDBMS model
- Consistency limits the scalability in RDBMS
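To see why sharding is operationally awkward, consider a minimal Python sketch of the simplest placement scheme: hash a key and take it modulo the number of nodes. The node names and the helpers `shard_for` and `moved_keys` are hypothetical and chosen only for illustration; this is not how any particular RDBMS shards data.

```python
import hashlib

# Hypothetical node names, used only for illustration.
NODES = ["node-a", "node-b", "node-c"]


def shard_for(key, nodes=NODES):
    """Return the node that owns `key`: stable hash of the key, modulo node count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def moved_keys(keys, old_nodes, new_nodes):
    """Return the keys whose owning node changes when the node list changes."""
    return [k for k in keys if shard_for(k, old_nodes) != shard_for(k, new_nodes)]


if __name__ == "__main__":
    keys = [f"user:{i}" for i in range(1000)]
    moved = moved_keys(keys, NODES, NODES + ["node-d"])
    print(f"{len(moved)} of {len(keys)} keys change nodes after adding one node")
```

Under modulo placement, adding a single node typically reassigns a large share of the keys, which is one source of the operational cost mentioned in the list above.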
Compared to relational databases, NoSQL databases are generally more scalable and can provide better performance on large, distributed workloads. They address the challenges that the relational model does not by providing the following solutions:
- A scale-out, shared-nothing architecture, capable of running on a large number of nodes
- A non-locking concurrency control mechanism, so that real-time reads do not conflict with writes
- Scalable replication and distribution – thousands of machines with distributed data
- An architecture providing higher performance per node than RDBMS
- Schema-less data model
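One common way to get the scale-out and replication properties listed above is consistent hashing. The Python sketch below is a toy version under stated assumptions: each node owns a single token on a ring, and a key is stored on the first `replicas` nodes found walking clockwise from the key's token. The class `Ring` and the node names are hypothetical, and the sketch is not the placement algorithm of any specific database.

```python
import bisect
import hashlib


def _token(value):
    """Map a string to a position on the ring using a stable hash."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class Ring:
    """Toy consistent-hash ring with one token per node."""

    def __init__(self, nodes, replicas=2):
        if replicas > len(nodes):
            raise ValueError("replicas cannot exceed the number of nodes")
        self.replicas = replicas
        # Sort nodes by token so the ring can be searched with bisect.
        self._ring = sorted((_token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def owners(self, key):
        """Return the nodes that hold `key`, walking clockwise from its token."""
        start = bisect.bisect_right(self._tokens, _token(key))
        count = len(self._ring)
        return [self._ring[(start + i) % count][1] for i in range(self.replicas)]


if __name__ == "__main__":
    ring = Ring(["n1", "n2", "n3", "n4"], replicas=2)
    print(ring.owners("user:42"))
```

In this scheme, removing a node only affects keys that were stored on it; every other key keeps the same owners, so no node needs shared state to answer where a key lives.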
Just as a table is stored in a database in an RDBMS, NoSQL data stores provide a top-level namespace or container for storing data. In MongoDB, the equivalent of a relational database table is a "collection," which contains one or more documents. A MongoDB "database," which is the top-level container, consists of one or more collections. In Couchbase, the equivalent of an RDBMS database is called a "bucket." A bucket could have one or more documents. In Apache Cassandra, a "keyspace" defines a top-level namespace for tables.
In MongoDB, an RDBMS table corresponds to a collection and a table row corresponds to a document. MongoDB is based on the document store data model, in which a document is stored in BSON, a binary-encoded form of JSON. A MongoDB document consists of fields and values, and each document could have the same fields as another or different ones. For example, the following could be a MongoDB document stored in a collection.
{
  journal: 'Oracle Magazine',
  publisher: 'Oracle Publishing',
  edition: 'January February 2010'
}
A different document in the same collection could have different fields. For example, the following could be another document stored in the same collection as the first.
{
  journal: 'Oracle Magazine',
  edition: 'January February 2010',
  section: 'Oracle JDeveloper',
  title: 'Installing JDeveloper'
}
Although documents in the same collection could have completely dissimilar fields, similar documents are usually grouped together. A MongoDB document field value could be any of the BSON data types, such as Double, String, Object, Array, and Binary data.
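The two example documents above can be modeled in Python as plain dictionaries to show what a flexible schema means for queries. This is only a sketch: the list `collection` and the helper `find` are hypothetical stand-ins and do not use MongoDB's driver or query API.

```python
# The two example documents, held in one "collection" (here just a list).
collection = [
    {
        "journal": "Oracle Magazine",
        "publisher": "Oracle Publishing",
        "edition": "January February 2010",
    },
    {
        "journal": "Oracle Magazine",
        "edition": "January February 2010",
        "section": "Oracle JDeveloper",
        "title": "Installing JDeveloper",
    },
]


def find(docs, **criteria):
    """Return documents whose fields equal every criterion.

    A document that lacks one of the fields simply does not match.
    """
    return [
        doc
        for doc in docs
        if all(field in doc and doc[field] == value for field, value in criteria.items())
    ]


if __name__ == "__main__":
    print(len(find(collection, journal="Oracle Magazine")))
    print(find(collection, section="Oracle JDeveloper"))
```

A query on `section` skips the first document, which has no such field, rather than failing.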
The Couchbase data model is based on the JSON document store. Couchbase data is stored as JSON documents in data buckets. As with MongoDB, Couchbase does not have a fixed schema: one JSON document could have different fields from another. Couchbase document data types are the JSON data types, such as strings, booleans, and arrays. What makes the JSON and BSON data models flexible is their support for nested hierarchical structures, including nested arrays and objects.
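A short Python sketch, using only the standard `json` module, shows the kind of nesting described above. The field names `online`, `sections`, and `articles` are invented for illustration and are not part of any Couchbase schema.

```python
import json

# A JSON document with a boolean, a nested array, and nested objects.
RAW = """
{
  "journal": "Oracle Magazine",
  "edition": "January February 2010",
  "online": true,
  "sections": [
    {"name": "Oracle JDeveloper", "articles": ["Installing JDeveloper"]}
  ]
}
"""


def article_titles(doc):
    """Collect every article title from the nested `sections` array."""
    return [title for section in doc["sections"] for title in section["articles"]]


if __name__ == "__main__":
    document = json.loads(RAW)
    print(article_titles(document))
```

Because the structure travels with each document, another document in the same bucket could omit `sections` entirely or nest it differently.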
Apache Cassandra's data model is a wide column model in which columns are grouped into a column family. Cassandra is not totally schema-free in that metadata for the columns in a column family could be pre-specified. Two types of column families are possible: static and dynamic. In a static column family, the column metadata (column names and types) is specified when the column family is created. In a dynamic column family, the column metadata is not pre-specified and an application may define any columns. In either kind of column family, each row could have different columns. The only schema requirement is that each row has a row key of a declared type, which is the equivalent of a primary key in an RDBMS table. In fact, a Cassandra table could consist only of row keys, with no columns in any row. An Apache Cassandra column family may also be called a table; both CREATE TABLE and CREATE COLUMN FAMILY commands are available. The following is a comparison of an RDBMS table and an Apache Cassandra table (or column family).
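Setting the RDBMS comparison aside, the idea of a row key plus variable columns can be sketched in Python as a dictionary of dictionaries. The class `ColumnFamily` and the row keys below are hypothetical; this models the concept only and is not Cassandra's API or storage format.

```python
class ColumnFamily:
    """Toy wide-column table: row key -> {column name: value}."""

    def __init__(self, name):
        self.name = name
        self.rows = {}

    def insert(self, row_key, **columns):
        """Add or update columns for a row; the row key is the only requirement."""
        if row_key is None:
            raise ValueError("every row needs a row key")
        self.rows.setdefault(row_key, {}).update(columns)

    def get(self, row_key, column, default=None):
        """Read one column of one row, or `default` if either is absent."""
        return self.rows.get(row_key, {}).get(column, default)

    def columns(self, row_key):
        """List the column names present in a given row."""
        return sorted(self.rows.get(row_key, {}))


if __name__ == "__main__":
    catalog = ColumnFamily("catalog")
    catalog.insert("catalog1", journal="Oracle Magazine", publisher="Oracle Publishing")
    catalog.insert("catalog2", journal="Oracle Magazine", title="Installing JDeveloper")
    catalog.insert("catalog3")  # a row that consists of a row key only
    for key in catalog.rows:
        print(key, catalog.columns(key))
```

Each row carries its own set of column names, so two rows in the same column family need not share any columns.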
Key considerations when choosing a NoSQL platform include:
Workload diversity – Big Data comes in all shapes, colors, and sizes. Rigid schemas have no place here; instead, you need a more flexible design. You want your technology to fit your data, not the other way around. And you want to be able to do more with all of that data – perform transactions in real-time, run analytics just as fast, and find anything you want in an instant from oceans of data, no matter what form that data may take.
Scalability – With big data you want to be able to scale very rapidly and elastically, whenever and wherever you want. This applies to all situations, whether scaling across multiple data centers or even to the cloud if needed.
Performance – In an online world where even tiny delays can cost you sales, Big Data must move at extremely high velocities no matter how much you scale or what workloads your database must perform. The performance of your environment, and of your applications in particular, should be high on the list of requirements for deploying a NoSQL platform.
Continuous Availability – Building on the performance consideration, when you rely on big data to feed your essential, revenue-generating 24/7 business applications, even high availability is not high enough. Your data can never go down, so there should be no single point of failure in your NoSQL environment, ensuring that applications are always available.
Manageability – The operational complexity of a NoSQL platform should be kept to a minimum. Make sure that the administration and development required to both maintain and maximize the benefits of moving to a NoSQL environment are achievable.
Cost – This is certainly a glaring reason for making the move to a NoSQL platform, as meeting even one of the considerations presented here with relational database technology can become prohibitively expensive. Deploying NoSQL properly allows for all of the benefits above while also lowering operational costs.
Strong Community – This is perhaps one of the more important factors to keep in mind as you move to a NoSQL platform. Make sure there is a solid and capable community around the technology, as this will provide an invaluable resource for the individuals and teams that will be managing the environment. Involvement on the part of the vendor should not only include strong support and technical resource availability, but also consistent outreach to the user base. Good local user groups and meetups will provide many opportunities for communicating with other individuals and teams that will provide great insight into how to work best with the platform of choice.