16 October, 2020
Ans: Apache Cassandra is an open-source, distributed, decentralized storage system (database) for managing very large amounts of structured data spread out across the world. It provides a highly available service with no single point of failure. It was developed at Facebook for inbox search and was open-sourced by Facebook in July 2008.
Ans: A NoSQL database (sometimes called Not Only SQL) is a database that provides a mechanism to store and retrieve data other than the tabular relations used in relational databases. These databases are schema-free, support easy replication, have simple APIs, are eventually consistent, and can handle huge amounts of data.
Ans: When a new node joins a cluster, it will automatically contact the other nodes in the cluster and copy the right data to itself.
Ans: A seed node in Cassandra is a node that is contacted by other nodes when they first start up and join the cluster. A cluster can have multiple seed nodes. Seed nodes help bootstrap a new node joining a cluster. Using two seed nodes per data center is recommended.
Ans: Node is the place where data is stored.
Ans: Datacenter is a collection of related nodes.
Ans: Cluster is a component that contains one or more data centers.
Ans: The commit log is a crash-recovery mechanism in Cassandra. Every write operation is written to the commit log.
Ans: A mem-table is a memory-resident data structure. After the commit log, the data is written to the mem-table. Sometimes, for a single column family, there will be multiple mem-tables.
Ans: SSTable is a disk file to which the data is flushed from the mem-table when its contents reach a threshold value.
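The commit log, mem-table, and SSTable answers above describe one write path, which can be sketched as a toy model. This is only an illustration under stated assumptions, not Cassandra's implementation: the class name `ToyWritePath`, the entry-count `flush_threshold`, and the in-memory lists standing in for disk files are all hypothetical.

```python
class ToyWritePath:
    """Toy model of the write path: commit log -> mem-table -> SSTable."""

    def __init__(self, flush_threshold=3):
        # Hypothetical threshold: number of mem-table entries that triggers a flush.
        self.flush_threshold = flush_threshold
        self.commit_log = []  # append-only record of every write (crash recovery)
        self.memtable = {}    # memory-resident key -> value map
        self.sstables = []    # each flushed "file" is a sorted list of (key, value)

    def write(self, key, value):
        # 1. Every write is first appended to the commit log.
        self.commit_log.append((key, value))
        # 2. Then it is applied to the mem-table.
        self.memtable[key] = value
        # 3. When the mem-table reaches the threshold, flush it to an SSTable.
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the mem-table contents out in key order and start a fresh one.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}


store = ToyWritePath(flush_threshold=2)
store.write("a", 1)
store.write("b", 2)  # reaches the threshold and triggers a flush
store.write("c", 3)  # lands in the new, empty mem-table
```

For simplicity the sketch never trims the commit log, so it keeps growing with every write.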
Ans: A Bloom filter is a quick, probabilistic algorithm for testing whether an element is a member of a set. It can report false positives but never false negatives. It is a special kind of cache. Bloom filters are consulted on every read query.
Ans: Users can access Cassandra through its nodes using Cassandra Query Language (CQL). CQL treats the database (keyspace) as a container of tables. Programmers use cqlsh, a prompt for working with CQL, or separate application language drivers.
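To make the set-membership test concrete, here is a minimal Bloom filter sketch. It is a generic illustration, not Cassandra's own implementation: the class name, the default sizes, and the use of seeded SHA-256 hashes are assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: may give false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        # Both defaults are arbitrary illustrative values.
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = [False] * size_bits

    def _positions(self, item):
        # Derive num_hashes bit positions by hashing the item with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # False means "definitely not present"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
bf.add("user:42")
```

A `False` answer lets a reader skip a lookup entirely, while a `True` answer still has to be confirmed against the real data.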
Ans: Column family is a container for an ordered collection of rows. Each row, in turn, is an ordered collection of columns.
Ans: This method is used to close the current session instance.
Ans: This method is used to execute a query. It requires a statement object.
Ans: This command provides the version of cqlsh you are using.
Ans: Apache Cassandra is a second-generation distributed database originally open-sourced by Facebook. Its write-optimized shared-nothing architecture results in excellent performance and scalability. The Cassandra storage cluster and S3 archival layer are designed to expand horizontally to any arbitrary size with a linear cost. Cassandra’s memory footprint is more dependent on the number of column families than on the size of the data set. Cassandra scales pretty well horizontally for storage and IO, but not for memory footprint, which is tied to your schema and your cache settings regardless of the size of your cluster.
The syntax for creating a keyspace in Cassandra is:
CREATE KEYSPACE <identifier> WITH <properties>
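As a concrete instance of the syntax above, the following sketch builds a complete statement as a string. The helper `create_keyspace_cql` and the keyspace name `demo_ks` are hypothetical; the replication property shown (`SimpleStrategy` with a `replication_factor`) is one commonly used choice, not the only one.

```python
def create_keyspace_cql(name, replication_factor=3):
    """Build a CREATE KEYSPACE statement using SimpleStrategy replication."""
    # Rough guard against malformed names; this is not a full CQL identifier check.
    if not name.isidentifier():
        raise ValueError(f"invalid keyspace name: {name!r}")
    return (
        f"CREATE KEYSPACE {name} "
        f"WITH replication = {{'class': 'SimpleStrategy', "
        f"'replication_factor': {replication_factor}}};"
    )


statement = create_keyspace_cql("demo_ks", 3)
print(statement)
```

The resulting string could be pasted into cqlsh or passed to a driver; no driver is used here so the example stays self-contained.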
Ans: In Cassandra, a keyspace is a namespace that determines how data is replicated on nodes. A cluster can contain multiple keyspaces, typically one per application.
Ans: cqlsh is a Python-based command-line client for Cassandra.
Ans: Yes, Cassandra works pretty well on Windows. Right now we have Linux and Windows compatible versions available.
Ans: Consistency refers to how up-to-date and synchronized a row of Cassandra data is on all of its replicas.
Ans: With this kind of write, operations are handled in the background, asynchronously. It is the fastest way to write data, and the one that offers the least confidence that operations will succeed.
Ans: Thrift is the name of the RPC client used to communicate with the Cassandra server.
Ans: Kundera is an object-relational mapping (ORM) implementation for Cassandra written using Java annotations.
Ans: JMX stands for Java Management Extensions.
Ans: Cassandra is robust software. Nodes joining and leaving are automatically taken care of. With proper settings, Cassandra can be made failure resistant. That means that if some of the servers fail, the data loss will be zero. So, you can deploy Cassandra on cheap commodity hardware or in a cloud environment, where hardware or infrastructure failures may occur.
Ans: As part of the NoSQL family, Cassandra suits problems where you need a very write-heavy system and want a responsive reporting system on top of the stored data. Consider web analytics, where log data is stored for each request and you want to build an analytical platform around it to count hits by hour, by browser, by IP, etc. in real time.
Ans: Cassandra is a NoSQL database and does not provide ACID and relational data properties. If you have a strong requirement for ACID properties (for example, financial data), Cassandra would not be a fit. You can make it work, but you will end up writing lots of application code to handle ACID properties and will lose badly on time to market. Managing that kind of system with Cassandra would also be complex and tedious.
Ans: Secondary indexes are indexes built over column values. In other words, let’s say you have a user table, which contains a user’s email. The primary index would be the user ID, so if you wanted to access a particular user’s email, you could look them up by their ID. However, solving the inverse query (given an email, fetch the user ID) requires a secondary index.
Ans: Use a secondary index when you want to query on a column that isn't the primary key and isn't part of a composite key, and the column has few unique values. For example, a Town column is a good choice for secondary indexing because lots of people will be from the same town; date of birth, however, is not such a good choice.
Ans: Avoid secondary indexes on columns that contain a high count of unique values, where each query will produce few results.
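The idea behind the secondary index answers above can be sketched with plain dictionaries. This only models the concept of an inverse lookup, not how Cassandra stores its indexes; the function `build_secondary_index` and the sample rows are hypothetical.

```python
from collections import defaultdict


def build_secondary_index(rows, column):
    """Map each value of `column` to the list of row IDs that hold it."""
    index = defaultdict(list)
    for row_id, row in rows.items():
        index[row[column]].append(row_id)
    return dict(index)


# The primary index: user ID -> row (made-up sample data).
users = {
    1: {"email": "ann@example.com", "town": "Leeds"},
    2: {"email": "bob@example.com", "town": "Leeds"},
    3: {"email": "cy@example.com", "town": "York"},
}

# Inverse lookups: column value -> user IDs.
email_index = build_secondary_index(users, "email")
town_index = build_secondary_index(users, "town")
```

Here `town_index` has only two entries covering all three users, while `email_index` needs one entry per user, which mirrors the point about columns with few versus many unique values.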
Q40) I have a row or key cache hit rate of 0.XX123456789 reported by JMX. Is that XX% or 0.XX%?
Ans: By default, Cassandra uses 7000 for cluster communication, 9160 for clients (Thrift), and 8080 for JMX. These are all editable in the configuration file or bin/cassandra.in.sh (for JVM options). All ports are TCP.
Ans: A high availability system is one that is ready to serve any request at any time. High availability is usually achieved by adding redundancy, so that if one part fails, another part of the system can serve the request. To a client, it seems as if everything worked fine.