ElasticSearch Interview Questions And Answers
What is ElasticSearch?
Elasticsearch is a search engine based on Lucene. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. Elasticsearch is developed in Java and is released as open source under the terms of the Apache License.
What is the is use of attributes- enabled, index and store?
The enabled attribute applies to various ElasticSearch specific/created fields such as _index and _size. User-supplied fields do not have an “enabled” attribute.
Store means the data is stored by Lucene will return this data if asked. Stored fields are not necessarily searchable.
By default, fields are not stored, but full source is. Since you want the defaults (which makes sense), simply do not set the store attribute.
The index attribute is used for searching. Only indexed fields can be searched. The reason for the differentiation is that indexed fields are transformed during analysis, so you cannot retrieve the original data if it is required.
What is an Analyzer in ElasticSearch?
While indexing data in ElasticSearch, data is transformed internally by the Analyzer defined for the index.
Analyzers are composed of a single Tokenizer and zero or more TokenFilters. The tokenizer may be preceded by one or more CharFilters. The analysis module allows you to register
Analyzers under logical names which can then be referenced either in mapping definitions or in certain APIs.
Elasticsearch comes with a number of prebuilt analyzers which are ready to use. Alternatively, you can combine the built in character filters, tokenizers and token filters to create custom analyzers.
What is Character Filter in Elasticsearch Analyzer?
A character filter receives the original text as a stream of characters and can transform the stream by adding, removing, or changing characters. For instance, a character filter could be used to convert Hindu-Arabic numerals (٠١٢٣٤٥٦٧٨٩) into their Arabic-Latin equivalents (0123456789), or to strip HTML elements like from the stream.
An analyzer may have zero or more character filters, which are applied in order.
What is Token filters in Elasticsearch Analyzer?
A token filter receives the token stream and may add, remove, or change tokens. For example, a lowercase token filter converts all tokens to lowercase, a stop token filter removes common words (stop words) like the from the token stream, and a synonym token filter introduces synonyms into the token stream.
Token filters are not allowed to change the position or character offsets of each token.
An analyzer may have zero or more token filters, which are applied in order.
What is a Tokenizer in ElasticSearch?
Tokenizers are used to break a string down into a stream of terms or tokens. A simple tokenizer might split the string up into terms wherever it encounters whitespace or punctuation. Elasticsearch has a number of built in tokenizers which can be used to build custom analyzers.
What is a Filter in ElasticSearch?
After data is processed by Tokenizer, the same is processed by Filter, before indexing.
Following types of Filters are available in ElasticSearch 1.10.
GEO BOUNDING BOX FILTER
GEO DISTANCE FILTER
GEO DISTANCE RANGE FILTER
GEO POLYGON FILTER
GEOHASH CELL FILTER
HAS CHILD FILTER
HAS PARENT FILTER
MATCH ALL FILTER
What are the advantages of Elasticsearch?
- Elasticsearch is implemented on Java, which makes it compatible on almost every platform.
- Elasticsearch is Near Real Time (NRT), in other words after one second the added document is searchable in this engine.
- Elasticsearch cluster is distributed, which makes it easy to scale and integrate in any big organizations.
- Creating full backups of data are easy by using the concept of gateway, which is present in Elasticsearch.
- Elasticsearch REST uses JSON objects as responses, which makes it possible to invoke the Elasticsearch server with a large number of different programming languages.
- Elasticsearch supports almost every document type except those that do not support text rendering.
- Handling multi-tenancy is very easy in Elasticsearch when compared to Apache Solr.
What is Elasticsearch REST API and use of it?
Elasticsearch provides a very comprehensive and powerful REST API that you can use to interact with your cluster. Among the few things that can be done with the API are as follows:
Check your cluster, node, and index health, status, and statistics
Administer your cluster, node, and index data and metadata
Perform CRUD (Create, Read, Update, and Delete) and search operations against your indexes
Execute advanced search operations such as paging, sorting, filtering, scripting, aggregations, and many others
What are the Disadvantages of Elasticsearch?
Elasticsearch does not have multi-language support in terms of handling request and response data in JSON while in Apache Solr, where it is possible in CSV, XML and JSON formats.
Elasticsearch have a problem of Split Brain situations, but in rare cases.
Does ElasticSearch have a schema?
Yes, Elasticsearch can have a schema. A schema is a description of one or more fields that describes the document type and how to handle the different fields of a document. The schema in Elasticsearch is a mapping that describes the the fields in the JSON documents along with their data type, as well as how they should be indexed in the Lucene indexes that lie under the hood. Because of this, in Elasticsearch terms, we usually call this schema a “mapping”.
Elasticsearch has the ability to be schema-less, which means that documents can be indexed without explicitly providing a schema. If you do not specify a mapping, Elasticsearch will by default generate one dynamically when detecting new fields in documents during indexing.
What is a cluster in ElasticSearch?
Cluster is a collection of one or more nodes (servers) that together holds your entire data and provides federated indexing and search capabilities across all nodes. A cluster is identified by a unique name which by default is “elasticsearch”. This name is important because a node can only be part of a cluster if the node is set up to join the cluster by its name.
What is a node in ElasticSearch?
Node is a single server that is part of the cluster. It stores the data and participates in the clusters indexing and search capabilities.
What is Ingest Node in Elasticsearch?
Ingest nodes can execute pre-processing an ingest pipeline to a document in order to transform and enrich the document before indexing. With a heavy ingest load, it makes sense to use dedicated ingest nodes and to mark the master and data nodes as false and node.ingest=true.
What is Data Node in Elasticsearch?
Data nodes hold the shards/replica that contain the documents that was indexed. Data Nodes perform data related operation such as CRUD, search aggregation etc. Set node.data=true (Default) to make node as Data Node.
Data Node operations are I/O-, memory-, and CPU-intensive. It is important to monitor these resources and to add more data nodes if they are overloaded.The main benefit of having dedicated data nodes is the separation of the master and data roles.
What is Master Node and Master Eligible Node in Elasticsearch?
Master Node control cluster wide operations like creating or deleting an index, tracking which nodes are part of the cluster, and deciding which shards to allocate to which nodes. It is important for cluster health to have a stable master node. Master Node elected based on configuration properties node.master=true (Default).
Master Eligible Node decide based on below configuration
discovery.zen.minimum_master_node : number (default 1)
and above number decide based (master_eligible_nodes / 2) + 1
What is Tribe Node and Coordinating Node in Elasticsearch?
Tribe node, is special type of node that coordinate to connect to multiple clusters and perform search and others operation across all connected clusters. Tribe Node configured by settings tribe.
Coordinating Node behave like Smart Load balancer which able to handle master duties, to hold data, and pre-process documents, then you are left with a coordinating node that can only route requests, handle the search reduce phase, and distribute bulk indexing.
Every node is implicitly a coordinating node. This means that a node that has all three node.master, node.data and node.ingest set to false will only act as a coordinating node, which cannot be disabled. As a result, such a node needs to have enough memory and CPU in order to deal with the gather phase.
What is an index in ElasticSearch?
Index is like a ‘database’ in a relational database. It has a mapping which defines multiple types. An index is a logical namespace which maps to one or more primary shards and can have zero or more replica shards.
MySQL => Databases
ElasticSearch => Indices
What is inverted index in Elasticsearch?
Inverted Index is backbone of Elasticsearch which make full-text search fast. Inverted index consists of a list of all unique words that occurs in documents and for each word, maintain a list of documents number and positions in which it appears.
For Example: There are two documents and having content as:
1: FacingIssuesOnIT is for ELK.
2: If ELK check FacingIssuesOnIT.
To make inverted index each document will split in words (also called as terms or token) and create below sorted index .
Now when we do some full-text search for String will sort documents based on existence and occurrence of matching counts.
Usually in Books we have inverted indexes on last pages. Based on the word we can thus find the page on which the word exists.
What is a shard in ElasticSearch?
Due to resource limitations like RAM, vCPU etc, for scale-out, applications need to employ multiple instances of ElasticSearch on separate machines. Data in an index can be divided into multiple partitions, each handled by a separate node (instance) of ElasticSearch. Each such partition is called a shard. By default an ElasticSearch index has 5 shards.
GetJob Ready with ElasticSearch Training With Live Project By Experts
What is a replica in ElasticSearch?
Each shard in ElasticSearch has 2 copy of the shard. These copies are called replicas. They serve the purpose of high-availability and fault-tolerance.
What is a document in ElasticSearch?
Document is similar to a row in relational databases. The difference is that each document in an index can have a different structure (fields), but should have same data type for common fields.
MySQL => Databases => Tables => Columns/Rows
ElasticSearch => Indices => Types => Documents with Properties
What are the basic operations you can perform on a document?
The following operations can be performed on documents
INDEXING A DOCUMENT USING ELASTICSEARCH.
FETCHING DOCUMENTS USING ELASTICSEARCH.
UPDATING DOCUMENTS USING ELASTICSEARCH.
DELETING DOCUMENTS USING ELASTICSEARCH.
What is a type in ElasticSearch?
Type is a logical category/partition of index whose semantics is completely upto the user.
What are common area of use Elasticsearch?
It’s useful in application where need to do analysis, statics and need to find out anomalies on data based on pattern.
It’s useful where need to send alerts when particular condition matched like stock market, exception from logs etc.
It’s useful with application where log analysis and issue solution provide because of full search in billions of records in milliseconds.
It’s compatible with application like Filebeat, Logstash and Kibana for storage of high volume data for analysis and visualize in form of chart and dashboards.