ElasticSearch Interview Questions And Answers
What is ElasticSearch?
Elasticsearch is a search engine based on Lucene. It provides a distributed, multitenant-capable full-text search engine. Elasticsearch is developed in Java and was originally released as open source under the terms of the Apache License.
What is the use of the attributes enabled, index and store?
The enabled attribute applies to several fields that Elasticsearch creates itself, such as _index and _size.
Store means that Lucene keeps the original value of the field so that it can be returned when requested. Storing a field does not by itself make it searchable.
Index is what makes a field searchable. Indexed fields are transformed during analysis, so the original value cannot be recovered from the index alone.
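As a rough illustration of the three attributes, the sketch below builds a mapping body as a plain Python dictionary. The field names (title, internal_note, raw_payload) are hypothetical, and the typeless "properties" layout assumes a recent Elasticsearch version; older versions nest the properties under a type name.

```python
import json

# Hypothetical mapping body; the field names are made up for illustration.
mapping = {
    "mappings": {
        "properties": {
            # Analyzed and searchable; the original value is also stored
            # separately so it can be returned on its own.
            "title": {"type": "text", "index": True, "store": True},
            # Not searchable because index is false.
            "internal_note": {"type": "keyword", "index": False},
            # Not parsed or indexed at all because enabled is false.
            "raw_payload": {"type": "object", "enabled": False},
        }
    }
}

# The dictionary serializes to the JSON body of a create-index request.
print(json.dumps(mapping, indent=2))
```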
What is an Analyzer in ElasticSearch?
When data is indexed, it is transformed internally by the analyzer defined for the index.
An analyzer consists of zero or more character filters, followed by exactly one tokenizer, followed by zero or more token filters. The analysis module lets you register analyzers under logical names, which can then be referenced in mapping definitions or in certain APIs.
Elasticsearch ships with prebuilt analyzers that are ready to use. You can also combine the built-in character filters, tokenizers, and token filters to create custom analyzers.
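A custom analyzer is declared in the index settings. The sketch below shows what such a settings body might look like, written as a Python dictionary. The analyzer name my_html_analyzer is hypothetical; html_strip, standard, lowercase, and stop are built-in components.

```python
import json

# Hypothetical index settings that define one custom analyzer.
index_settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "my_html_analyzer": {  # made-up name
                    "type": "custom",
                    "char_filter": ["html_strip"],    # character filters run first
                    "tokenizer": "standard",          # exactly one tokenizer
                    "filter": ["lowercase", "stop"],  # token filters run last, in order
                }
            }
        }
    }
}

body = json.dumps(index_settings)
print(body)
```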
What is a Character Filter in an Elasticsearch Analyzer?
A character filter receives the original text as a stream of characters and can transform it by adding, removing, or changing characters. For example, a character filter could be used to convert Hindu-Arabic numerals (٠١٢٣٤٥٦٧٨٩) into their Arabic-Latin equivalents (0123456789), or to strip HTML elements from the stream.
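The idea can be mimicked in a few lines of Python. This is only a toy model of character filtering, not Elasticsearch code; the function names are made up, and the HTML stripping uses a simple regular expression that would not cope with all real-world markup.

```python
import re

# Map Arabic-Indic digits (U+0660..U+0669) to ASCII digits 0..9.
DIGIT_TABLE = {0x0660 + i: str(i) for i in range(10)}


def normalize_digits(text):
    """Replace Arabic-Indic digits with their ASCII equivalents."""
    return text.translate(DIGIT_TABLE)


def strip_html(text):
    """Remove anything that looks like an HTML tag (naive)."""
    return re.sub(r"<[^>]+>", "", text)


def apply_char_filters(text, filters):
    """Run the text through each character filter in order."""
    for char_filter in filters:
        text = char_filter(text)
    return text


# "\u0664\u0662" is the number 42 written with Arabic-Indic digits.
result = apply_char_filters("<b>Order \u0664\u0662</b> shipped",
                            [strip_html, normalize_digits])
print(result)  # Order 42 shipped
```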
What are Token Filters in an Elasticsearch Analyzer?
A token filter receives the token stream and can add, remove, or change tokens. For instance, a lowercase token filter converts all tokens to lowercase, a stop token filter removes stop words, and a synonym token filter adds synonyms to the token stream.
Token filters are not allowed to change the position or character offsets of a token.
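The three example filters can be imitated with plain Python functions. This is a simplified sketch: the stop-word set and synonym table are invented, and real token filters also carry position and offset information that this toy version ignores.

```python
STOP_WORDS = {"the", "is", "a", "for"}   # made-up stop-word list
SYNONYMS = {"quick": ["fast"]}           # made-up synonym table


def lowercase_filter(tokens):
    """Change every token to lowercase."""
    return [token.lower() for token in tokens]


def stop_filter(tokens):
    """Drop tokens that are stop words."""
    return [token for token in tokens if token not in STOP_WORDS]


def synonym_filter(tokens):
    """Add the synonyms of each token right after it."""
    out = []
    for token in tokens:
        out.append(token)
        out.extend(SYNONYMS.get(token, []))
    return out


tokens = ["The", "Quick", "fox"]
for token_filter in (lowercase_filter, stop_filter, synonym_filter):
    tokens = token_filter(tokens)

print(tokens)  # ['quick', 'fast', 'fox']
```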
What is a Tokenizer?
A tokenizer breaks a string down into a stream of tokens. A simple tokenizer might split the string into terms wherever it encounters whitespace or punctuation. Elasticsearch has a number of built-in tokenizers that can be used to build custom analyzers.
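A very small tokenizer of this kind might look like the sketch below. It is not one of the built-in Elasticsearch tokenizers; it simply treats every run of word characters as a term and records each term's position and character offsets.

```python
import re


def simple_tokenize(text):
    """Return (term, position, start_offset, end_offset) for each word."""
    return [
        (match.group(), position, match.start(), match.end())
        for position, match in enumerate(re.finditer(r"\w+", text))
    ]


tokens = simple_tokenize("Quick brown-fox!")
for token in tokens:
    print(token)
# ('Quick', 0, 0, 5)
# ('brown', 1, 6, 11)
# ('fox', 2, 12, 15)
```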
What is a Filter?
After the tokenizer has finished processing the data, the result is handed over to the filters.
Some of the filter types available in Elasticsearch 1.x are:
- AND FILTER
- EXISTS FILTER
- GEO DISTANCE FILTER
- GEO POLYGON FILTER
- GEOHASH CELL FILTER
- HAS PARENT FILTER
- INDICES FILTER
- MATCH ALL FILTER
- NESTED FILTER
- OR FILTER
- QUERY FILTER
- REGEXP FILTER
- TERM FILTER
- TYPE FILTER
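The names above come from the older filter DSL. From Elasticsearch 2.0 onward, filters and queries were merged, and conditions of this kind are usually written as clauses in the filter context of a bool query. A minimal sketch of such a request body follows; the field names status and author are hypothetical.

```python
import json

# Hypothetical search body: keep only published documents that have an author.
search_body = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"status": "published"}},   # exact-value match
                {"exists": {"field": "author"}},     # field must be present
            ]
        }
    }
}

print(json.dumps(search_body, indent=2))
```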
What are the advantages of Elasticsearch?
- Elasticsearch is developed in Java, which makes it compatible with almost every platform.
- Elasticsearch is near real time (NRT): a document becomes searchable shortly after it is indexed, within about one second by default.
- An Elasticsearch cluster is distributed, scalable, and easy to integrate.
- The Elasticsearch REST API uses JSON objects, which makes it easy to call the Elasticsearch server from many different programming languages.
- Elasticsearch supports almost every document type except those that do not support text rendering.
What is Elasticsearch REST API and use of it?
Elasticsearch provides a comprehensive and powerful REST API that you can use to interact with your cluster. A few of the things that can be done with the API are as follows:
- Check your cluster, node, and index health, status, and statistics
- Administer your cluster, node, and index data and metadata
- Perform CRUD (Create, Read, Update, and Delete) and search operations against your indexes
- Execute advanced search operations viz. aggregations, filtering, paging, scripting, sorting, among many others
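As a small sketch of how such calls could be prepared from Python using only the standard library, the helper below builds HTTP requests without sending them. The base URL http://localhost:9200 and the index name articles are assumptions; /_cluster/health and /_search are standard Elasticsearch endpoints.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumed local node address


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request for the REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


# Check cluster health.
health_req = build_request("GET", "/_cluster/health")

# Search the hypothetical "articles" index with paging.
search_req = build_request(
    "POST", "/articles/_search",
    {"query": {"match_all": {}}, "from": 0, "size": 10},
)

print(health_req.get_method(), health_req.full_url)
print(search_req.get_method(), search_req.full_url)
```

Passing one of these objects to urllib.request.urlopen would send it to a running node.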
What are the Disadvantages of Elasticsearch?
- Elasticsearch does not offer a choice of several formats for request and response data; requests and responses are handled in JSON.
- In rare cases, a cluster can run into a split-brain situation.
Does ElasticSearch have a schema?
Yes, Elasticsearch can have a schema. A schema is a description of one or more fields that describes the document type and how the different fields of a document should be handled. In Elasticsearch, the schema is a mapping that describes the fields of the JSON documents along with their data types, as well as how they should be indexed in the Lucene indexes under the hood. Because of this, in Elasticsearch terms, we usually call this schema a “mapping”.
What is a cluster in ElasticSearch?
A cluster is a collection of nodes that together hold the data and provide indexing and search capabilities across all nodes. A cluster is identified by a unique name, which defaults to “elasticsearch”. This name is important because a node can only be part of a cluster if it is set up to join the cluster by its name.
What is a node in ElasticSearch?
A node is a single server that is part of the cluster. It stores data and participates in the cluster's indexing and search capabilities.
What is Ingest Node in Elasticsearch?
Ingest nodes execute pre-processing pipelines made up of one or more ingest processors. A pipeline transforms and enriches a document before it is indexed. To create a dedicated ingest node, set node.master and node.data to false and leave node.ingest set to true.
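To make the idea of a pipeline more concrete, the sketch below defines a pipeline body with two built-in processor types (lowercase and set) and applies it to a sample document with a toy local simulation. The field names and values are hypothetical, and the simulate function is not Elasticsearch code; it only imitates what these two processors do.

```python
# Hypothetical ingest pipeline body using two built-in processor types.
pipeline = {
    "description": "normalize log level and tag the environment",
    "processors": [
        {"lowercase": {"field": "level"}},
        {"set": {"field": "environment", "value": "production"}},
    ],
}


def simulate(pipeline, doc):
    """Toy local imitation of the two processors above."""
    doc = dict(doc)  # work on a copy
    for processor in pipeline["processors"]:
        name, config = next(iter(processor.items()))
        if name == "lowercase":
            doc[config["field"]] = doc[config["field"]].lower()
        elif name == "set":
            doc[config["field"]] = config["value"]
        else:
            raise ValueError(f"processor not supported in this sketch: {name}")
    return doc


result = simulate(pipeline, {"level": "ERROR", "message": "disk full"})
print(result)
# {'level': 'error', 'message': 'disk full', 'environment': 'production'}
```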
What is Elasticsearch Data Node?
Data nodes hold the shards that contain the indexed documents. They handle data-related operations such as CRUD, search, and aggregations. Set node.data=true to make a node a data node.
Data node operations are I/O-, memory-, and CPU-intensive. The main benefit of dedicated data nodes is the separation of the master and data roles.
What is Master Node and Master Eligible Node in Elasticsearch?
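The master node controls cluster-wide operations such as creating or deleting an index, tracking which nodes are part of the cluster, and deciding which shards to allocate to which nodes. It is important for cluster health to have a stable master node. A node is master-eligible when node.master=true (the default), and the master node is elected from among the master-eligible nodes.
The minimum number of master-eligible nodes that must be visible in order to elect a master is set with the configuration below:
discovery.zen.minimum_master_nodes : number (default 1)
This number should be set to a quorum of the master-eligible nodes: (master_eligible_nodes / 2) + 1, with the division rounded down.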
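The quorum formula is easy to check with a few values. The helper below is only a worked example of the arithmetic; the function name is made up.

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum of master-eligible nodes: (n / 2) + 1 with integer division."""
    if master_eligible_nodes < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


for n in (1, 2, 3, 5):
    print(n, "master-eligible ->", minimum_master_nodes(n))
# 1 master-eligible -> 1
# 2 master-eligible -> 2
# 3 master-eligible -> 2
# 5 master-eligible -> 3
```

With three master-eligible nodes the quorum is two, so a single isolated node cannot elect itself master, which is how this setting guards against split brain.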
What is Tribe Node and Coordinating Node in Elasticsearch?
A tribe node connects to multiple clusters and executes search and other operations across all connected clusters. It is configured through the tribe.* settings.
A coordinating node behaves like a smart load balancer. If you take away a node's ability to handle master duties, to hold data, and to pre-process documents, you are left with a coordinating node that can only route requests, handle the search reduce phase, and distribute bulk indexing.
Every node is implicitly a coordinating node. A node that has all three of node.data, node.ingest and node.master set to false acts only as a coordinating node, and this role cannot be disabled. Such a node therefore needs enough memory and CPU to deal with the gather phase.
What is an index in ElasticSearch?
An index is comparable to a ‘database’ in a relational database system. It has a mapping that defines multiple types. An index maps to one or more primary shards and can have zero or more replica shards.
MySQL => Databases
ElasticSearch => Indices
What is inverted index in Elasticsearch?
The inverted index is the backbone of Elasticsearch and is what makes full-text search fast. An inverted index consists of a list of all the unique words that occur in the documents and, for each word, a list of the documents and positions in which it appears.
For example, suppose there are two documents with the following content:
1: FacingIssuesOnIT is for ELK.
2: If ELK check FacingIssuesOnIT.
To build the inverted index, each document is split into words (also called terms or tokens), and a sorted index is created that maps each term to the documents containing it.
When we then run a full-text search for a string, the documents are ranked according to whether the search terms exist in them and how many matches occur.
Books usually have a similar index on their last pages: given a word, we can find the pages on which that word appears.
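The two example documents can be turned into a small inverted index with a few lines of Python. This is a simplified sketch: terms are split on non-word characters and kept in their original case, whereas a real analyzer would typically also lowercase them.

```python
import re
from collections import defaultdict

docs = {
    1: "FacingIssuesOnIT is for ELK.",
    2: "If ELK check FacingIssuesOnIT.",
}


def build_inverted_index(docs):
    """Map each term to {doc_id: [positions]} and sort the terms."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(re.findall(r"\w+", text)):
            index[term].setdefault(doc_id, []).append(position)
    # Sort terms alphabetically, ignoring case.
    return dict(sorted(index.items(), key=lambda item: item[0].lower()))


def search(index, term):
    """Return the ids of the documents that contain the term."""
    return sorted(index.get(term, {}))


index = build_inverted_index(docs)
for term, postings in index.items():
    print(term, postings)
# check {2: [2]}
# ELK {1: [3], 2: [1]}
# FacingIssuesOnIT {1: [0], 2: [3]}
# for {1: [2]}
# If {2: [0]}
# is {1: [1]}

print(search(index, "ELK"))  # [1, 2]
```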
What is a shard?
Applications with large amounts of data often need to run multiple Elasticsearch instances on separate machines. The data in each index is divided into multiple partitions, and each partition can be managed by a separate Elasticsearch node. Each such partition is called a shard. By default, an Elasticsearch index has 5 primary shards in versions before 7.0 and 1 primary shard from 7.0 onward.
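Conceptually, a document is assigned to a shard by hashing its routing value (the document id by default) and taking the result modulo the number of primary shards. The sketch below illustrates the idea only: Elasticsearch uses a Murmur3 hash internally, while this example substitutes zlib.crc32 as a stand-in, so the shard numbers it produces will not match a real cluster.

```python
import zlib


def pick_shard(routing, number_of_primary_shards):
    """Illustrative routing: hash(routing) % number_of_primary_shards.

    crc32 is a stand-in here; Elasticsearch itself uses Murmur3.
    """
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


for doc_id in ("doc-1", "doc-2", "doc-3"):
    print(doc_id, "-> shard", pick_shard(doc_id, 5))
```

Because the shard number depends on the number of primary shards, that number cannot simply be changed after the index has been created without reindexing or using a dedicated resize operation.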
What is a replica?
A replica is a copy of a primary shard. By default, each primary shard has one replica. Replicas make the index highly available and fault-tolerant.
What is a document in ElasticSearch?
A document is similar to a row in a relational database. Documents in an index can have different structures, but fields that they have in common must have the same data type.
MySQL => Databases => Tables => Columns/Rows
ElasticSearch => Indices => Types => Documents with Properties
What are the basic operations you can perform on a document?
The following operations can be performed on documents:
- INDEXING A DOCUMENT USING ELASTICSEARCH.
- FETCHING DOCUMENTS USING ELASTICSEARCH.
- UPDATING DOCUMENTS USING ELASTICSEARCH.
- DELETING DOCUMENTS USING ELASTICSEARCH.
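Each of these four operations corresponds to one REST call. The helpers below only describe the calls as (method, path, body) tuples rather than sending them. The index name articles and the document contents are hypothetical, and the paths follow the typeless _doc and _update endpoints of Elasticsearch 7.x and later; older versions include a type name in the path.

```python
def index_doc(index, doc_id, doc):
    """Index (create or replace) a document."""
    return ("PUT", f"/{index}/_doc/{doc_id}", doc)


def get_doc(index, doc_id):
    """Fetch a document by id."""
    return ("GET", f"/{index}/_doc/{doc_id}", None)


def update_doc(index, doc_id, partial_doc):
    """Partially update a document."""
    return ("POST", f"/{index}/_update/{doc_id}", {"doc": partial_doc})


def delete_doc(index, doc_id):
    """Delete a document by id."""
    return ("DELETE", f"/{index}/_doc/{doc_id}", None)


calls = [
    index_doc("articles", 1, {"title": "Hello", "views": 0}),
    get_doc("articles", 1),
    update_doc("articles", 1, {"views": 1}),
    delete_doc("articles", 1),
]
for method, path, body in calls:
    print(method, path, body)
```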
What is a type in ElasticSearch?
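A type is a logical category or partition of an index whose semantics are entirely up to the user. Mapping types were deprecated in Elasticsearch 6.x and removed in later versions.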
What are the common areas of use for Elasticsearch?
It is useful in applications that need to perform analysis, compute statistics, and detect anomalies in data based on patterns.
It is useful where alerts need to be sent when a particular condition is matched, for example in stock market data or exceptions in logs.
It is useful in applications that provide log analysis and troubleshooting, because it can run full-text searches over billions of records in milliseconds.
It works together with tools such as Filebeat, Logstash and Kibana to store high volumes of data for analysis and to visualize them as charts and dashboards.
Define Analyzer in ElasticSearch?
In Elasticsearch, data is transformed internally during indexing by the analyzer defined for the index, and is then indexed. Analyzers are built from filters and tokenizers. The major types of analyzers available in Elasticsearch 1.x are as follows:
- simple analyzer
- standard analyzer
- keyword analyzer
- language analyzers
- snowball analyzer
- custom analyzer
- pattern analyzer
- whitespace analyzer
- stop analyzer
What is the query language of Elasticsearch?
Elasticsearch uses Query DSL, a JSON-based domain-specific language built on top of Apache Lucene, to define queries.
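A Query DSL request is just a JSON document. The sketch below builds one as a Python dictionary; the field names title and published are hypothetical, while match, sort, from, and size are standard parts of a search request body.

```python
import json

# Hypothetical full-text search: newest matching documents first, 5 per page.
search_body = {
    "query": {
        "match": {"title": "elasticsearch interview"}
    },
    "sort": [{"published": {"order": "desc"}}],
    "from": 0,
    "size": 5,
}

print(json.dumps(search_body, indent=2))
```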