08 October, 2020
Elasticsearch is a search engine based on Lucene. It provides a distributed, multitenant-capable full-text search engine. Elasticsearch is developed in Java and is released as open source under the terms of the Apache License.
The enabled attribute applies to several Elasticsearch-created (metadata) fields, such as _index and _size.
The store attribute means that the original field value is stored by Lucene so that it can be returned when requested. A field that is stored but not indexed is not searchable.
The index attribute controls whether a field is searchable. Indexed fields are transformed during analysis, so the original value cannot be retrieved from the index itself.
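As an illustration of these attributes, the sketch below builds a 7.x-style mapping body as a Python dictionary. The field names title, internal_code, and debug_info are hypothetical; enabled is shown on an object field, where Elasticsearch also accepts it.

```python
import json

# Hypothetical mapping body for PUT /<index>; field names are illustrative.
mapping = {
    "mappings": {
        "properties": {
            # Analyzed text; searchable because "index" defaults to true.
            "title": {"type": "text"},
            # Stored by Lucene so it can be returned via stored_fields,
            # but not indexed, so queries cannot match on it.
            "internal_code": {"type": "keyword", "index": False, "store": True},
            # Kept in _source, but its contents are neither parsed nor indexed.
            "debug_info": {"type": "object", "enabled": False},
        }
    }
}

body = json.dumps(mapping, indent=2)
print(body)
```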
When data is indexed, it is transformed internally by the analyzer defined for the index.
An analyzer is made up of exactly one tokenizer, preceded by zero or more character filters and followed by zero or more token filters. The analysis module lets you register analyzers under logical names, which can then be referenced in mapping definitions or in some APIs.
Elasticsearch ships with built-in analyzers that are ready to use. You can also combine the built-in character filters, token filters, and tokenizers to create custom analyzers.
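A custom analyzer is declared in the index settings by naming its components. The sketch below builds such a settings body in Python; the analyzer name my_custom_analyzer and the body field are hypothetical, while html_strip, standard, lowercase, and stop are built-in components.

```python
import json

# Hypothetical body for PUT /<index>: one custom analyzer built from
# built-in components, then referenced by name in the mapping.
settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "my_custom_analyzer": {
                    "type": "custom",
                    "char_filter": ["html_strip"],    # zero or more character filters
                    "tokenizer": "standard",          # exactly one tokenizer
                    "filter": ["lowercase", "stop"],  # zero or more token filters
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "body": {"type": "text", "analyzer": "my_custom_analyzer"}
        }
    },
}

print(json.dumps(settings, indent=2))
```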
A character filter receives the original text as a stream of characters and can transform it by adding, removing, or changing characters. For example, a character filter could convert Hindu-Arabic numerals (٠١٢٣٤٥٦٧٨٩) into their Arabic-Latin equivalents (0123456789), or strip HTML elements from the stream.
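The digit conversion described above can be imitated in plain Python with str.translate. This is only a sketch of the idea; inside Elasticsearch the same effect would come from a mapping character filter, whose definition is built alongside it for comparison.

```python
# Arabic-Indic digits U+0660..U+0669 (٠١٢٣٤٥٦٧٨٩), built from code points
# so the order is unambiguous regardless of text direction.
ARABIC_INDIC = "".join(chr(0x0660 + i) for i in range(10))
LATIN = "0123456789"
_DIGIT_TABLE = str.maketrans(ARABIC_INDIC, LATIN)

def convert_digits(text: str) -> str:
    """Replace each Arabic-Indic digit with its Latin digit; other characters pass through."""
    return text.translate(_DIGIT_TABLE)

# Roughly equivalent Elasticsearch "mapping" char_filter definition.
digit_char_filter = {
    "type": "mapping",
    "mappings": [f"{src} => {dst}" for src, dst in zip(ARABIC_INDIC, LATIN)],
}

print(convert_digits("Room " + ARABIC_INDIC[4] + ARABIC_INDIC[2]))  # Room 42
```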
A token filter receives the token stream and may add, remove, or change tokens. For instance, a lowercase token filter converts all tokens to lowercase, a stop token filter removes stop words, and a synonym token filter introduces synonyms into the token stream.
Token filters are not allowed to change the position or character offsets of each token.
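To make the position rule concrete, the following sketch models tokens as (term, position, offsets) tuples and chains three simplified filters. The stop-word set and synonym map are small, hypothetical examples rather than Lucene's defaults; note that the synonym is emitted at the same position and offsets as the original token.

```python
from typing import List, NamedTuple

class Token(NamedTuple):
    term: str
    position: int
    start_offset: int
    end_offset: int

STOP_WORDS = {"a", "an", "the", "is"}  # illustrative subset, not Lucene's list
SYNONYMS = {"quick": ["fast"]}         # hypothetical synonym map

def lowercase_filter(tokens: List[Token]) -> List[Token]:
    # Changes the term text only; position and offsets are kept.
    return [t._replace(term=t.term.lower()) for t in tokens]

def stop_filter(tokens: List[Token]) -> List[Token]:
    # Drops stop words; the remaining tokens keep their original positions.
    return [t for t in tokens if t.term not in STOP_WORDS]

def synonym_filter(tokens: List[Token]) -> List[Token]:
    # Emits each synonym at the same position and offsets as the original.
    out: List[Token] = []
    for t in tokens:
        out.append(t)
        out.extend(t._replace(term=s) for s in SYNONYMS.get(t.term, []))
    return out

tokens = [Token("The", 0, 0, 3), Token("Quick", 1, 4, 9), Token("Fox", 2, 10, 13)]
result = synonym_filter(stop_filter(lowercase_filter(tokens)))
for t in result:
    print(t)
```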
A tokenizer breaks a string into a stream of tokens. A simple tokenizer might split the string into terms wherever it encounters whitespace or punctuation. Elasticsearch has a number of built-in tokenizers that can be used to build custom analyzers.
Once the tokenizer has processed the data, the resulting tokens are passed on to the token filters.
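Putting the pieces together, an analyzer runs its character filters, then its tokenizer, then its token filters, in that order. The sketch below uses deliberately simplified stand-ins (a regex tag stripper and a split on non-word characters), not Lucene's actual html_strip character filter or standard tokenizer; all function names are hypothetical.

```python
import re
from typing import Callable, List

STOP_WORDS = {"a", "an", "the"}  # illustrative subset

def strip_tags(text: str) -> str:
    """Character filter stand-in: replace anything that looks like an HTML tag with a space."""
    return re.sub(r"<[^>]+>", " ", text)

def simple_tokenizer(text: str) -> List[str]:
    """Tokenizer stand-in: split wherever whitespace or punctuation occurs."""
    return [t for t in re.split(r"\W+", text) if t]

def lowercase(tokens: List[str]) -> List[str]:
    return [t.lower() for t in tokens]

def remove_stop_words(tokens: List[str]) -> List[str]:
    return [t for t in tokens if t not in STOP_WORDS]

def analyze(
    text: str,
    char_filters: List[Callable[[str], str]],
    tokenizer: Callable[[str], List[str]],
    token_filters: List[Callable[[List[str]], List[str]]],
) -> List[str]:
    # 1. Character filters see the raw character stream.
    for char_filter in char_filters:
        text = char_filter(text)
    # 2. Exactly one tokenizer turns the characters into tokens.
    tokens = tokenizer(text)
    # 3. Token filters run in order on the token stream.
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens

terms = analyze("<p>The QUICK brown fox!</p>", [strip_tags], simple_tokenizer, [lowercase, remove_stop_words])
print(terms)  # ['quick', 'brown', 'fox']
```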
Some of the filter types available in Elasticsearch 1.10 are:
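Elasticsearch provides a very comprehensive and powerful REST API that you can use to interact with your cluster. Among other things, you can use the API to check cluster, node, and index health, status, and statistics; administer cluster, node, and index data and metadata; perform CRUD (create, read, update, delete) and search operations against indexes; and execute advanced search operations such as paging, sorting, filtering, scripting, and aggregations.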
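As a minimal sketch of talking to the REST API with only the Python standard library, the code below builds a request for the cluster health endpoint. It assumes an unsecured node listening on http://localhost:9200; build_request is a hypothetical helper, and the request is only sent if you uncomment the urlopen lines.

```python
import json
import urllib.request
from typing import Any, Optional

ES_URL = "http://localhost:9200"  # assumption: a local node without security enabled

def build_request(method: str, path: str, body: Optional[Any] = None) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a JSON request to Elasticsearch."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(ES_URL + path, data=data, headers=headers, method=method)

health = build_request("GET", "/_cluster/health")
print(health.get_method(), health.full_url)

# Uncomment to send it to a running cluster:
# with urllib.request.urlopen(health) as resp:
#     print(json.load(resp)["status"])  # "green", "yellow" or "red"
```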
Yes, Elasticsearch can have a schema. A schema is a description of one or more fields that describes the document type and how to handle the different fields of a document. In Elasticsearch, the schema is a mapping that describes the fields of the JSON documents and their data types, as well as how they are indexed in the underlying Lucene indexes. Because of this, in Elasticsearch terms, we usually call this schema a “mapping”.
What is a cluster in Elasticsearch?
A cluster is a collection of one or more nodes that together hold the data and provide indexing and search capabilities across all of them. Each cluster is identified by a unique name, which defaults to "elasticsearch". This name is important because a node can only join a cluster if it is configured with that cluster's name.
A node is a single server that is part of the cluster. It stores data and participates in the cluster's indexing and search capabilities.
Ingest nodes can apply an ingest pipeline to a document, transforming and enriching it before it is indexed. A dedicated ingest node has node.master and node.data set to false (and node.ingest set to true).
Data nodes hold the shards that contain the indexed documents. They handle data-related operations such as CRUD, search, and aggregations. Set node.data=true to make a node a data node.
Data node operations are I/O-, memory-, and CPU-intensive, so data nodes benefit from separating the master and data roles.
The master node controls cluster-wide operations such as creating or deleting an index, tracking which nodes are part of the cluster, and deciding which shards to allocate to which nodes. Having a stable master node is important for cluster health. A node is eligible to be elected master when node.master=true, which is the default.
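Master election among the master-eligible nodes is governed by the following setting:
discovery.zen.minimum_master_nodes : number (default 1)
To avoid a split brain, set this number to a quorum of the master-eligible nodes: (master_eligible_nodes / 2) + 1. Elasticsearch 7.0 and later ignore this setting and manage the quorum automatically.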
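The quorum formula is easy to check by hand; the hypothetical helper below applies it with integer division.

```python
def minimum_master_nodes(master_eligible_nodes: int) -> int:
    """Quorum of master-eligible nodes: (master_eligible_nodes / 2) + 1, rounded down."""
    if master_eligible_nodes < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible_nodes // 2 + 1

for n in (1, 2, 3, 5):
    print(f"{n} master-eligible nodes -> minimum_master_nodes = {minimum_master_nodes(n)}")
```

With three master-eligible nodes the quorum is 2, so the cluster can still elect a master after losing one of them; with two, the quorum is also 2, so losing either node blocks elections.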
A tribe node connects to multiple clusters and performs search operations across all of the connected clusters. It is configured through the tribe settings.
A coordinating-only node behaves like a smart load balancer. If you take away a node's ability to handle master duties, hold data, and pre-process documents, you are left with a coordinating node that can only route requests, handle the search reduce phase, and distribute bulk indexing.
Every node is implicitly a coordinating node. A node with node.master, node.data, and node.ingest all set to false acts only as a coordinating node, and this role cannot be disabled. Such a node therefore needs enough memory and CPU to handle the gather phase.
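The role combinations above can be summarized with a small, hypothetical helper that maps the legacy node.master, node.data, and node.ingest flags to a description. It only mirrors the rules stated in this article; newer releases configure roles through node.roles instead.

```python
def node_kind(master: bool = True, data: bool = True, ingest: bool = True) -> str:
    """Hypothetical helper: describe a node from node.master, node.data and
    node.ingest (all true by default in the legacy settings)."""
    roles = [
        name
        for name, enabled in (("master-eligible", master), ("data", data), ("ingest", ingest))
        if enabled
    ]
    # Every node coordinates requests; with no other role, that is all it does.
    return " + ".join(roles) if roles else "coordinating-only"

examples = {
    "default node": node_kind(),
    "dedicated master": node_kind(data=False, ingest=False),
    "dedicated data": node_kind(master=False, ingest=False),
    "dedicated ingest": node_kind(master=False, data=False),
    "coordinating only": node_kind(master=False, data=False, ingest=False),
}
for label, kind in examples.items():
    print(f"{label:18} -> {kind}")
```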
An index is like a ‘database’ in a relational database. Its mapping defines multiple types, and an index maps to one or more primary shards and can have zero or more replica shards.
MySQL => Databases
ElasticSearch => Indices
The inverted index is the backbone of Elasticsearch and is what makes full-text search fast. An inverted index consists of a list of all the unique words that occur in the documents and, for each word, the list of documents (and positions) in which it appears.
For example, consider two documents with the following content:
1: FacingIssuesOnIT is for ELK.
2: If ELK check FacingIssuesOnIT.
To build the inverted index, each document is split into words (also called terms or tokens), and a sorted index of those terms is created.
When a full-text search is run for a string, documents are ranked according to which of the search terms they contain and how often those terms occur.
Books work the same way: the index in the last pages lists each word together with the pages on which it appears.
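The sketch below builds an inverted index for the two example documents, recording each term's document and position, and ranks documents for a query by counting matching term occurrences. The analysis (lowercasing and splitting on non-word characters) and the count-based scoring are simplifications; Elasticsearch's real scoring uses BM25 by default.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

documents = {
    1: "FacingIssuesOnIT is for ELK.",
    2: "If ELK check FacingIssuesOnIT.",
}

def tokenize(text: str) -> List[str]:
    # Simplified analysis: lowercase, then split on non-word characters.
    return [t for t in re.split(r"\W+", text.lower()) if t]

def build_inverted_index(docs: Dict[int, str]) -> Dict[str, List[Tuple[int, int]]]:
    index: Dict[str, List[Tuple[int, int]]] = defaultdict(list)  # term -> [(doc_id, position)]
    for doc_id, text in docs.items():
        for position, term in enumerate(tokenize(text)):
            index[term].append((doc_id, position))
    return dict(sorted(index.items()))  # sorted by term, as in a book index

def search(index: Dict[str, List[Tuple[int, int]]], query: str) -> List[Tuple[int, int]]:
    # Count matching term occurrences per document; highest count first.
    scores: Dict[int, int] = defaultdict(int)
    for term in tokenize(query):
        for doc_id, _ in index.get(term, []):
            scores[doc_id] += 1
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

index = build_inverted_index(documents)
for term, postings in index.items():
    print(f"{term:18} {postings}")
print(search(index, "ELK check"))  # [(2, 2), (1, 1)]
```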
Large applications often run multiple Elasticsearch instances on separate machines. The data in each index is divided into multiple partitions, each of which can be held by a different Elasticsearch instance. Each such partition is called a shard. Before Elasticsearch 7.0, each index had 5 primary shards by default; from 7.0 onward the default is 1.
Each primary shard can have copies called replicas (one by default), which make the data highly available and fault tolerant.
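Elasticsearch chooses the primary shard for a document with the formula shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document ID. The sketch below applies that formula with zlib.crc32 standing in for the Murmur3 hash Elasticsearch actually uses, so the shard numbers it prints will not match a real cluster. It also counts total shard copies from the primary and replica settings; both helper names are hypothetical.

```python
import zlib

def route_to_shard(routing: str, number_of_primary_shards: int) -> int:
    """shard = hash(_routing) % number_of_primary_shards.

    zlib.crc32 is a stand-in for Elasticsearch's Murmur3-based hash, so the
    results differ from a real cluster; only the formula is the same.
    """
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards

def total_shard_copies(number_of_primary_shards: int, number_of_replicas: int) -> int:
    # Each primary shard plus number_of_replicas copies of it.
    return number_of_primary_shards * (1 + number_of_replicas)

primaries, replicas = 5, 1  # the pre-7.0 defaults: 5 primaries, 1 replica each
for doc_id in ["1", "2", "3", "42"]:
    print(f"doc {doc_id} -> shard {route_to_shard(doc_id, primaries)}")
print("total shard copies:", total_shard_copies(primaries, replicas))
```

Because the primary shard count is part of the formula, it cannot simply be changed on an existing index without re-routing every document.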
A document is similar to a row in a relational database. Documents in an index can have different structures, but fields they have in common must have the same data type.
MySQL => Databases => Tables => Columns/Rows
ElasticSearch => Indices => Types => Documents with Properties
Documents support the standard CRUD operations: indexing (create), retrieving (read), updating, and deleting.
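Using the standard library again, the sketch below builds one request per CRUD operation against 7.x-style document endpoints. The index name my-index, the document body, the local URL, and the build_request helper are hypothetical, and nothing is sent to a cluster.

```python
import json
import urllib.request
from typing import Any, Optional

ES_URL = "http://localhost:9200"  # assumption: a local node without security enabled
INDEX = "my-index"                # hypothetical index name

def build_request(method: str, path: str, body: Optional[Any] = None) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a JSON request to Elasticsearch."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(ES_URL + path, data=data, headers=headers, method=method)

crud = {
    # Create (or replace) document 1.
    "create": build_request("PUT", f"/{INDEX}/_doc/1", {"title": "FacingIssuesOnIT is for ELK"}),
    # Read document 1.
    "read": build_request("GET", f"/{INDEX}/_doc/1"),
    # Partially update document 1.
    "update": build_request("POST", f"/{INDEX}/_update/1", {"doc": {"title": "ELK"}}),
    # Delete document 1.
    "delete": build_request("DELETE", f"/{INDEX}/_doc/1"),
}
for name, req in crud.items():
    print(f"{name:6} {req.get_method():6} {req.full_url}")
```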
A type is a logical partition of an index whose semantics are entirely up to the user. Mapping types are deprecated as of Elasticsearch 7.0.
It is useful in applications that need to analyze data, compute statistics, and detect anomalies based on patterns.
It is useful where alerts must be sent when a particular condition is matched, for example in stock market monitoring or when exceptions appear in logs.
It is useful for log analysis and troubleshooting applications, because it can run full-text searches over billions of records in milliseconds.
It integrates with tools such as Filebeat, Logstash, and Kibana to store high volumes of data for analysis and to visualize it as charts and dashboards.
In Elasticsearch, data is transformed internally during indexing by the analyzer defined for the index, and the result is then indexed. Analyzers are built from tokenizers and filters. The major types of analyzers available in Elasticsearch 1.10 are as follows:
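Elasticsearch uses a JSON-based Query DSL (Domain Specific Language) built on top of Apache Lucene queries. Lucene's query string syntax is also supported, for example through the query_string query.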
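For illustration, the sketch below builds two search bodies: one using the JSON Query DSL (a bool query with a full-text match and a range filter) and one passing Lucene query string syntax through query_string. The field names message and @timestamp are assumptions about the indexed data.

```python
import json

# Query DSL: full-text "match" (scored) inside "must",
# exact "range" condition (not scored) inside "filter".
dsl_query = {
    "query": {
        "bool": {
            "must": [{"match": {"message": "exception"}}],
            "filter": [{"range": {"@timestamp": {"gte": "now-1d"}}}],
        }
    }
}

# Lucene query string syntax, passed through the query_string query.
lucene_query = {"query": {"query_string": {"query": "message:exception"}}}

# Either body would be sent as the JSON payload of GET or POST /<index>/_search.
for body in (dsl_query, lucene_query):
    print(json.dumps(body))
```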