Search traffic on web servers grows every day, and so does the volume of data those servers hold. On a relational database server holding petabytes of data, finding a single record can take a long time. Elasticsearch addresses this problem because it is built for very fast search.
In this article, I will explain what this technology is, along with its advantages and disadvantages. After that, I will explain how to set up Elasticsearch.
What is Elasticsearch?
Elasticsearch is an open source, distributed, full-text search engine with a RESTful interface and analytics capabilities. Internally it stores data as documents in JSON format, which you then query for retrieval. The data is schema-less by default: Elasticsearch applies default settings when indexing unless you provide a mapping that fits your needs. Out of the box it analyzes text with the Lucene-based standard analyzer and guesses field types automatically. Elasticsearch is very fast, highly scalable, and used by well-known organizations such as Wikipedia and LinkedIn.
How does Elasticsearch work?
A traditional SQL database is schema-based and typically runs a search on a single node, so a query over a very large data set takes time. In Elasticsearch the data is distributed across nodes and a search runs on every node that holds part of the data. Because many nodes search at the same time, the system scales horizontally and stays fast.
When we run a search in Elasticsearch, it uses the memory (RAM) of each node, and the search is faster because many nodes work together. Let me explain some basic building blocks and terms that will help you understand the distributed nature of ES.
1. Index: An index can be understood as a container of data. I will explain indexes with an example later on.
2. Shard: A shard is a piece of an index. Every index is split into one or more shards, and the shards hold the index's data. Shards are spread across the nodes of the cluster, and several shards of the same index can be stored on the same node.
3. Replicas: A replica is a copy of a shard. Keeping a replica on the same node or machine as the original makes no sense, so Elasticsearch places replicas on other nodes. If one node fails, the data is still available.
4. Role nodes: The machines (nodes) in an Elasticsearch cluster can have different roles. There are three roles: the data node, the master node and the client node. The node created by a default installation has all three roles.
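To make the shard idea above more concrete, here is a small toy sketch in Python. It is not how Elasticsearch is implemented: the hash function, the in-memory dictionaries and the function names are all assumptions chosen for illustration. It only shows the general pattern of routing each document to one shard by a stable hash, then answering a search by asking every shard and merging the partial results.

```python
import zlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Pick a shard from a stable hash of the document id (toy stand-in for real routing)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def index_documents(docs: dict, num_shards: int) -> list:
    """Distribute documents over num_shards in-memory 'shards' (plain dictionaries)."""
    shards = [dict() for _ in range(num_shards)]
    for doc_id, text in docs.items():
        shards[shard_for(doc_id, num_shards)][doc_id] = text
    return shards


def search(shards: list, term: str) -> list:
    """Ask every shard for matching documents, then merge the partial results."""
    hits = []
    for shard in shards:  # in a real cluster the shards are searched in parallel on different nodes
        hits.extend(doc_id for doc_id, text in shard.items() if term.lower() in text.lower())
    return sorted(hits)


# Country names and ids reuse the examples from later in this article.
docs = {"1": "India", "2": "Pakistan", "3": "Nepal"}
shards = index_documents(docs, num_shards=2)
print(search(shards, "nepal"))  # prints ['3']
```

Each document lives in exactly one shard, so adding shards (and nodes to host them) spreads both the data and the search work.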
The job of role nodes
Each role node has a specific job. Let’s discuss each one of them:
- Data node: Data nodes hold the data, that is, the shards of each index. Their primary job is to store the data and to carry out indexing and search operations on the shards they hold.
- Client node: This node acts as an entry point for Elasticsearch queries. It receives the queries, forwards them to the data nodes and combines the results.
- Master node: This node manages the Elasticsearch cluster and maintains the state of the cluster.
Features of Elasticsearch
- Scalable up to petabytes: Elasticsearch is scalable up to petabytes of structured and unstructured data.
- Replacement for a document store: Elasticsearch can be used as a replacement for document stores like MongoDB and RavenDB.
- Uses denormalization: Elasticsearch uses denormalization to improve search performance.
- Popular search engine: Elasticsearch is one of the best-known enterprise search engines and is currently used by many large organizations such as The Guardian, Wikipedia, StackOverflow and GitHub.
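- Open source: Elasticsearch is open source. Versions up to 7.10 were released under the Apache license version 2.0; later versions use different licensing terms, so check the license of the version you install.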
Advantages of Elasticsearch
Platform independent: Elasticsearch is written in Java, which makes it compatible with almost every platform. It can run on any machine that has a Java runtime environment.
Elasticsearch is real-time: Search in Elasticsearch is very fast, and data becomes searchable almost as soon as it is added.
Elasticsearch is distributed: It consists of multiple software components that run on multiple computers but act as a single system, which makes it easy to scale and to integrate into any large organization.
Creates backups: Full backups are easy to create using the gateway concept present in Elasticsearch. In addition, Elasticsearch keeps a replica of each shard, which protects against hardware failures and prevents loss of data.
Multi-tenancy: Elasticsearch can serve multiple customers within a single instance, which is an advantage when compared to Apache Solr.
JSON-based responses: Elasticsearch returns JSON objects as responses, so the ES server can be called from a large number of programming languages, since most languages support JSON.
Disadvantages of Elasticsearch
Supports JSON only: Elasticsearch handles request and response data only in JSON, unlike Apache Solr, which supports CSV, XML and JSON formats.
Split-brain situations: In rare cases Elasticsearch can run into a split-brain problem. It arises during cluster re-formation. When one or more nodes fail, the cluster re-forms itself with the available nodes; if those nodes cannot all reach each other, they may form two separate clusters, each with its own master.
Installation process of Elasticsearch
Please follow the steps below to install Elasticsearch.
Step 1: Press Windows+R, type cmd and hit Enter. This opens the command prompt in your Windows operating system.
Step 2: Run the command java -version to check which version of Java is installed on your computer. It should be Java 7 or later. The command shows the version currently installed on your machine, as in the image below.
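If you prefer to script the check from Step 2, the following Python sketch is one way to do it. The function names are my own, and the parsing assumes the usual version string formats (for example "1.8.0_31" for Java 8, or "11.0.2" for Java 11); treat it as a rough helper rather than a complete parser.

```python
import re
import subprocess


def parse_java_major(version_output: str) -> int:
    """Extract the major Java version from the text printed by `java -version`."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("could not find a Java version string")
    first, second = int(match.group(1)), match.group(2)
    # Legacy scheme: "1.8.0_31" means Java 8. Newer scheme: "11.0.2" means Java 11.
    if first == 1 and second is not None:
        return int(second)
    return first


def installed_java_major() -> int:
    """Run `java -version` (it writes to stderr) and parse the result."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr or result.stdout)
```

Calling installed_java_major() on a machine with Java on the PATH returns the major version as a number, which you can compare against the minimum of 7 mentioned above.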
Step 3: Download Elasticsearch from www.elastic.co. For Windows, download the ZIP file.
Step 4: Unzip the downloaded ZIP package.
Step 5: Now go to the Elasticsearch home directory. Inside the bin folder, run the elasticsearch.bat file directly from Windows Explorer, or do the same from the command prompt by changing to the bin folder and running elasticsearch.bat.
If the Java runtime path is not set in your environment variables, you will get an error. To fix it, set the JAVA_HOME environment variable to “C:\Program Files\Java\jre1.8.0_31” or to the location where you installed Java.
Alternatively, you can set the Java runtime path temporarily, for the current command prompt session only, using the command shown in the image below.
Once the Java runtime path is set from the command prompt as shown above, repeat step 5 and you will see the output shown in the image below.
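The same temporary setting can be sketched in Python by passing JAVA_HOME only to the process that starts Elasticsearch. The Java path below is the one mentioned in this article; ES_HOME is a hypothetical unzip location used purely for illustration, and the function names are my own.

```python
import os
import subprocess

# Path taken from this article; adjust it to your own Java installation.
JAVA_HOME = r"C:\Program Files\Java\jre1.8.0_31"
# Hypothetical unzip location, used only for illustration.
ES_HOME = r"C:\elasticsearch"


def env_with_java_home(java_home, base_env=None):
    """Return a copy of the environment with JAVA_HOME set; the original is left untouched."""
    env = dict(os.environ if base_env is None else base_env)
    env["JAVA_HOME"] = java_home
    return env


def start_elasticsearch(es_home=ES_HOME, java_home=JAVA_HOME):
    """Start elasticsearch.bat from the bin folder with JAVA_HOME set for that process only."""
    bat = os.path.join(es_home, "bin", "elasticsearch.bat")
    return subprocess.Popen([bat], env=env_with_java_home(java_home))
```

Because the variable is set only in the child process environment, nothing changes in your system-wide settings.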
Step 6: You can access the Elasticsearch API at http://localhost:9200. The default HTTP port is 9200; you can change it through the http.port setting in the elasticsearch.yml file, which is located in the config directory. To check that the server is up and running, browse to http://localhost:9200.
You will get a JSON object containing information about the installed Elasticsearch, as shown in the image below.
That’s all. You have successfully installed Elasticsearch on your machine.
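The same check can be done from a script instead of the browser. The sketch below uses only the Python standard library and assumes the default address http://localhost:9200. The helper names are my own, and it assumes the response carries the version under "version" and "number", which is how the root endpoint normally reports it.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # default address from the step above


def extract_version(info_json: str) -> str:
    """Pull the version number out of the JSON text returned by the root URL."""
    info = json.loads(info_json)
    return info["version"]["number"]


def fetch_cluster_info(base_url: str = BASE_URL) -> str:
    """GET the root URL of a running Elasticsearch server and return the raw JSON text."""
    with urllib.request.urlopen(base_url, timeout=5) as response:
        return response.read().decode("utf-8")
```

With the server running, extract_version(fetch_cluster_info()) returns the installed version as a string.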
CRUD operation in Elasticsearch
Elasticsearch exposes a REST API that you interact with over HTTP by sending requests to certain URLs and, in some cases, HTTP bodies composed of JSON objects that give commands to the cluster. So far we have tried GET requests only. POST requests are hard to make from a browser, so we need a tool that helps us send these requests to the cluster.
The CRUD (create, read, update, delete) operations have logical equivalents in the HTTP methods (POST, GET, PUT and DELETE), which makes it easy to interact using standard methods. So instead of using the browser, we will use a tool like Fiddler or Postman, both of which make it easy to send POST, GET, PUT and DELETE requests.
Fiddler: You can install Fiddler2 from www.telerik.com/fiddler as a front end for your Elasticsearch.
Postman: You can install Postman from https://chrome.google.com/webstore/detail/postman/fhbjgbiflinjbdggehcddcbncdddomop?hl=en as a front end for your Elasticsearch.
Note: This version of Postman is for Chrome; you can download the version that fits your requirements. In this tutorial I will use Postman to make API requests, but you can use any other tool.
Creating an index using the Elasticsearch REST API
An index is a collection of documents. It is roughly comparable to a database schema in the traditional database world. In Postman, you can send a request to the index URL to create the index and, if you want, its type/mapping as well, using the HTTP PUT method, for example a PUT request to http://localhost:9200/countries.
You will get a message in JSON format, as shown in the screenshot below.
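If you would rather script this step than use Postman, here is a minimal Python sketch using only the standard library. It assumes the server address http://localhost:9200 and the index name countries used in this article. The function names are my own.

```python
import json
import urllib.request


def build_create_index_request(index: str, base_url: str = "http://localhost:9200"):
    """Build (but do not send) the PUT request that creates an index."""
    return urllib.request.Request(f"{base_url}/{index}", method="PUT")


def send(request):
    """Send a prepared request to a running server and decode the JSON reply."""
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_create_index_request("countries")
print(request.get_method(), request.full_url)  # prints PUT http://localhost:9200/countries
```

Calling send(request) against a running server performs the actual creation. On success the reply normally includes "acknowledged": true.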
Adding data to the newly created index using the REST API
We have created an index named countries. Now we will add a country to it using an HTTP POST request whose body is a JSON object, as shown in the image below.
Note: Repeat this process to insert more country data.
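As a rough scripted equivalent, the sketch below builds the POST request for one country. The URL pattern countries/country/1 follows the one used later in this article. The field name "name" is a hypothetical example, since the exact document body is only visible in the screenshot. Newer Elasticsearch versions expect an explicit Content-Type header and no longer use custom type names in the URL, so adjust the path to your version.

```python
import json
import urllib.request


def build_index_document_request(index, doc_type, doc_id, document,
                                 base_url="http://localhost:9200"):
    """Build (but do not send) a POST request that stores one JSON document."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{index}/{doc_type}/{doc_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "name" is a hypothetical field; use whatever fields your country documents need.
request = build_index_document_request("countries", "country", 1, {"name": "India"})
```

Pass the request to urllib.request.urlopen to send it, and build one request per country to insert more data.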
Retrieving data from the newly created index using the REST API
To retrieve data from the index, we make a POST request whose body is a JSON query object. The query runs on the server and returns the result; see the image below for an example.
As you can see in highlighted box 1 above, we are posting a request to the URL http://localhost:9200/countries/_search, where countries is the newly created index and _search tells Elasticsearch to perform a search operation.
Highlighted box 2 contains the JSON object that is sent with the request. This JSON object specifies a query that gets all the data from the index countries.
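A scripted version of this search might look like the following sketch. The exact query in the screenshot is not reproduced here, so I assume a match_all query, which is the usual way to ask for every document. The function names are my own.

```python
import json
import urllib.request


def build_search_request(index, query, base_url="http://localhost:9200"):
    """Build (but do not send) a POST request to the _search endpoint of an index."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_sources(search_response: dict) -> list:
    """Return the stored documents from a decoded search response."""
    return [hit["_source"] for hit in search_response["hits"]["hits"]]


# Assumed query: match_all returns every document in the index.
request = build_search_request("countries", {"match_all": {}})
```

After sending the request and decoding the JSON reply, extract_sources gives you just the country documents without the surrounding metadata.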
The example above gets all country data in one go, but we can also retrieve the record for one specific country. To do so, we add the country id to the URL, as shown below.
As you can see in highlighted box 1 above, we are making a GET request through Postman and passing the id of the country (http://localhost:9200/countries/country/1). Here 1 is the id for India; similarly, we have id 2 for Pakistan, 3 for Nepal and so on. If you want the data for Pakistan, pass 2 at the end of the URL.
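The lookup by id can be sketched in Python as well. The URL pattern is the one shown above. The helper names are my own, and the code assumes the stored document comes back under the "_source" key of the reply, which is how the get API normally returns it.

```python
import json
import urllib.request


def document_url(index, doc_type, doc_id, base_url="http://localhost:9200"):
    """Build the URL of a single document from its index, type and id."""
    return f"{base_url}/{index}/{doc_type}/{doc_id}"


def get_document(index, doc_type, doc_id):
    """GET one document from a running server; the stored JSON is under "_source"."""
    with urllib.request.urlopen(document_url(index, doc_type, doc_id), timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))["_source"]


print(document_url("countries", "country", 2))  # prints http://localhost:9200/countries/country/2
```

With the server running, get_document("countries", "country", 2) would return the record stored for Pakistan.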
In this article, we have learned what Elasticsearch is, how it works, what its features are and why we should use it. We have also learned how to insert data and retrieve it using the REST APIs. In my upcoming articles, we will learn more about the Elasticsearch REST APIs. Stay connected for more updates!