Get started with Elasticsearch


Elasticsearch is a search engine based on the open source library Lucene. It really shines on full-text search but also supports other types of search: geo search, metric search, etc.

Elasticsearch vs Elastic vs ELK

Before getting into the details of Elasticsearch and setting up a single node cluster for testing, let’s clarify a few acronyms and names.

Elasticsearch, the search engine, is the product that gave its name to the company Elasticsearch. In 2015, the company renamed itself from Elasticsearch to Elastic to reflect the broader range of products it had started to offer. Indeed, the company started exploring problems beyond search early on. The core products of the Elastic company form the ELK stack.

  • E for Elasticsearch, the search engine
  • L for Logstash, used for cleansing and enriching data (typically logs)
  • K for Kibana, the main visualisation and dashboarding tool for Elasticsearch indexed data

Easy, right? Not quite: Elastic now talks about the Elastic Stack, because it has added products to the core stack:

  • X-Pack, which bundles many features: security, monitoring, alerting, etc.
  • Beats, lightweight data shippers for gathering and shipping data (e.g. Heartbeat for uptime monitoring)

And we can expect more products and services to be added in the future.

You can use Elasticsearch without using any other product or service in the Elastic stack but they work very well together.

A common use case for the ELK stack is the indexing of log data. The stack is usually implemented as follows:

  1. Logstash: cleanse the logs
  2. Elasticsearch: index the logs for search
  3. Kibana: visualisations and dashboarding

Elasticsearch terminology

There are a few simple concepts that you’ll need to master when talking about Elasticsearch. Some of these are not specific to Elasticsearch but we’ll do a refresher anyway.

Cluster

This is a group of one or more nodes (we’ll talk about that soon) identified by a unique name. It allows federated indexing and searching across all nodes.

Node

This is a single server - identified by a unique name - in the cluster.

Index

This is a collection of documents. A search generally happens over a single index.

Document

This is the basic unit of information, in JSON format. If the index is the database table, the document is the row.

Mapping

If we continue with the RDBMS analogy, this is your database schema. By default, it is dynamic: if you don’t define any static mapping, Elasticsearch will infer the mappings for you from the documents you index. You can have a mix of static and dynamic mappings within the same index.
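To make the mix of static and dynamic mappings concrete, here is a minimal sketch of a request body that could be sent when creating an index. The field name user and its keyword type are assumptions chosen for illustration; any field not listed under properties would still be mapped dynamically.

```python
import json

# Hypothetical mapping: "user" is declared statically as a keyword field.
# "dynamic": True is the default and is spelled out only for clarity;
# fields not listed under "properties" are still mapped automatically.
index_body = {
    "mappings": {
        "dynamic": True,
        "properties": {
            "user": {"type": "keyword"},
        },
    }
}

# Serialise the body as it would be sent with a create-index request.
print(json.dumps(index_body, indent=2))
```

The same body could be passed to curl with -d when the index is created, instead of letting Elasticsearch guess every field type.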

Shards

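A shard is a piece of an index. Splitting an index into several primary shards spreads it horizontally across nodes for scalability, while replica shards are copies of the primaries kept for redundancy. For fault tolerance, a replica shard is never placed on the same node as its primary shard.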
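As a rough illustration of how primaries and replicas add up, the sketch below builds the index settings that control both numbers and counts the resulting shard copies. The values 3 and 1 are arbitrary examples and the helper name is hypothetical; number_of_shards and number_of_replicas are the actual index setting names.

```python
def total_shard_copies(number_of_shards: int, number_of_replicas: int) -> int:
    """Each primary shard exists once, plus once per configured replica."""
    return number_of_shards * (1 + number_of_replicas)


# Example settings body for index creation (the values are illustrative).
settings_body = {
    "settings": {
        "index": {
            "number_of_shards": 3,
            "number_of_replicas": 1,
        }
    }
}

index_settings = settings_body["settings"]["index"]
print(
    total_shard_copies(
        index_settings["number_of_shards"],
        index_settings["number_of_replicas"],
    )
)
```

With one primary and one replica, which is the default in Elasticsearch 7.x, the same helper gives 2. That matches the _shards total reported when a document is indexed later in this post.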

Elasticsearch installation

OK, now that we have clarified a few things about the Elastic stack and Elasticsearch terminology, let’s try to install a single node Elasticsearch cluster with Kibana.

The easiest way to install and play with Elasticsearch is to use the official Docker image, and that’s what we are going to demonstrate today.

If you want to go a bit further, you can explore some of the key settings at the following links

Let’s use a docker-compose.yaml file (if you are not too familiar with docker-compose you can check our intro post on docker-compose with Django).

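version: "3"
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.9.3
    ports:
      - "9200:9200"  # REST API
      - "9300:9300"  # transport port (node-to-node traffic)
    volumes:
      # anonymous volume holding the index data
      - /usr/share/elasticsearch/data
    environment:
      # run as a one-node cluster
      - discovery.type=single-node
  kibana:
    image: docker.elastic.co/kibana/kibana:7.9.3
    # start Elasticsearch first; Kibana's default target,
    # http://elasticsearch:9200, matches the service name above
    depends_on:
      - elasticsearch
    ports:
      - "5601:5601"  # Kibana UI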

Once you have copied the above into a local docker-compose.yaml file, you can run

docker-compose up

It will take a bit of time to pull the Elasticsearch and Kibana images. Once that’s all done, you’ll have a single node Elasticsearch cluster running in Docker, with Kibana available at http://localhost:5601.

Elasticsearch exposes a few endpoints to make your life easier. You can start by checking the details of your cluster at http://localhost:9200/.

If you want to explore further, a good starting point is the _cat endpoint available at http://localhost:9200/_cat. It lists the many endpoints available to explore your Elasticsearch cluster.

For instance, http://localhost:9200/_cat/health tells you a bit more about your cluster health. With a single node, the cluster turns yellow as soon as an index has replica shards, because a replica cannot be allocated on the same node as its primary; until then it may still report green. We’ll discuss the cluster states in a bit more detail later on.
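If you prefer to read the health from a script, the cat endpoints can return JSON when called with ?format=json. The sketch below uses only the Python standard library; the function names are hypothetical and the sample payload is hand-written and trimmed to the single field we read, not captured from a real cluster.

```python
import json
from urllib.request import urlopen


def parse_cluster_status(body: str) -> str:
    """Return the status colour from a _cat/health?format=json response body."""
    rows = json.loads(body)  # the cat APIs return a JSON array of rows
    return rows[0]["status"]


def fetch_cluster_status(base_url: str = "http://localhost:9200") -> str:
    """Query a running cluster; requires the Docker setup above to be up."""
    with urlopen(base_url + "/_cat/health?format=json") as response:
        return parse_cluster_status(response.read().decode("utf-8"))


# Hand-written sample, trimmed to the one field used here.
sample_body = '[{"status": "yellow"}]'
print(parse_cluster_status(sample_body))
```

Calling fetch_cluster_status() while the containers are running should return green, yellow or red.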

Another useful endpoint is http://localhost:9200/_cat/indices which lists your Elasticsearch indices. Apart from any system indices created by Kibana (their names start with a dot), the list should be empty for now. Let’s create an index to see how that works.

Creating your first Elasticsearch index and your first document

You can run the following to create your first Elasticsearch index:

curl -X PUT http://localhost:9200/my-index-0

You should get a JSON response similar to the following:

{"acknowledged":true,"shards_acknowledged":true,"index":"my-index-0"}

If you inspect the _cat/indices endpoint again, your new index should be listed. You can inspect the index documents at http://localhost:9200/my-index-0/_search. You’ll notice that hits["hits"] is an empty list because we haven’t indexed any documents yet.
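For readers who would rather script these steps than use curl, here is a minimal sketch with the Python standard library. It only builds the PUT request and parses a search response; the helper names are hypothetical and the empty search payload is hand-written, trimmed to the fields used here.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"


def create_index_request(index: str) -> Request:
    """Build the equivalent of: curl -X PUT http://localhost:9200/<index>."""
    # No body is sent, so Elasticsearch applies its default settings.
    return Request(BASE_URL + "/" + index, method="PUT")


def count_hits(search_body: str) -> int:
    """Count the documents listed under hits["hits"] in a _search response."""
    return len(json.loads(search_body)["hits"]["hits"])


request = create_index_request("my-index-0")
print(request.get_method(), request.full_url)

# Hand-written sample of an empty search result.
empty_search = '{"hits": {"hits": []}}'
print(count_hits(empty_search))
```

Passing the request object to urllib.request.urlopen would actually send it to the running cluster.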

Let’s index a document now:

curl -X POST http://localhost:9200/my-index-0/_doc/1 -d '{"user": "test"}' -H 'Content-Type: application/json'

The response should look like the following:

{"_index":"my-index-0","_type":"_doc","_id":"1","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":0,"_primary_term":1}

This confirms that you’ve successfully indexed your first document in Elasticsearch. http://localhost:9200/my-index-0/_search now shows 1 result, with the document’s fields nested under the _source object.

The document can also be inspected in more detail at http://localhost:9200/my-index-0/_doc/1.
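The indexing call can be sketched in Python as well, again with the standard library only. The helper names are hypothetical; the response string is the one shown earlier in this post, and the code only checks its result field.

```python
import json
from urllib.request import Request


def index_document_request(index, doc_id, document,
                           base_url="http://localhost:9200"):
    """Build the equivalent of the curl POST to /<index>/_doc/<id>."""
    return Request(
        f"{base_url}/{index}/_doc/{doc_id}",
        data=json.dumps(document).encode("utf-8"),  # JSON body as bytes
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def was_created(index_response: str) -> bool:
    """True when the response reports a newly created document."""
    return json.loads(index_response)["result"] == "created"


request = index_document_request("my-index-0", "1", {"user": "test"})
print(request.get_method(), request.full_url)

# Response copied from the curl example in this post.
response_body = (
    '{"_index":"my-index-0","_type":"_doc","_id":"1","_version":1,'
    '"result":"created","_shards":{"total":2,"successful":1,"failed":0},'
    '"_seq_no":0,"_primary_term":1}'
)
print(was_created(response_body))
```

Sending the same request a second time would overwrite the document, and the result field would then read updated instead of created.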

Conclusion

Phew. We’ve now set up Elasticsearch in Docker, created a first index and POSTed a first document. We’ll stop here for now and I’ll see you again in a new post about the Elasticsearch Python client.
