What is Elasticsearch?

It is a highly scalable search engine

It is a database companion (not a database)

It can be hosted with GCloud/AWS

It is a flexible solution for a difficult problem:
how do we quickly and efficiently search huge amounts of data?

Why Elasticsearch?

It handles all of the indexing and searching; we just give it an index

We can protect it and scale it up using GCloud/AWS

All of the heavy-lifting work is done by someone else - no load on our servers

It supports autocomplete and term highlighting out of the box

Your index and queries can be as simple or as complicated as you want them to be

How does it work?

Elasticsearch is built on the Apache Lucene search library, and queries are written as JSON

It exposes HTTP endpoints for you to GET, POST, PUT and DELETE
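As a rough illustration of the endpoints above, the sketch below maps a few common document operations onto the HTTP method and path Elasticsearch expects. The function names and the index name `users` are made up for this example, and no request is actually sent.

```python
def document_request(operation, index, doc_id):
    """Return the (HTTP method, path) pair for a single-document operation."""
    methods = {
        'create_or_replace': 'PUT',   # index a document under a known id
        'read': 'GET',                # fetch a document by id
        'delete': 'DELETE',           # remove a document by id
    }
    if operation not in methods:
        raise ValueError('unknown operation: %s' % operation)
    return methods[operation], '/%s/_doc/%s' % (index, doc_id)


def search_request(index):
    """Searches are usually sent as POST with a JSON query in the body."""
    return 'POST', '/%s/_search' % index


for op in ('create_or_replace', 'read', 'delete'):
    print(op, document_request(op, 'users', 1))
print('search', search_request('users'))
```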

As soon as you've created a new instance of Elasticsearch you can start publishing

Alternatively you can define your indexes at the start

Which means you can customise how it analyses and indexes your data
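To make "defining your indexes at the start" a little more concrete, here is a sketch of a request body you might send with `PUT /users` before publishing any documents. The analyzer and filter names (`name_autocomplete`, `name_edge_ngram`) and the gram sizes are assumptions chosen for illustration, and the typeless `mappings` layout assumes Elasticsearch 7 or later.

```python
import json


def build_index_definition():
    """Build an index definition with a custom analyzer for name fields."""
    return {
        'settings': {
            'analysis': {
                'filter': {
                    # Emits prefixes of each token, e.g. "ke", "kem", "kema"
                    'name_edge_ngram': {
                        'type': 'edge_ngram',
                        'min_gram': 2,
                        'max_gram': 10,
                    },
                },
                'analyzer': {
                    'name_autocomplete': {
                        'type': 'custom',
                        'tokenizer': 'standard',
                        'filter': ['lowercase', 'name_edge_ngram'],
                    },
                },
            },
        },
        'mappings': {
            'properties': {
                # Index with prefixes, but search with the plain analyzer
                'FirstName': {'type': 'text', 'analyzer': 'name_autocomplete',
                              'search_analyzer': 'standard'},
                'LastName': {'type': 'text', 'analyzer': 'name_autocomplete',
                             'search_analyzer': 'standard'},
                'timestamp': {'type': 'date'},
            },
        },
    }


print(json.dumps(build_index_definition(), indent=2))
```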

How does it work?

An Outdated Analogy


This analogy treats an “index” as similar to a “database” in an SQL database,
and a “type” as equivalent to a “table”.

MySQL/RDBMS    Elasticsearch
Database       Index
Table          Type (deprecated)
Row            Document

This was a bad analogy that led to incorrect assumptions.
In an SQL database, tables are independent of each other.
The columns in one table have no bearing on columns with the same name in another table.
This is not the case for fields in a mapping type.
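The difference can be illustrated with a small simulation. This is not Elasticsearch code, only a toy model of the rule described above: fields with the same name in different mapping types of one index share a single definition, so conflicting definitions are rejected. The type and field names are invented for the example.

```python
def merge_type_mappings(types):
    """Merge per-type field mappings into one index-wide field map.

    Raises ValueError if two types declare the same field differently.
    """
    index_fields = {}
    for type_name, fields in types.items():
        for field, field_type in fields.items():
            existing = index_fields.get(field)
            if existing is not None and existing != field_type:
                raise ValueError(
                    "field '%s' is already mapped as %s, cannot map it as %s in type '%s'"
                    % (field, existing, field_type, type_name))
            index_fields[field] = field_type
    return index_fields


# Same field name, same definition: fine
compatible = {'user': {'name': 'text'}, 'tweet': {'name': 'text', 'posted': 'date'}}
print(merge_type_mappings(compatible))

# Same field name, different definitions: rejected, unlike two SQL tables
conflicting = {'user': {'deleted': 'date'}, 'tweet': {'deleted': 'boolean'}}
try:
    merge_type_mappings(conflicting)
except ValueError as err:
    print('rejected:', err)
```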


How does it work?



Shards and Replicas As the Foundation of Elasticsearch

Each Elasticsearch shard is an Apache Lucene index,
with each individual Lucene index containing a subset of the documents in the Elasticsearch index

Having the right number of shards is important for performance.
It is thus wise to plan in advance.
A query that runs across several shards in parallel can execute faster than the same query against an index composed of a single shard,
but only if each shard is located on a different node and there are sufficient nodes in the cluster.

Show me some code

When creating an index, you can set the number of shards and replicas as properties of the index

When an index is created, the number of shards is set,
and this cannot be changed later without reindexing the data.

PUT /some_index
{
    "settings": {
        "index": {
            "number_of_shards": 6,
            "number_of_replicas": 2
        }
    }
}
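As a quick back-of-the-envelope check on these settings, the sketch below works out how many shard copies the cluster has to hold. The function names are made up for this example; the node count relies on the fact that a replica is never allocated to the same node as another copy of the same shard.

```python
def shard_copies(number_of_shards, number_of_replicas):
    """Total shard copies: every primary plus each of its replicas."""
    return number_of_shards * (1 + number_of_replicas)


def min_nodes_for_full_allocation(number_of_replicas):
    """Nodes needed so that every copy of a shard sits on its own node."""
    return 1 + number_of_replicas


print(shard_copies(6, 2))                # 6 primaries + 12 replicas = 18
print(min_nodes_for_full_allocation(2))  # 3
```

With fewer nodes than that, the index still works, but some replicas stay unassigned.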




Show me some code


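Python (indexing and searching)

        from datetime import datetime
        from elasticsearch import Elasticsearch

        # Connect to a local node; change the URL to match your cluster
        es = Elasticsearch('http://localhost:9200')

        my_docs = [
            {
                'Id': 1,
                'FirstName': 'Kemaru',
                'LastName': 'Young',
                'timestamp': datetime.now()
            },
            {
                'Id': 2,
                'FirstName': 'Jane',
                'LastName': 'Doe',
                'timestamp': datetime.now()
            }
        ]

        # Index each document under its own id
        # (newer client versions prefer document=doc over body=doc)
        for doc in my_docs:
            res = es.index(index='test-index', id=doc['Id'], body=doc)

        # Fetch a single document back by id
        res = es.get(index='test-index', id=1)
        print(res['_source'])

        # Make the new documents visible to search straight away
        es.indices.refresh(index='test-index')

        res = es.search(index='test-index', body={'query': {'match_all': {}}})
        print('Got %d Hits:' % res['hits']['total']['value'])
        for hit in res['hits']['hits']:
            print('%(timestamp)s %(FirstName)s %(LastName)s' % hit['_source'])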

Show me some code


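Node.js (updating)

        var aws4 = require('aws4');
        var request = require('request');

        var myDocs = [
            { Id: 1, FirstName: 'Kemaru', LastName: 'Young', timestamp: Date.now() },
            { Id: 2, FirstName: 'Jane', LastName: 'Doe', timestamp: Date.now() }
        ];

        // formatForBulkUpdate is your own helper (not shown here): it must return
        // the newline-delimited JSON string that the _bulk endpoint expects
        var bulkBody = formatForBulkUpdate(myDocs);

        var options = {
            host: 'test_host.com',
            path: '/users/_bulk',
            url: 'https://test_host.com/users/_bulk',
            method: 'POST',
            proxy: 'https://test_host.com',
            headers: { 'Content-Type': 'application/x-ndjson' },
            body: bulkBody
        };

        //if you are using GCloud, check this document: https://console.developers.google.com/project/_/apiui/credential

        //if you are using AWS, check this document: https://www.npmjs.com/package/aws4
        // aws4 signs the host, path, headers and body and adds the auth headers to options;
        // it also needs service ('es') and region unless it can infer them from an AWS hostname
        aws4.sign(options, {
            accessKeyId: 'some_access_key_id',
            secretAccessKey: 'some_secret_access_key'
        });

        request(options, function(error, response, body) {
            if (error) { return console.error(error); }
            console.log('It works!');
        });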
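The `formatForBulkUpdate` helper is not shown in these slides. As a minimal sketch of what such a helper has to produce, here is a Python version under the assumption that every document should be indexed under its `Id` field and that the index name comes from the URL (`/users/_bulk`). The `_bulk` endpoint expects newline-delimited JSON: one action line, then one document line, with a trailing newline at the end.

```python
import json


def format_for_bulk_update(docs, id_field='Id'):
    """Turn a list of documents into a newline-delimited _bulk request body."""
    lines = []
    for doc in docs:
        # Action line: index this document under the given id
        lines.append(json.dumps({'index': {'_id': str(doc[id_field])}}))
        # Source line: the document itself
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline
    return '\n'.join(lines) + '\n'


my_docs = [
    {'Id': 1, 'FirstName': 'Kemaru', 'LastName': 'Young'},
    {'Id': 2, 'FirstName': 'Jane', 'LastName': 'Doe'},
]
print(format_for_bulk_update(my_docs), end='')
```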

Show me some code

Node.js (searching)

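        var mySearch = {
            query: {
                multi_match: {
                    fields: ['FirstName', 'LastName'],
                    query: 'Ke',
                    type: 'phrase_prefix'
                }
            }
        };

        var options = {
            host: 'test_host.com',
            path: '/users/_search',
            url: 'https://test_host.com/users/_search',
            method: 'POST',
            proxy: 'https://test_host.com',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify(mySearch)
        };

        aws4.sign(options, {
            accessKeyId: 'some_access_key_id',
            secretAccessKey: 'some_secret_access_key'
        });

        request(options, function(error, response, body) {
            if (error) { return console.error(error); }
            // Matching documents are under hits.hits, each with its fields in _source
            JSON.parse(body).hits.hits.forEach(function(hit) {
                console.log(hit._source.FirstName + ' ' + hit._source.LastName);
            });
        });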
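For comparison, here is the same search sketched in Python, without sending anything over the network. It builds the `multi_match` query body and pulls names out of a response-shaped dictionary. The function names and the sample response are made up for this example; the only assumption about Elasticsearch is that matching documents sit under `hits.hits`, each with its fields in `_source`.

```python
def build_prefix_search(text, fields):
    """Build a phrase_prefix multi_match query, as used for autocomplete."""
    return {
        'query': {
            'multi_match': {
                'query': text,
                'fields': list(fields),
                'type': 'phrase_prefix',
            }
        }
    }


def full_names(response):
    """Extract 'FirstName LastName' from each hit in a search response."""
    return ['%s %s' % (hit['_source']['FirstName'], hit['_source']['LastName'])
            for hit in response['hits']['hits']]


# Hand-written sample in the shape of a search response
sample_response = {
    'hits': {
        'total': {'value': 1, 'relation': 'eq'},
        'hits': [
            {'_index': 'users', '_id': '1', '_score': 1.0,
             '_source': {'Id': 1, 'FirstName': 'Kemaru', 'LastName': 'Young'}},
        ],
    }
}

print(build_prefix_search('Ke', ['FirstName', 'LastName']))
print(full_names(sample_response))
```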

Show me some code

C# (updating)

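          // myDocs, region and serviceName are assumed to be defined elsewhere;
          // Credentials and SignV4Util come from your AWS Signature V4 helper code
          var requestBody = formatForBulkUpdate(myDocs);
          var request = (HttpWebRequest) WebRequest.Create("https://test_host.com/users/_bulk");
          request.Method = "POST";
          request.ContentType = "application/x-ndjson";

          var requestBodyBytes = Encoding.UTF8.GetBytes(requestBody.ToString());

          var creds = new Credentials
          {
              AccessKey = "some_access_key",
              SecretKey = "some_secret_key"
          };

          var signer = new SignV4Util();
          signer.SignRequest(request, requestBodyBytes, creds, region, serviceName);

          using (var stream = request.GetRequestStream())
          {
              stream.Write(requestBodyBytes, 0, requestBodyBytes.Length);
          }

          // Read the response so the request completes and errors surface
          using (var response = (HttpWebResponse) request.GetResponse())
          {
              Console.WriteLine(response.StatusCode);
          }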

Where are we using Elasticsearch?

Web application enriched with data

When the user types a query in the search input it uses AJAX to query the endpoint (after a short delay)

Services that update this data on a regular schedule

Cron & Logstash



Check your Java version; you need one of these:

  • Java 8, or
  • Java 11, or
  • Java 14

Make sure the JAVA_HOME environment variable is set





      input { stdin { } }
      output {
        elasticsearch { hosts => ['localhost:9200'] }
        stdout { codec => rubydebug }
      }

Run Logstash and specify the configuration file with the -f flag

bin/logstash -f logstash-simple.conf
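Because this configuration reads from stdin, one simple way to try it out is to pipe a program's output into Logstash. The sketch below uses only the Python standard library to write one JSON object per log line to stdout. The class and function names are invented for this example, and it is not a substitute for a dedicated Logstash handler.

```python
import json
import logging
import sys


class JsonLineFormatter(logging.Formatter):
    """Format each log record as a single line of JSON."""

    def format(self, record):
        return json.dumps({
            'level': record.levelname,
            'logger': record.name,
            'message': record.getMessage(),
        })


def build_logger(name='demo', stream=None):
    """Create a logger that writes JSON lines to the given stream (stdout by default)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream or sys.stdout)
    handler.setFormatter(JsonLineFormatter())
    logger.handlers = [handler]   # replace any handlers from earlier calls
    logger.propagate = False
    return logger


if __name__ == '__main__':
    build_logger().info('user index updated')
```

Saved as, say, app.py (a hypothetical file name), it could be run as `python app.py | bin/logstash -f logstash-simple.conf`, and each line should then arrive as the text of one Logstash event.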

Python logging with Logstash

python-logstash-async, or



Only allowing specific GCloud/AWS users to update the index, or

Go through an API gateway or Lambda functions hosted by GCloud/AWS

Want to learn more?

Run Elastic on GCloud for free (trial): https://www.elastic.co/cloud/

(Nearly) everything about Elasticsearch has been documented here: https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html

Elasticsearch on GitHub: https://github.com/elastic/elasticsearch

If you have any questions please let me know

Thanks for watching :D


