My ElasticSearch Docker setup for Magento 2

With Magento 2.4, having ElasticSearch up and running became a requirement. In production, but also in development. For this, using Docker is great. However, I'm always looking to crank up the speed of Docker images locally, to get the best developer experience. Here's what I did with my Docker setup for Magento 2.

Basics first: Hello Docker, hello ElasticSearch

Let's deal with the basics first: you could run ElasticSearch natively as well. However, I play around with so many platforms with different requirements that I always find myself running different versions of ElasticSearch (5, 6, 7), and Docker simply makes that easier.

For instance, to fire up an ElasticSearch 6 instance, you could use the following:

docker run --rm -it -d -p 9200:9200 elasticsearch:6.8.0

I always add the flags -it and -d: -it allows for interaction, while -d runs the container in the background by default. The port forwarding allows for connecting to http://localhost:9200 (the main port for ElasticSearch; don't bother about port 9300, the transport port, unless you want to run an ElasticSearch cluster locally on your laptop).

My personal preference is to also add the flag --rm, which is both convenient and annoying. The convenient part is that I start every morning without any leftover Docker containers. And when I need to run ElasticSearch via Docker, I get a clean instance, so I don't find myself running out of memory or disk space after a month or so. The downside is that I need to run bin/magento indexer:reindex to fill the ES database with Magento 2 data. That's why we have cron.

I consider these the basics.

Scaling for my personal resources

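Additional flags can be added to the docker run command to give the container more resources. I've got a monster of a computer, so I can easily afford to waste some. This is the setup I'm using:

--cpus=4 --memory=2G --memory-swap=2G --memory-swappiness=0

Setting --memory-swap to the same value as --memory gives the container no swap on top of its 2G of memory. Docker treats a value of 0 for --memory-swap as unset, which would still allow the container to swap.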

Note that if you lower the memory, you might need to re-enable swap at some point. Now, adding memory to ES is great for performance. But giving the Docker container more memory does not necessarily mean that the ES process gets to use it. For this, we need to add some environment variables that are picked up by the Java process:

ES_JAVA_OPTS="-Xms512m -Xmx512m -server" ES_HEAP_SIZE=512m

With ES_JAVA_OPTS, the minimum (-Xms) and maximum (-Xmx) heap size of the Java Virtual Machine can be configured. The heap shouldn't be more than 50% of the available memory, and I found 512M to be good enough. ES_HEAP_SIZE is the older way of setting the same heap size; newer ElasticSearch versions rely on ES_JAVA_OPTS instead. The -server flag seems deprecated, but in some environments it still leads to smarter Java memory allocation, so why not include it.
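If you change --memory, the heap needs to follow along to stay under that 50% mark. Here is a minimal sketch of a hypothetical helper, es_heap_opts (not part of ElasticSearch or Docker), that takes the container memory limit in MB plus a wanted heap size, caps the heap at half of the limit and keeps -Xms and -Xmx equal:

```shell
#!/usr/bin/env bash
# es_heap_opts (hypothetical helper): print -Xms/-Xmx options for ES_JAVA_OPTS.
#   $1 = container memory limit in MB (the value passed to --memory)
#   $2 = wanted heap size in MB (defaults to 512)
# The heap is capped at 50% of the container limit, with min and max equal.
es_heap_opts() {
  local container_mb="$1"
  local wanted_mb="${2:-512}"
  local max_mb=$(( container_mb / 2 ))
  local heap_mb=$(( wanted_mb < max_mb ? wanted_mb : max_mb ))
  printf -- '-Xms%dm -Xmx%dm\n' "$heap_mb" "$heap_mb"
}

# Usage with --memory=2G (2048 MB):
#   -e "ES_JAVA_OPTS=$(es_heap_opts 2048) -server"
```

With a 2G container the wanted 512M fits comfortably; with a 768M container the same call would cap the heap at 384M.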

tmpfs

Another great tip is to use tmpfs. I've written before about doing this for MySQL. With ES, we first need to reconfigure the data and log paths, so that we know which directories to put on tmpfs:

-e "path.data=/opt/elasticsearch/volatile/data" -e "path.logs=/opt/elasticsearch/volatile/logs"

Next, these new paths (together with /tmp) can be mounted using the in-memory tmpfs filesystem:

--tmpfs /opt/elasticsearch/volatile/data:rw \
--tmpfs /opt/elasticsearch/volatile/logs:rw \
--tmpfs /tmp:rw

Because of this, the --rm option is arguably not needed, since the data and logs are wiped out anyway with every stop and start. But it works, so I keep it.

Disable clustering

I already mentioned that port 9200 is mapped locally, so that we can connect to http://localhost:9200 from Magento or other applications. Port 9300, however, is only needed for clustering. Clustering is one of the more important features of ElasticSearch, but locally it isn't needed. By setting the discovery type to single-node, we tell ES to stop looking for other nodes to form a cluster with, and gain a little bit more performance:

-e "discovery.type=single-node"

However, once you start inspecting the health of the ES node, it will always be yellow instead of green. The reason is that every index gets replica shards by default, and ES never allocates a replica on the same node as its primary shard. With only one node (the Docker container), the replicas remain unassigned, so there is no fail-over copy. That's expected, since we don't want to cluster anyway.
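To see this for yourself, the _cluster/health and _cat/shards endpoints of ElasticSearch show the status and the unassigned replicas. The snippet below is a small sketch; es_health_status is a hypothetical helper that pulls the status field out of the health response with sed. It assumes the response contains a single "status" key and is not a general JSON parser:

```shell
#!/usr/bin/env bash
# es_health_status (hypothetical helper): read the JSON returned by
# GET /_cluster/health on stdin and print the value of its "status" field.
# Works for both the compact and the ?pretty output format.
es_health_status() {
  sed -n 's/.*"status" *: *"\([a-z]*\)".*/\1/p'
}

# Usage against the running container:
#   curl -s http://localhost:9200/_cluster/health | es_health_status
#   curl -s 'http://localhost:9200/_cat/shards?v'   # replicas are listed as UNASSIGNED
```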

To keep the cluster healthy even though no backup nodes are available, we can set the default number of primary shards to 1 and the number of replicas to 0 for all new indices. We can use an index template via the ES JSON API for this:

curl -XPUT 'http://localhost:9200/_template/default' -H 'Content-Type: application/json' \
    -d '{
  "index_patterns": ["*"],
  "order": -1,
  "settings": {
    "number_of_shards": "1",
    "number_of_replicas": "0"
  }
}'

Right after setting these defaults, any newly created index is green. The only catch is that you can't fire the curl command right after starting the Docker container, because the ES process won't be ready yet. And you do want to make sure the command has run before indexing from Magento. So I usually run it with a 20s delay. In theory, a loop that polls the ES status would be more reliable than a fixed delay.
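Such a polling loop could look like the following minimal sketch. It assumes curl is available on the host; wait_for_es is a hypothetical helper name, and the default URL and number of attempts are assumptions to adapt. It uses the wait_for_status and timeout parameters of the _cluster/health API, so each attempt returns quickly instead of blocking:

```shell
#!/usr/bin/env bash
# wait_for_es (hypothetical helper): poll ElasticSearch until the cluster
# reports at least "yellow", giving up after a number of attempts.
#   $1 = base URL (defaults to http://localhost:9200)
#   $2 = maximum number of attempts (defaults to 30)
wait_for_es() {
  local url="${1:-http://localhost:9200}"
  local attempts="${2:-30}"
  local i
  for (( i = 1; i <= attempts; i++ )); do
    # curl -f fails on HTTP errors (e.g. when the health call times out),
    # and curl also fails while the port is still refusing connections.
    if curl -fs "${url}/_cluster/health?wait_for_status=yellow&timeout=1s" >/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Usage, instead of a fixed 20s delay:
#   if wait_for_es http://localhost:9200 60; then
#     echo "ES is up: apply the index template, then reindex Magento"
#   fi
```

The return code makes it easy to chain the template call and the Magento reindex with &&, and the attempt limit keeps a broken container from hanging the script forever.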

Summary

All in all, the tuning comes down to adding a bunch of flags to the docker run command. In my case, it looks roughly like this:

docker run \
    --rm -it -d \
    -p 9206:9200 \
    -e "discovery.type=single-node" \
    -e "path.data=/opt/elasticsearch/volatile/data" \
    -e "path.logs=/opt/elasticsearch/volatile/logs" \
    -e "ES_JAVA_OPTS=-Xms512m -Xmx512m -server" \
    -e "ES_HEAP_SIZE=512m" \
    -e "MAX_LOCKED_MEMORY=100000" \
    --cpus=4 \
    --memory=2G \
    --memory-swappiness=0 \
    --memory-swap=2G \
    --tmpfs /opt/elasticsearch/volatile/data:rw \
    --tmpfs /opt/elasticsearch/volatile/logs:rw \
    --tmpfs /tmp:rw \
    elasticsearch:6.8.0
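If you start this container often, wrapping the command in a script saves retyping it. Below is a minimal sketch of such a wrapper, an illustration rather than part of the setup above: start_es and the ES_IMAGE, ES_PORT and DRY_RUN variables are hypothetical. The flags are collected in a bash array, which keeps ES_JAVA_OPTS (with its spaces) a single argument without extra quoting. I've left out -it, because a script may run without a terminal attached, and --memory-swap is set equal to --memory so the container gets no extra swap.

```shell
#!/usr/bin/env bash
# start_es (hypothetical helper): run the tuned ElasticSearch container.
# ES_IMAGE and ES_PORT override the image and host port; with DRY_RUN=1 the
# command is printed (one argument per line) instead of executed.
start_es() {
  local image="${ES_IMAGE:-elasticsearch:6.8.0}"
  local port="${ES_PORT:-9206}"
  local volatile=/opt/elasticsearch/volatile

  # Every array element is passed to docker as exactly one argument.
  # --memory-swap equals --memory, so no swap is added on top of the 2G.
  local args=(
    run --rm -d
    -p "${port}:9200"
    -e "discovery.type=single-node"
    -e "path.data=${volatile}/data"
    -e "path.logs=${volatile}/logs"
    -e "ES_JAVA_OPTS=-Xms512m -Xmx512m -server"
    -e "ES_HEAP_SIZE=512m"
    -e "MAX_LOCKED_MEMORY=100000"
    --cpus=4
    --memory=2G
    --memory-swappiness=0
    --memory-swap=2G
    --tmpfs "${volatile}/data:rw"
    --tmpfs "${volatile}/logs:rw"
    --tmpfs /tmp:rw
    "$image"
  )

  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' docker "${args[@]}"
  else
    docker "${args[@]}"
  fi
}

# Usage:
#   start_es                      # start ES 6.8 on port 9206
#   DRY_RUN=1 ES_PORT=9207 start_es   # only print the command
```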

In my personal setup, I'm also adding a name (--name) and manual networking with fixed IPs, because I run some services locally (Apache) while others run remotely (PHP-FPM, Varnish, ES). I've left this out to keep the focus of this article: tuning the heck out of ElasticSearch in Docker.

Hope you find this useful. And do share your thoughts below! :)

Yireo - Training
