Thoughts on Elasticsearch cluster configuration #365
The way I'm thinking now, there are three big questions to answer, and we more or less have to answer them in sequence.
I think for now we leave the question of coordinating nodes for future optimization: the linked document says "The benefit of coordinating only nodes should not be overstated — data nodes can happily serve the same purpose."
2. Shard size
I'm wondering more about shard count than size. Having fewer shards
would mean turning over ILM indices faster, which would mean fewer
shards would need to be searched for searches that don't go back far
in time.
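For concreteness, this is roughly where the shard-size/turnover tradeoff gets expressed if we stay with ILM rollover; the policy name and thresholds below are illustrative guesses, not proposals:
```
# sketch of an ILM policy whose hot phase rolls over on primary shard size or age
# (policy name and the 50gb/90d values are placeholders)
curl -s -X PUT "http://SERVER:9200/_ilm/policy/stories-test-policy" \
  -H 'Content-Type: application/json' -d '
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb",
            "max_age": "90d"
          }
        }
      }
    }
  }
}'
```
With fewer primary shards per index, the size condition is reached sooner, so indices roll over faster and searches over recent time ranges touch fewer shards.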
There may be an advantage to copying the existing ILM indices to the
new server while keeping their shard counts: I'm concerned that if
it's possible to reindex an index with many shards into many ILM
indices with fewer shards, we might end up with indexed dates being
distributed into new indices in troubling ways...
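One way to reduce that risk might be to copy index-for-index: create each destination index by hand with the shard count we want, then reindex the matching source ILM index into it, so the date boundaries of the existing indices are preserved. A rough sketch, with hypothetical index names and shard counts:
```
# create the destination index on the new cluster with an explicit (smaller) shard count
curl -s -X PUT "http://NEWSERVER:9200/stories-2023.06-copy" \
  -H 'Content-Type: application/json' \
  -d '{ "settings": { "number_of_shards": 2, "number_of_replicas": 0 } }'

# copy the matching source index from the old cluster, one ILM index at a time
# (the old host must be listed in reindex.remote.whitelist on the new cluster)
curl -s -X POST "http://NEWSERVER:9200/_reindex" \
  -H 'Content-Type: application/json' -d '
{
  "source": { "remote": { "host": "http://OLDSERVER:9200" }, "index": "stories-2023.06" },
  "dest": { "index": "stories-2023.06-copy" }
}'
```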
3. Dedicated Master Nodes: The advice online is pretty clearly in favor of dedicated master nodes for our scale.
My memory (which could be wrong) was the consultants had suggested
having a set of master eligible nodes, and after that to consider
coordinating nodes. I don't remember them suggesting dedicated
masters.
Regarding scripts: I had imagined that we would use ansible to
configure the servers, and that we would generate configuration files
using jinja2 templates.
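Not a commitment to specifics, but day-to-day usage would probably look something like this; the inventory and playbook names are placeholders:
```
# dry run to see what would change, then apply for real
ansible-playbook -i inventory/es-cluster.ini es-playbook.yml --check --diff
ansible-playbook -i inventory/es-cluster.ini es-playbook.yml
```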
Looking ahead to thinking about how to do the switchover:
It's my impression that you can't reindex from an index that's
currently being written to.
Important questions:
How long will it take to move each ILM index?
How much impact will that have on the running cluster?
Right now I'm thinking the best case is that it takes a day or two, in
which case we can shut down the indexer pipeline on a Friday, start
the copy on the last/current index, and then resume operations on
Monday on the new cluster.
If it takes longer, we might need to look at forcing the start of a
new ILM index, copying the (now) previous index, and then shutting
down the pipeline and copying the new, smaller index...
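For reference, forcing the start of a new ILM index is a single rollover call against the write alias; the alias name below is a guess:
```
# force an immediate rollover of the current write index, regardless of the policy's conditions
curl -s -X POST "http://SERVER:9200/stories/_rollover" \
  -H 'Content-Type: application/json' -d '{}'
```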
It's always possible to change the node type distribution down the line if we find our configuration needs adjusting. Three master-eligible nodes and five pure data nodes seems like a fine way to approach this to start. What are the considerations blocking us from setting up a shard-size experiment tomorrow, if we want to start iterating ASAP? Also wondering how we would benchmark the reindex time: can we use the reindex API when bringing up test indexes, and keep a timer running so we have a sense of the time cost per story? Is that too simple an approach?
> Also wondering how we would benchmark the reindex time: can we use the reindex API when bringing up test indexes, and keep a timer running so we have a sense of the time cost per story? Is that too simple an approach?
My experience in the past is that you can get a reasonable idea of how
things are going by doing:
```
curl -s "http://SERVER:9200/_cat/indices?v=true&h=index,pri.store.size,docs.count"
sleep 60
curl -s "http://SERVER:9200/_cat/indices?v=true&h=index,pri.store.size,docs.count"
```
which shows (for each index): index name, primary storage in use (in
crude terms, not counting replica shards), and a document count.
Then I'd just subtract the document counts, or wait longer for a
better average...
A lot of the time I'd just get numbers off the graphs..
(To get visuals, we could hand-run the indexer elastic-stats.py script
on just one node, with some new realm name)
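If we want timing straight from the copy itself, another option is to kick off the reindex asynchronously and poll it through the tasks API; the index names and task id below are made up:
```
# start the copy without waiting; the response body includes a task id
curl -s -X POST "http://SERVER:9200/_reindex?wait_for_completion=false" \
  -H 'Content-Type: application/json' -d '
{ "source": { "index": "stories-2023.06" }, "dest": { "index": "stories-2023.06-copy" } }'

# poll the task; its status reports total docs, docs created so far, and running time,
# which gives a rough docs/second for the copy
curl -s "http://SERVER:9200/_tasks/oTUltX4IQMOUUVeiohTt8A:12345"
```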
More thoughts:
If we're not using ansible, I'd like to have the ES install script (or
a parent top level script) also install the statsd-agent which reports
server stats to statsd/graphite/grafana.
To install statsd-agent:
In a directory with a checkout of mediacloud/system-dev-ops,
cd to monitoring/statsd-agent and run "make install" (as root).
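Spelled out (the clone URL is inferred from the repo name, so double-check it):
```
# check out the repo and install statsd-agent as described above
git clone https://github.com/mediacloud/system-dev-ops.git
cd system-dev-ops/monitoring/statsd-agent
sudo make install
```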
It would probably be safest to have the top level script run the ES
install script with command line arguments based on the hostname
(have the roles for each node wired into the install script):
```
DATA_NODE=Y
case $(hostname) in
A|B|C) MASTER_CANDIDATE=Y;;
esac
```
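Continuing that sketch, the flags could then be turned into a node.roles line in elasticsearch.yml; the path and role names are assumptions, and a real script would replace an existing line rather than just appending (see the idempotency point below):
```
# translate the flags into an elasticsearch.yml role line (illustrative only)
ROLES="data"
if [ "${MASTER_CANDIDATE:-N}" = "Y" ]; then
    ROLES="data, master"
fi
LINE="node.roles: [ ${ROLES} ]"
# append only if the exact line isn't already present
grep -qxF "$LINE" /etc/elasticsearch/elasticsearch.yml ||
    echo "$LINE" >> /etc/elasticsearch/elasticsearch.yml
```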
It should be possible to run the script(s) multiple times, and they
should test to see if any change is needed at each step before
changing anything. This is the thing ansible is REALLY good at
(otherwise I found it a pain, and it always took me at least twice as
long as writing a shell script would have, but I'm fluent in shell
scripting).
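As a concrete example of that check-before-change pattern in shell (paths are assumptions, and generate_es_config stands in for however we render the config):
```
# regenerate the config, but only install it and restart ES if it actually changed
NEW=$(mktemp)
generate_es_config > "$NEW"     # placeholder for the real config-generation step
if ! cmp -s "$NEW" /etc/elasticsearch/elasticsearch.yml; then
    install -m 0644 "$NEW" /etc/elasticsearch/elasticsearch.yml
    systemctl restart elasticsearch
fi
rm -f "$NEW"
```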
I hadn't assumed the ES servers would be part of the angwin cluster
(have user logins, NFS mounted home directories). In ancient times it
was relatively easy to have things get gummed up because of these
dependencies (depending on external services, especially during
restart), but I have a cat-like memory for past pain...
> If we're not using ansible, I'd like to have the ES install script (or a parent top level script) also install the statsd-agent

I would also propose doing this using Ansible. Having the scripts in the story-indexer repo works, but it means we'd probably need to execute the install on every host (8 in this case??), while with Ansible this could be executed from a single control node.
I had always assumed we would want to use Ansible! Here's a testing framework for cluster optimization, surprised it hadn't come up before: https://github.com/elastic/rally
Legacy note: we used Ansible effectively in the old system to spin up machines with just a quick command. This turned out to be very useful from my perspective, even if it seemed like there was a learning curve to using it.
Re: Elastic Rally: it looks like it's possible to make "test tracks" from existing cluster data for new clusters: https://esrally.readthedocs.io/en/stable/adding_tracks.html#creating-a-track-from-data-in-an-existing-cluster Rally won't give us the ability to test bare-metal configuration details like the JVM heap size, but I think we already know the right values to use there. Shard size should be doable, though: if we want to identify 4 or 8 sample shard sizes, we can automate a test and just leave it running for a couple of days.
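Per those docs, generating a track from a live cluster looks roughly like this; the host, index name, and output path are placeholders:
```
# build a Rally track from indices already on an existing cluster
esrally create-track --track=stories-sample \
    --target-hosts=SERVER:9200 \
    --indices="stories-2023.06" \
    --output-path=~/rally-tracks
```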
As we prepare to configure the new server cluster, here are some thoughts/questions:
- ES recommends at least three master-eligible nodes for HA.
- Bigger data nodes, and slightly smaller master-eligible nodes. Data nodes handle all data-related CRUD operations. From the ES audit, we established that our use case does not need specific data tiers, so we'll have generic data nodes.
- Coordinating-only nodes. Coordinating nodes act as our smart load balancers. Coordinating-only nodes can benefit large clusters by offloading the coordinating role from the data and master-eligible nodes. We could, however, also use the master-eligible nodes as coordinating nodes, though ES advises against this configuration.
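Once the nodes are up, the role split is easy to sanity-check from any node (SERVER is a placeholder):
```
# list each node's roles and which one is the currently elected master
curl -s "http://SERVER:9200/_cat/nodes?v=true&h=name,node.role,master"
```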