Configuring Large Query Isolation for an Elasticsearch Cluster

Scenario

You can isolate query requests that consume a large amount of memory or run for a long time, so that the service stays available for other requests. If the heap memory usage of a node is too high, an interrupt control program is triggered to terminate a large query based on the policies you configured. You can also configure a global query timeout. Queries that exceed it are canceled through the Elasticsearch-native cancel API.

Large query isolation can effectively solve the following problems and improve the search performance of clusters:
  • A small number of queries occupy large chunks of node heap memory, resulting in frequent garbage collection (GC) and even out-of-memory (OOM) exceptions.
  • Frequent GC causes node disconnections. As a result, queries receive no response and may fail.
  • The CPU usage is high due to heavy query load, affecting online services.

Constraints

Only Elasticsearch 7.6.2 and Elasticsearch 7.10.2 clusters support large query isolation.

Logging In to Kibana

Log in to Kibana and go to the command execution page. Elasticsearch clusters support multiple access methods. This topic uses Kibana as an example to describe the operation procedures.

  1. Log in to the CSS management console.
  2. In the navigation pane on the left, choose Clusters > Elasticsearch.
  3. In the cluster list, find the target cluster, and click Kibana in the Operation column to log in to the Kibana console.
  4. In the left navigation pane, choose Dev Tools.

    The left part of the console is the command input box, and the triangle icon in its upper-right corner is the execution button. The right part shows the execution result.

Enabling Large Query Isolation

Large query isolation is enabled by default, while global query timeout is disabled by default. A change to either switch takes effect immediately.

Run the following command to turn on both large query isolation and global query timeout:
PUT _cluster/settings
{
  "persistent": {
    "search.isolator.enabled": true,
    "search.isolator.time.enabled": true
  }
}
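
To confirm that the switches were stored, you can read the cluster settings back. The following is a minimal sketch that uses the standard Elasticsearch cluster settings API. The flat_settings query parameter is a general Elasticsearch option and not part of this feature, and the response depends on which settings have been changed on your cluster.

```
# Read back the cluster settings and look for the search.isolator.* keys
# under "persistent" in the response.
GET _cluster/settings?flat_settings=true
```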

Each of the two features has its own switch. Table 1 describes their parameters.

Table 1 Parameters for configuring large query isolation and global query timeout

| Switch | Parameter | Description |
| --- | --- | --- |
| search.isolator.enabled | search.isolator.memory.task.limit, search.isolator.time.management | Thresholds for identifying a single shard query task as a large query. |
| search.isolator.enabled | search.isolator.memory.pool.limit, search.isolator.memory.heap.limit, search.isolator.count.limit | Resource usage thresholds for isolation. If the resource usage of a query task exceeds one of these thresholds, the task will be paused. |
| search.isolator.enabled | search.isolator.strategy, search.isolator.strategy.ratio | Policy for selecting query tasks to pause in the isolation pool. |
| search.isolator.time.enabled | search.isolator.time.limit | Global timeout for query tasks. |

NOTE:

search.isolator.memory.heap.limit defines the limit on the heap memory consumed by write, query, and other operations of a node. If this limit is exceeded, large query tasks in the isolation pool will be paused.
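
Because the switches are separate settings, one can be changed without touching the other. The following sketch turns off only the global query timeout and leaves the large query isolation switch as it is. It assumes that a setting omitted from the request body keeps its current value, which is how the Elasticsearch cluster settings API normally behaves.

```
# Disable only the global query timeout. search.isolator.enabled is not
# included in the request and therefore keeps its current value.
PUT _cluster/settings
{
  "persistent": {
    "search.isolator.time.enabled": false
  }
}
```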

Configuring Large Query Isolation Thresholds

Configuring the Global Query Timeout

Run the following command to set the global timeout of query tasks:
PUT _cluster/settings
{
  "persistent": {
    "search.isolator.time.limit": "120s"
  }
}

Table 5 Parameter description

| Parameter | Type | Description |
| --- | --- | --- |
| search.isolator.time.limit | String | Global query timeout duration. Any query task that exceeds this duration will be canceled. Value range: ≥ 0ms. Default value: 120s. |
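
The timeout switch and the timeout duration can be set in one request. The sketch below uses an illustrative value of 30s, which is an assumption for the example and not a recommendation. The second request shows how a persistent setting is usually restored to its default in Elasticsearch by assigning null. Whether search.isolator.time.limit accepts null in the same way is assumed here and should be verified on your cluster.

```
# Enable the global query timeout and set it to 30 seconds (illustrative value).
PUT _cluster/settings
{
  "persistent": {
    "search.isolator.time.enabled": true,
    "search.isolator.time.limit": "30s"
  }
}

# Restore the default duration by clearing the persistent value
# (assumes the standard null-reset behavior of cluster settings).
PUT _cluster/settings
{
  "persistent": {
    "search.isolator.time.limit": null
  }
}
```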

Configuring the Maximum Number of Log Records for Canceled Query Requests

Run the following command to set the maximum number of log records kept for canceled query requests:
PUT _cluster/settings
{
  "persistent": {
    "search.isolator.log.count": 100
  }
}

| Parameter | Data Type | Description |
| --- | --- | --- |
| search.isolator.log.count | Integer | Maximum number of records of canceled query requests that can be kept in memory. Value range: 0–5000. Default value: 100. |

NOTE:

You can use the following APIs to query canceled requests:

  • GET /_isolator_metrics: Queries all nodes.
  • GET /_isolator_metrics/{nodeId}: Queries a single node.
  • GET /_isolator_metrics?detailed: Queries request cancellation details of all nodes.
  • GET /_isolator_metrics/{nodeId}?detailed: Queries request cancellation details of a single node.

In the commands above, nodeId indicates the node ID.
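
As a sketch of how the per-node calls might be used, the following requests first list node IDs with the standard Elasticsearch cat nodes API and then query one node. The value exampleNodeId is a hypothetical placeholder. Whether the isolator API expects the full node ID (returned when full_id=true) is an assumption, and the response format is not described here.

```
# List node names with their full node IDs (standard Elasticsearch cat API).
GET _cat/nodes?v&h=id,name&full_id=true

# Query cancellation details for one node. Replace exampleNodeId
# (a placeholder) with an ID returned by the previous request.
GET /_isolator_metrics/exampleNodeId?detailed
```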