Configure flow control policies for your OpenSearch cluster in both the inbound and outbound directions, ensuring cluster stability by safeguarding against abnormal traffic.
| Policy | What It Does |
|---|---|
| HTTP/HTTPS flow control | Controls client access traffic using blacklists and whitelists, an upper limit on concurrent connections, and a rate limit on new connection attempts. |
| Memory flow control | When heap memory usage exceeds a predefined threshold (for example, 80%), the system stops receiving large requests, and garbage collection (GC) is triggered to reclaim memory. Write traffic is throttled according to the backpressure factor (flowcontrol.holding.in_flight_factor) and the maximum delay for request handling (flowcontrol.holding.max). |
| One-click traffic blocking | When enabled, the system immediately disconnects all client connections, except those used for OpenSearch Dashboards access or O&M and monitoring APIs, so that the cluster can recover. |
| Request statistics sampling and analysis | Records request metrics (such as bulk writes and queries) by client IP address and exposes them through a statistics API, so that you can evaluate the cluster load and proactively identify abnormal traffic patterns. |
| Access logging | Records the URLs and bodies of HTTP/HTTPS requests for cluster load and client request analysis. Access logs can also be saved to files (that is, persisted to disk) to facilitate troubleshooting and performance analysis. |
Log in to OpenSearch Dashboards and go to the command execution page. OpenSearch clusters support multiple access methods. This topic uses OpenSearch Dashboards as an example to describe the operation procedures.
The left part of the console is the command input box, and the triangle icon in its upper-right corner is the execution button. The right part shows the execution result.
Control client access traffic using blacklists and whitelists, an upper limit on concurrent connections, and a rate limit on new connection attempts.
PUT /_cluster/settings
{
"persistent": {
"flowcontrol.http.enabled": true,
"flowcontrol.http.allow": ["192.168.0.1/24", "192.168.2.1/24"],
"flowcontrol.http.deny": "192.168.1.1/24",
"flowcontrol.http.concurrent": 1000,
"flowcontrol.http.newconnect": 1000,
"flowcontrol.http.warmup_period": 0
}
}
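To reason about which list a given client falls into before applying the request above, the following minimal Python sketch classifies an address against the allow and deny lists. It reflects only the documented rule that a whitelist takes precedence over a blacklist. The function name classify_client is hypothetical, and how the cluster treats clients that appear in neither list is not stated here, so the sketch simply reports them as "unlisted".

```python
import ipaddress


def classify_client(client_ip, allow, deny):
    """Return "whitelisted", "blacklisted", or "unlisted" for client_ip.

    Hypothetical helper, not part of any OpenSearch API. It only mirrors
    the documented rule that the whitelist is checked before the blacklist.
    """
    addr = ipaddress.ip_address(client_ip)

    def matches(entries):
        # strict=False is needed because entries such as "192.168.0.1/24"
        # have host bits set; they are treated as the enclosing network.
        return any(addr in ipaddress.ip_network(e, strict=False) for e in entries)

    if matches(allow):
        return "whitelisted"
    if matches(deny):
        return "blacklisted"
    return "unlisted"


allow = ["192.168.0.1/24", "192.168.2.1/24"]
deny = ["192.168.1.1/24"]
print(classify_client("192.168.0.77", allow, deny))
print(classify_client("192.168.1.5", allow, deny))
print(classify_client("10.0.0.1", allow, deny))
```

When an address matches both lists, the sketch reports it as whitelisted, which is the precedence described for flowcontrol.http.allow and flowcontrol.http.deny.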
| Parameter | Type | Description |
|---|---|---|
| flowcontrol.http.enabled | Boolean | Whether to enable HTTP/HTTPS flow control. Enabling it may impact node performance. |
| flowcontrol.http.allow | List<String> | A whitelist of client IP addresses or CIDR blocks that are allowed to access the cluster. The default is an empty list, indicating no whitelist. Setting this parameter to null restores the default value. |
| flowcontrol.http.deny | List<String> | A blacklist of client IP addresses or CIDR blocks that are not allowed to access the cluster. A whitelist takes precedence over a blacklist. The default is an empty list, indicating no blacklist. Setting this parameter to null restores the default value. |
| flowcontrol.http.concurrent | Integer | Maximum number of concurrent HTTP/HTTPS connections that each node can handle. The default value is the number of available node vCPUs multiplied by 600. |
| flowcontrol.http.newconnect | Integer | Maximum number of new HTTP/HTTPS connections that can be created per second on each node. The default value is the number of available node vCPUs multiplied by 200. Setting this parameter to null restores the default value. |
| flowcontrol.http.warmup_period | Integer | A warm-up period during which the system gradually ramps up from accepting zero HTTP/HTTPS requests to its full capacity. Value range: 0–10000. Unit: ms. The default value is 0, indicating that there is no warm-up period and the maximum rate can be reached instantly. Setting this parameter to null restores the default value. For example, if flowcontrol.http.newconnect is set to 100 and flowcontrol.http.warmup_period is set to 5000, it takes 5 seconds for the system to reach 100 new connections per second. |
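The default limits and the warm-up example above can be worked through with a short Python sketch. The defaults follow the stated formulas (vCPUs multiplied by 600 and by 200). The shape of the ramp is not specified in this topic, so the linear ramp below is an assumption, and both function names are hypothetical.

```python
def default_http_limits(vcpus):
    """Default per-node limits derived from the documented formulas."""
    return {
        "flowcontrol.http.concurrent": vcpus * 600,
        "flowcontrol.http.newconnect": vcpus * 200,
    }


def allowed_new_connection_rate(newconnect, warmup_period_ms, elapsed_ms):
    """New-connection rate permitted elapsed_ms after warm-up starts.

    ASSUMPTION: the ramp is linear. The documentation only says the rate
    rises gradually and reaches newconnect when the period ends.
    """
    if warmup_period_ms <= 0 or elapsed_ms >= warmup_period_ms:
        return float(newconnect)
    return newconnect * max(elapsed_ms, 0) / warmup_period_ms


print(default_http_limits(8))
print(allowed_new_connection_rate(100, 5000, 2500))
print(allowed_new_connection_rate(100, 5000, 5000))
```

With flowcontrol.http.newconnect set to 100 and a 5000 ms warm-up period, the sketch reaches the full 100 new connections per second at the 5-second mark, matching the example in the table. The halfway value depends entirely on the linear assumption.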
PUT /_cluster/settings
{
"persistent": {
"flowcontrol.http.enabled": false
}
}
Enable write throttling to mitigate the risk of OOM exceptions when the heap memory usage of a node exceeds a predefined threshold.
PUT /_cluster/settings
{
"persistent": {
"flowcontrol.memory.enabled": true,
"flowcontrol.memory.heap_limit": "80%"
}
}
| Parameter | Type | Description |
|---|---|---|
| flowcontrol.memory.enabled | Boolean | Whether to enable memory-based flow control. When enabled, node heap memory usage is monitored, and writes are throttled when usage reaches the configured threshold. |
| flowcontrol.memory.heap_limit | String | Node heap memory usage threshold. When this threshold is exceeded, a write backpressure mechanism is triggered. Value range: 10%–100%. Default value: 90%, which is a conservative threshold. With the default, when heap memory usage exceeds 90%, the system stops accepting client requests larger than 64 KB until heap memory usage decreases. Once heap memory usage drops to 85%, client data equivalent to 5% of the maximum heap memory capacity can be read again. If heap memory usage stays above 90% for a long time, client requests cannot be processed, and garbage collection is triggered until heap memory usage drops below the threshold. Generally, set this threshold to 80% or lower so that cluster nodes keep some heap memory in reserve for operations other than data writing, such as segment merges. Setting this parameter to null restores the default value. |
| flowcontrol.holding.in_flight_factor | Float | Backpressure factor, which controls the sensitivity of memory-based backpressure. A larger value results in stronger write throttling. Value range: ≥ 0.5. Default value: 1.0. Setting this parameter to null restores the default value. |
| flowcontrol.holding.max | TimeValue | Maximum time a request can be delayed before it is handled according to the policy defined by flowcontrol.holding.max_strategy. Value range: ≥ 15s. Unit: second. Default value: 60s. Setting this parameter to null restores the default value. |
| flowcontrol.holding.max_strategy | String | Handling policy for requests delayed longer than flowcontrol.holding.max. Setting this parameter to null restores the default value. |
| flowcontrol.memory.once_free_max | String | Maximum memory that can be made available at a time for a re-enabled request queue. Configure this parameter to prevent cluster overload caused by a flood of incoming requests. Value range: 1%–50%. Default value: 5%. Setting this parameter to null restores the default value. |
| flowcontrol.memory.nudges_gc | Boolean | Whether to trigger garbage collection (GC) to reclaim memory when the write pressure is too high. The backpressure connection pool is checked every second. The write pressure is considered high if all existing connections are blocked and new write requests cannot be accepted. Setting this parameter to null restores the default value. |
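To get a feel for what the percentage settings mean in bytes on a particular node, the arithmetic can be sketched in Python as follows. This is only an illustration of the documented value ranges and percentages: the function name heap_budget and the returned keys are hypothetical, and whole-number percentages are assumed.

```python
def heap_budget(max_heap_bytes, heap_limit="90%", once_free_max="5%"):
    """Translate percentage settings into byte counts for one node.

    Hypothetical helper. Assumes whole-number percentages and validates
    them against the value ranges listed in the parameter table.
    """
    limit_pct = int(heap_limit.rstrip("%"))
    free_pct = int(once_free_max.rstrip("%"))
    if not 10 <= limit_pct <= 100:
        raise ValueError("flowcontrol.memory.heap_limit must be 10%-100%")
    if not 1 <= free_pct <= 50:
        raise ValueError("flowcontrol.memory.once_free_max must be 1%-50%")
    return {
        # Heap usage above this many bytes triggers write backpressure.
        "backpressure_above_bytes": max_heap_bytes * limit_pct // 100,
        # Upper bound on memory released at a time to a re-enabled queue.
        "released_at_a_time_bytes": max_heap_bytes * free_pct // 100,
    }


GIB = 1024 ** 3
print(heap_budget(32 * GIB, heap_limit="80%"))
```

For a node with a 32 GiB heap and an 80% threshold, the first value is the usage level at which throttling would begin, and the second is 5% of the heap.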
PUT /_cluster/settings
{
"persistent": {
"flowcontrol.memory.enabled": false
}
}
In case of emergencies, the system immediately disconnects all client connections (excluding those used for OpenSearch Dashboards access or O&M and monitoring APIs) to restore clusters.
PUT /_cluster/settings
{
"persistent": {
"flowcontrol.break.enabled": true
}
}
| Parameter | Type | Description |
|---|---|---|
| flowcontrol.break.enabled | Boolean | Whether to enable one-click traffic blocking (similar to a circuit breaker). When enabled, the system immediately disconnects all client connections, but not those used for OpenSearch Dashboards access or O&M and monitoring APIs. |
PUT /_cluster/settings
{
"persistent": {
"flowcontrol.break.enabled": false
}
}
Collect request metrics by client IP address to help identify abnormal traffic patterns.
PUT /_cluster/settings
{
"persistent": {
"flowcontrol.log.access.enabled": true
}
}
| Parameter | Type | Description |
|---|---|---|
| flowcontrol.log.access.enabled | Boolean | Whether to enable request statistics sampling, that is, whether to collect request metrics (such as bulk writes and search/msearch requests) by client IP address. |
| flowcontrol.log.access.count | Integer | Maximum number of client IP addresses sampled. Value range: 0–100. Default value: 10. Setting this parameter to null restores the default value. |
GET /_nodes/stats/filter/v2
GET /_nodes/stats/filter/v2?detail
GET /_nodes/{nodeId}/stats/filter/v2
{nodeId} indicates the node ID.
{
"_nodes" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"cluster_name" : "css-xxxx",
"nodes" : {
"d3qnVIpPTtSoadkV0LQEkA" : {
"name" : "css-xxxx-ess-esn-1-1",
"host" : "192.168.x.x",
"timestamp" : 1672236425112,
"flow_control" : {
"http" : {
"current_connect" : 52,
"rejected_concurrent" : 0,
"rejected_rate" : 0,
"rejected_black" : 0,
"rejected_breaker" : 0
},
"access_items" : [
{
"remote_address" : "10.0.0.x",
"search_count" : 0,
"bulk_count" : 0,
"other_count" : 4
}
],
"holding_requests" : 0
}
}
}
}
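A response like the one above can be condensed into a per-node summary with a few lines of Python. The sketch below is one possible way to do it, assuming the response has already been parsed into a dictionary. The function name summarize_flow_control and the summary keys are hypothetical, and the sample data is a trimmed copy of the response shown above.

```python
def summarize_flow_control(stats):
    """Condense a parsed /_nodes/stats/filter/v2 response, keyed by node name."""
    summary = {}
    for node in stats["nodes"].values():
        fc = node["flow_control"]
        http = fc["http"]
        summary[node["name"]] = {
            "open_connections": http["current_connect"],
            # Sum every rejected_* counter (concurrent, rate, black, breaker).
            "rejected_total": sum(
                v for k, v in http.items() if k.startswith("rejected_")
            ),
            "holding_requests": fc["holding_requests"],
            # Total sampled requests per client IP address.
            "requests_by_client": {
                item["remote_address"]: item["search_count"]
                + item["bulk_count"]
                + item["other_count"]
                for item in fc.get("access_items", [])
            },
        }
    return summary


sample = {
    "nodes": {
        "d3qnVIpPTtSoadkV0LQEkA": {
            "name": "css-xxxx-ess-esn-1-1",
            "flow_control": {
                "http": {
                    "current_connect": 52,
                    "rejected_concurrent": 0,
                    "rejected_rate": 0,
                    "rejected_black": 0,
                    "rejected_breaker": 0,
                },
                "access_items": [
                    {
                        "remote_address": "10.0.0.x",
                        "search_count": 0,
                        "bulk_count": 0,
                        "other_count": 4,
                    }
                ],
                "holding_requests": 0,
            },
        }
    }
}
print(summarize_flow_control(sample))
```

A non-zero rejected_total or holding_requests value in such a summary is a quick signal that flow control is actively limiting traffic on that node.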
| Parameter | Description |
|---|---|
| current_connect | Number of HTTP connections currently open on the node. It is recorded regardless of whether flow control is enabled and is equivalent to the current_open value returned by the GET /_nodes/stats/http API. |
| rejected_concurrent | Number of concurrent connections rejected during flow control. This metric is available only when flowcontrol.http.enabled is set to true. The count is not cleared when flow control is disabled. |
| rejected_rate | Number of new connections rejected during flow control. This metric is available only when flowcontrol.http.enabled is set to true. The count is not cleared when flow control is disabled. |
| rejected_black | Number of new connections rejected by a preconfigured blacklist during flow control. This metric is available only when flowcontrol.http.enabled is set to true. The count is not cleared when flow control is disabled. |
| rejected_breaker | Number of new connections rejected during one-click traffic blocking. This metric is available only when flowcontrol.break.enabled is set to true. The count is not cleared when one-click traffic blocking is disabled. |
| access_items | Clients that recently accessed the cluster, with request counts for each client IP address. The number of client IP addresses sampled is determined by flowcontrol.log.access.count. |
| remote_address | Client IP address that the request counts in this entry belong to. |
| search_count | Number of _search and _msearch requests sent by the client. |
| bulk_count | Number of _bulk requests sent by the client. |
| other_count | Number of other requests sent by the client. |
| holding_requests | Number of connections to the current node where writes are halted due to flow control. |
PUT /_cluster/settings
{
"persistent": {
"flowcontrol.log.access.enabled": false
}
}
When access logging is enabled, the system records the URLs and bodies of HTTP/HTTPS requests for cluster load and request analysis.
PUT /_access_log?duration_limit=30s&capacity_limit=1mb
PUT /_access_log/{nodeId}?duration_limit=30s&capacity_limit=1mb
{nodeId} indicates the node ID.
| Parameter | Type | Description |
|---|---|---|
| duration_limit | String | Maximum duration of access logging. When this duration is reached, access logging stops. Value range: 10–120. Unit: s. Default value: 30. Setting this parameter to null restores the default value. Access logging stops when either duration_limit or capacity_limit is reached. |
| capacity_limit | String | Maximum memory capacity for recording access logs. When the size of the access log reaches this limit, access logging stops. Value range: 1–5. Unit: MB. Default value: 1. Setting this parameter to null restores the default value. Access logging stops when either duration_limit or capacity_limit is reached. |
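When the request is built by a script rather than typed into the console, the documented value ranges can be checked before anything is sent. The following Python sketch only constructs the request path shown above. The function name access_log_path is hypothetical, and whole-number values for both limits are assumed.

```python
def access_log_path(duration_s=30, capacity_mb=1, node_id=None):
    """Build the PUT path that enables access logging.

    Hypothetical helper. Validates the values against the documented
    ranges (10-120 s and 1-5 MB) and assumes whole numbers.
    """
    if not 10 <= duration_s <= 120:
        raise ValueError("duration_limit must be between 10 and 120 seconds")
    if not 1 <= capacity_mb <= 5:
        raise ValueError("capacity_limit must be between 1 and 5 MB")
    base = "/_access_log" if node_id is None else f"/_access_log/{node_id}"
    return f"{base}?duration_limit={duration_s}s&capacity_limit={capacity_mb}mb"


print(access_log_path())
print(access_log_path(60, 2, node_id="8x-ZHu-wTemBQwpcGivFKg"))
```

Omitting node_id targets all nodes, while passing a node ID produces the per-node form of the request.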
GET /_access_log
GET /_access_log/{nodeId}
{nodeId} indicates the node ID.
{
"_nodes" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"cluster_name" : "css-flowcontroller",
"nodes" : {
"8x-ZHu-wTemBQwpcGivFKg" : {
"name" : "css-flowcontroller-ess-esn-1-1",
"host" : "10.0.0.98",
"count" : 2,
"access" : [
{
"time" : "2021-02-23 02:09:50",
"remote_address" : "/10.0.0.98:28191",
"url" : "/_access/security/log?pretty",
"method" : "GET",
"content" : ""
},
{
"time" : "2021-02-23 02:09:52",
"remote_address" : "/10.0.0.98:28193",
"url" : "/_access/security/log?pretty",
"method" : "GET",
"content" : ""
}
]
}
}
}
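For load analysis, the access entries in a response like the one above can be grouped by client. The Python sketch below shows one way to do this, assuming the response has been parsed into a dictionary. Both function names are hypothetical, and the remote_address format ("/IP:port") is taken from the sample response rather than from a formal specification.

```python
from collections import Counter


def split_remote_address(remote_address):
    """Split a value such as "/10.0.0.98:28191" into (ip, port)."""
    host, _, port = remote_address.lstrip("/").rpartition(":")
    return host, int(port)


def requests_per_client(access_log):
    """Count logged requests per (client IP, HTTP method) across all nodes."""
    counts = Counter()
    for node in access_log["nodes"].values():
        for entry in node.get("access", []):
            host, _port = split_remote_address(entry["remote_address"])
            counts[(host, entry["method"])] += 1
    return counts


sample = {
    "nodes": {
        "8x-ZHu-wTemBQwpcGivFKg": {
            "name": "css-flowcontroller-ess-esn-1-1",
            "count": 2,
            "access": [
                {"time": "2021-02-23 02:09:50", "remote_address": "/10.0.0.98:28191",
                 "url": "/_access/security/log?pretty", "method": "GET", "content": ""},
                {"time": "2021-02-23 02:09:52", "remote_address": "/10.0.0.98:28193",
                 "url": "/_access/security/log?pretty", "method": "GET", "content": ""},
            ],
        }
    }
}
print(requests_per_client(sample))
```

Grouping by IP address rather than by the full remote_address value matters because each connection from the same client typically uses a different source port.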
| Parameter | Description |
|---|---|
| name | Node name |
| host | Node IP address |
| count | Number of node access requests in a statistical period |
| access | Details about node access requests in a statistical period |
| time | Request time |
| remote_address | Source IP address and port number in the request |
| url | Original URL of the request |
| method | Request method |
| content | Request content. If the value is an empty string (""), there is no request body. |
Access logs can be persisted to disk for troubleshooting and analysis. Use this function sparingly, as it can impact cluster performance. Remember to disable it immediately after resolving the issue.
PUT /_cluster/settings
{
"persistent": {
"flowcontrol.log.file.enabled": true
}
}
| Parameter | Type | Description |
|---|---|---|
| flowcontrol.log.file.enabled | Boolean | Whether to record access logs in files. When enabled, every access request is logged to a file named {cluster name}_access_log.log. You can check this file only through the log backup function. |
PUT /_cluster/settings
{
"persistent": {
"flowcontrol.log.file.enabled": false
}
}