Creating an OpenSearch Vector Cluster

Integrating efficient indexing techniques, the CSS vector database delivers a high-performance, low-cost, scalable solution for high-dimensional vector search. An OpenSearch vector cluster converts unstructured data into high-dimensional vectors and uses vector indexing algorithms (such as HNSW graph-based indexing and product quantization) to enable approximate nearest neighbor (ANN) search. This significantly reduces computational complexity while ensuring a high recall rate.

This topic focuses on memory capacity planning for an OpenSearch vector cluster, to help you choose cluster nodes with appropriate specifications. All other procedures are the same as those for regular search clusters. For details, see OpenSearch Cluster Planning Suggestions.

Memory Planning

Before creating an OpenSearch vector cluster, properly plan the cluster's memory capacity based on the data size, vector dimensions, and index types.

Creating a Cluster

The procedure for creating a vector cluster is the same as that for creating any other regular search cluster. For details, see Creating an Elasticsearch Cluster.

Pay attention to the following key parameters:

(Optional) Configuring the Circuit Breaker

To mitigate out-of-memory (OOM) errors and maintain optimal vector query performance, a circuit breaker mechanism is employed. When the cluster's off-heap memory usage exceeds a predefined threshold, this mechanism automatically blocks vector data writes to the cluster. The purposes of this mechanism are as follows:
  • Preventing memory overload: Blocking writes keeps off-heap memory usage from growing further.
  • Maintaining query performance: Optimal vector query performance can be maintained by preventing memory overload.

The off-heap memory circuit breaker is enabled by default. You can enable or disable it and adjust its threshold based on service requirements. The command is as follows:

PUT _cluster/settings
{
  "persistent": {
    "native.cache.circuit_breaker.enabled": "true",
    "native.cache.circuit_breaker.cpu.limit": "80%"
  }
}
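
The request body above can also be generated programmatically, for example when the threshold differs between environments. The following Python sketch is illustrative only: the helper name build_circuit_breaker_settings is hypothetical, and only the two setting names come from this topic. It builds the JSON body and does not send it; submit the result with whatever HTTP client you already use for PUT _cluster/settings.

```python
import json


def build_circuit_breaker_settings(enabled: bool = True, limit_percent: int = 80) -> dict:
    """Build the body for PUT _cluster/settings (hypothetical helper).

    The threshold is included only when the breaker is enabled, because
    the limit setting applies only in that case.
    """
    if not 0 < limit_percent <= 100:
        raise ValueError("limit_percent must be in the range (0, 100]")

    persistent = {"native.cache.circuit_breaker.enabled": "true" if enabled else "false"}
    if enabled:
        persistent["native.cache.circuit_breaker.cpu.limit"] = f"{limit_percent}%"
    return {"persistent": persistent}


# Print the body that corresponds to the default configuration.
print(json.dumps(build_circuit_breaker_settings(), indent=2))
```

Calling build_circuit_breaker_settings(enabled=False) yields a body that only turns the circuit breaker off.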

Table 2 Parameter description

Parameter | Type | Description
native.cache.circuit_breaker.enabled | Boolean | Whether to enable the off-heap memory circuit breaker. Value range: true (default value): Enable the off-heap memory circuit breaker. When the off-heap memory usage reaches the circuit breaker threshold, write requests are blocked. false: Disable the off-heap memory circuit breaker. OOM errors may occur in case of excessive off-heap memory usage.
native.cache.circuit_breaker.cpu.limit | String | Circuit breaker threshold in terms of maximum off-heap memory usage. This parameter is available only when native.cache.circuit_breaker.enabled=true. Value range: a value in percentage. Default value: 80%

Example: Assume a cluster has 128 GB of memory, of which 31 GB is required as heap memory. With the default circuit breaker threshold of 80%, the limit is (128 - 31) x 80% = 77.6 GB. When the off-heap memory usage exceeds 77.6 GB, the circuit breaker is triggered and write operations are blocked.
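
The same arithmetic can be checked for other memory sizes with a short script. This is a minimal sketch: the function name circuit_breaker_limit_gb and its parameters are illustrative, and it assumes, as in the example above, that off-heap memory is simply total memory minus heap memory.

```python
def circuit_breaker_limit_gb(total_memory_gb: float,
                             heap_memory_gb: float,
                             threshold: str = "80%") -> float:
    """Return the off-heap usage (in GB) above which writes are blocked.

    Assumes off-heap memory = total memory - heap memory.
    """
    if not threshold.endswith("%"):
        raise ValueError("threshold must be a percentage string such as '80%'")
    fraction = float(threshold[:-1]) / 100.0
    if not 0 < fraction <= 1:
        raise ValueError("threshold must be greater than 0% and at most 100%")

    off_heap_gb = total_memory_gb - heap_memory_gb
    if off_heap_gb < 0:
        raise ValueError("heap memory cannot exceed total memory")
    return off_heap_gb * fraction


# Values from the example: 128 GB total, 31 GB heap, 80% threshold.
print(round(circuit_breaker_limit_gb(128, 31), 1))
```

With the values from the example, the script reproduces the 77.6 GB limit. Lowering the threshold to 70% for the same memory sizes gives roughly 67.9 GB.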