doc-exports/docs/css/umn/css_01_0112.html
zhengxiu 93d856d5c5 css umn 25.6.0 version
Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com>
Co-authored-by: zhengxiu <zhengxiu@huawei.com>
Co-committed-by: zhengxiu <zhengxiu@huawei.com>
2025-11-25 11:34:43 +00:00


<a name="EN-US_TOPIC_0000002426355865"></a><a name="EN-US_TOPIC_0000002426355865"></a>
<h1 class="topictitle1">Using Nested Fields for Vector Search</h1>
<div id="body0000002426355865"><p id="EN-US_TOPIC_0000002426355865__p8060118">Nested fields allow multiple vector records to be stored in a single document. For example, in a RAG scenario, a source document is usually split by paragraph or into fixed-length chunks, and each chunk is then converted into a semantic vector. With nested fields, all of these vectors can be written to the same document in the index. When a document contains multiple vector records, it is returned if the query vector matches any of them.</p>
<div class="section" id="EN-US_TOPIC_0000002426355865__section7402052171913"><h4 class="sectiontitle">Constraints</h4><p id="EN-US_TOPIC_0000002426355865__p242820500615">Only OpenSearch 2.19.0 clusters support this feature.</p>
</div>
<div class="section" id="EN-US_TOPIC_0000002426355865__section920834941814"><h4 class="sectiontitle">Creating a Vector Index</h4><p id="EN-US_TOPIC_0000002426355865__p1814521513234">Create a vector index with nested fields. The index contains an <strong id="EN-US_TOPIC_0000002426355865__b115769801511331">id</strong> field whose type is <strong id="EN-US_TOPIC_0000002426355865__b93818467211331">keyword</strong>, and an <strong id="EN-US_TOPIC_0000002426355865__b101765592911331">embedding</strong> field whose type is <strong id="EN-US_TOPIC_0000002426355865__b139046583211331">nested</strong>. The embedding field contains two subfields: <strong id="EN-US_TOPIC_0000002426355865__b151113421711331">chunk</strong> and <strong id="EN-US_TOPIC_0000002426355865__b183726057411331">emb</strong>. The <strong id="EN-US_TOPIC_0000002426355865__b195353201111331">chunk</strong> subfield is of the <strong id="EN-US_TOPIC_0000002426355865__b100986176111331">keyword</strong> type, and the <strong id="EN-US_TOPIC_0000002426355865__b198454408211331">emb</strong> subfield is of the <strong id="EN-US_TOPIC_0000002426355865__b15600183411331">vector</strong> type.</p>
<pre class="screen" id="EN-US_TOPIC_0000002426355865__screen15862126144714">PUT my_index
{
  "settings": {
    "index.vector": true
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "embedding": {
        "type": "nested",
        "properties": {
          "chunk": {
            "type": "keyword"
          },
          "emb": {
            "type": "vector",
            "dimension": 2,
            "indexing": true,
            "algorithm": "GRAPH",
            "metric": "euclidean"
          }
        }
      }
    }
  }
}</pre>
</div>
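<p>The request body above can also be generated programmatically, for example when the vector dimension depends on the embedding model. The following is a minimal Python sketch, not an official client: the helper name <strong>build_nested_vector_index_body</strong> is hypothetical, and the setting and mapping values are copied from the example above. Sending the request to the cluster is left out.</p>

```python
import json


def build_nested_vector_index_body(dimension, metric="euclidean"):
    """Build the PUT my_index body from the example above (hypothetical helper)."""
    # Reject anything that is not a positive integer dimension.
    if not (isinstance(dimension, int) and dimension >= 1):
        raise ValueError("dimension must be a positive integer")
    return {
        "settings": {"index.vector": True},
        "mappings": {
            "properties": {
                "id": {"type": "keyword"},
                "embedding": {
                    "type": "nested",
                    "properties": {
                        "chunk": {"type": "keyword"},
                        "emb": {
                            "type": "vector",
                            "dimension": dimension,
                            "indexing": True,
                            "algorithm": "GRAPH",
                            "metric": metric,
                        },
                    },
                },
            }
        },
    }


index_body = build_nested_vector_index_body(2)
# Print the JSON that would be sent as the body of PUT my_index.
print(json.dumps(index_body, indent=2))
```

<p>Keeping the body in one function makes it easier to keep the <strong>dimension</strong> value in step with the vectors that are written later.</p>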
<div class="section" id="EN-US_TOPIC_0000002426355865__section1069103718276"><h4 class="sectiontitle">Importing Vector Data</h4><p id="EN-US_TOPIC_0000002426355865__p107491342132810">Use the bulk API to write data. The value of the nested field is an array of objects, one object per vector record. In this example, each document contains two vector records.</p>
<pre class="screen" id="EN-US_TOPIC_0000002426355865__screen20150127125014">POST my_index/_bulk
{"index":{}}
{"id": 1, "embedding": [{"chunk":1,"emb": [1, 1]}, {"chunk":2,"emb": [2, 2]}]}
{"index":{}}
{"id": 2, "embedding": [{"chunk":1,"emb": [2, 2]}, {"chunk":2,"emb": [3, 3]}]}
{"index":{}}
{"id": 3, "embedding": [{"chunk":1,"emb": [3, 3]}, {"chunk":2,"emb": [4, 4]}]}</pre>
</div>
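<p>The bulk body above is newline-delimited JSON: each document takes an action line followed by a source line, and the body must end with a newline. The following Python sketch shows one way to build such a body from chunk vectors. The helper names <strong>make_doc</strong> and <strong>to_bulk_ndjson</strong> are hypothetical, and the chunk numbering (starting from 1) simply follows the example data.</p>

```python
import json


def make_doc(doc_id, vectors):
    """Wrap a list of chunk vectors into one document with a nested field."""
    return {
        "id": doc_id,
        "embedding": [
            {"chunk": number, "emb": list(vec)}
            for number, vec in enumerate(vectors, start=1)
        ],
    }


def to_bulk_ndjson(docs):
    """Serialize documents as action/source line pairs for POST my_index/_bulk."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))  # action line, ID auto-generated
        lines.append(json.dumps(doc))            # source line
    # The bulk body must be terminated by a newline.
    return "\n".join(lines) + "\n"


docs = [
    make_doc(1, [[1, 1], [2, 2]]),
    make_doc(2, [[2, 2], [3, 3]]),
    make_doc(3, [[3, 3], [4, 4]]),
]
payload = to_bulk_ndjson(docs)
print(payload, end="")
```

<p>The resulting string can be sent as the request body with any HTTP client, using the <strong>application/x-ndjson</strong> content type.</p>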
<div class="section" id="EN-US_TOPIC_0000002426355865__section20812133113112"><h4 class="sectiontitle">Vector Search</h4><p id="EN-US_TOPIC_0000002426355865__p124414255492">Nested fields must be searched with a nested query. In the query, set the path parameter to the nested path, and set <strong id="EN-US_TOPIC_0000002426355865__b133142972611331">score_mode</strong> to <strong id="EN-US_TOPIC_0000002426355865__b79558898111331">max</strong> so that the score of a document is the highest similarity between the query vector and any vector in that document.</p>
<ul id="EN-US_TOPIC_0000002426355865__ul4274936174213"><li id="EN-US_TOPIC_0000002426355865__li62744361425">Standard query<p id="EN-US_TOPIC_0000002426355865__p1690601511614"><a name="EN-US_TOPIC_0000002426355865__li62744361425"></a><a name="li62744361425"></a>Query the top 10 documents that are most similar to vector [1, 1].</p>
<pre class="screen" id="EN-US_TOPIC_0000002426355865__screen1279552916507">GET my_index/_search
{
  "_source": {"excludes": ["embedding"]},
  "query": {
    "nested": {
      "path": "embedding",
      "score_mode": "max",
      "query": {
        "vector": {
          "embedding.emb": {
            "vector": [1, 1],
            "topk": 10
          }
        }
      }
    }
  }
}</pre>
<p id="EN-US_TOPIC_0000002426355865__p19738125816532">An example of the query result:</p>
<pre class="screen" id="EN-US_TOPIC_0000002426355865__screen103475417404">{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "Hc4Vc5QBSxCnghau22AE",
        "_score" : 1.0,
        "_source" : {
          "id" : 1
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "Hs4Vc5QBSxCnghau22AE",
        "_score" : 0.33333334,
        "_source" : {
          "id" : 2
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "H84Vc5QBSxCnghau22AE",
        "_score" : 0.11111111,
        "_source" : {
          "id" : 3
        }
      }
    ]
  }
}</pre>
</li><li id="EN-US_TOPIC_0000002426355865__li081194014213">Pre-filtering query<p id="EN-US_TOPIC_0000002426355865__p6281329155415"><a name="EN-US_TOPIC_0000002426355865__li081194014213"></a><a name="li081194014213"></a>First filter for documents whose <strong>id</strong> is "2" or "3", and then return up to 10 of the remaining documents that are most similar to the query vector [1, 1].</p>
<pre class="screen" id="EN-US_TOPIC_0000002426355865__screen1560120439117">GET my_index/_search
{
  "query": {
    "nested": {
      "path": "embedding",
      "score_mode": "max",
      "query": {
        "vector": {
          "embedding.emb": {
            "vector": [1, 1],
            "topk": 10,
            "filter": {
              "terms": {"id": ["2", "3"]}
            }
          }
        }
      }
    }
  }
}</pre>
<p id="EN-US_TOPIC_0000002426355865__p188900299012">An example of the query result:</p>
<pre class="screen" id="EN-US_TOPIC_0000002426355865__screen1352602315214">{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.33333334,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "3t0ZypcB-Tff59gMTZO2",
        "_score" : 0.33333334,
        "_source" : {
          "id" : 2,
          "embedding" : [
            {
              "chunk" : 1,
              "emb" : [
                2,
                2
              ]
            },
            {
              "chunk" : 2,
              "emb" : [
                3,
                3
              ]
            }
          ]
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "390ZypcB-Tff59gMTZO2",
        "_score" : 0.11111111,
        "_source" : {
          "id" : 3,
          "embedding" : [
            {
              "chunk" : 1,
              "emb" : [
                3,
                3
              ]
            },
            {
              "chunk" : 2,
              "emb" : [
                4,
                4
              ]
            }
          ]
        }
      }
    ]
  }
}</pre>
</li></ul>
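<p>To make the effect of <strong>score_mode</strong> = <strong>max</strong> concrete, the following Python sketch recomputes the ranking of the sample data offline. It rests on assumptions: the similarity formula 1 / (1 + squared Euclidean distance) is not stated in this topic and is only inferred from the sample scores (1.0, 0.33333334, 0.11111111), and the sketch scans every vector exhaustively, whereas the cluster uses a graph index. The function names are hypothetical.</p>

```python
def similarity(query, vec):
    """ASSUMPTION: score = 1 / (1 + squared Euclidean distance), inferred from the samples."""
    squared_distance = sum((q - v) ** 2 for q, v in zip(query, vec))
    return 1.0 / (1.0 + squared_distance)


def nested_max_search(docs, query, topk, allowed_ids=None):
    """Score each document by its best-matching nested vector (score_mode = max)."""
    hits = []
    for doc in docs:
        # Optional pre-filter on the keyword id field, compared as strings.
        if allowed_ids is not None and str(doc["id"]) not in allowed_ids:
            continue
        best = max(similarity(query, item["emb"]) for item in doc["embedding"])
        hits.append((doc["id"], best))
    # Highest score first, then keep at most topk documents.
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:topk]


docs = [
    {"id": 1, "embedding": [{"chunk": 1, "emb": [1, 1]}, {"chunk": 2, "emb": [2, 2]}]},
    {"id": 2, "embedding": [{"chunk": 1, "emb": [2, 2]}, {"chunk": 2, "emb": [3, 3]}]},
    {"id": 3, "embedding": [{"chunk": 1, "emb": [3, 3]}, {"chunk": 2, "emb": [4, 4]}]},
]

standard = nested_max_search(docs, [1, 1], topk=10)
filtered = nested_max_search(docs, [1, 1], topk=10, allowed_ids={"2", "3"})
print(standard)
print(filtered)
```

<p>Under these assumptions, document 1 ranks first because its closest vector [1, 1] equals the query vector, and each document contributes only the score of its best-matching vector.</p>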
</div>
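<p>The standard and pre-filtering queries in this topic differ only in the optional <strong>filter</strong> clause and the <strong>_source</strong> exclusion, so both bodies can be produced by one helper. The following Python sketch assumes the index and field names used in this topic (<strong>embedding</strong>, <strong>embedding.emb</strong>, <strong>id</strong>). The function name <strong>build_nested_vector_query</strong> and its parameters are hypothetical.</p>

```python
import json


def build_nested_vector_query(vector, topk, id_filter=None, exclude_embedding=True):
    """Build a GET my_index/_search body for the nested vector field."""
    vector_clause = {"vector": list(vector), "topk": topk}
    if id_filter:
        # The id field is a keyword, so filter values are sent as strings.
        vector_clause["filter"] = {"terms": {"id": [str(i) for i in id_filter]}}
    body = {
        "query": {
            "nested": {
                "path": "embedding",
                "score_mode": "max",
                "query": {"vector": {"embedding.emb": vector_clause}},
            }
        }
    }
    if exclude_embedding:
        # Leave the vectors out of the response to keep it small.
        body["_source"] = {"excludes": ["embedding"]}
    return body


standard_body = build_nested_vector_query([1, 1], topk=10)
prefilter_body = build_nested_vector_query(
    [1, 1], topk=10, id_filter=[2, 3], exclude_embedding=False
)
print(json.dumps(prefilter_body, indent=2))
```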
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="css_01_0101.html">Configuring Vector Search for OpenSearch Clusters</a></div>
</div>
</div>