Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com> Co-authored-by: zhengxiu <zhengxiu@huawei.com> Co-committed-by: zhengxiu <zhengxiu@huawei.com>
<a name="EN-US_TOPIC_0000002426355865"></a><a name="EN-US_TOPIC_0000002426355865"></a>
<h1 class="topictitle1">Using Nested Fields for Vector Search</h1>
<div id="body0000002426355865"><p id="EN-US_TOPIC_0000002426355865__p8060118">Nested fields allow multiple vectorized records to be stored in a single document. For example, in a RAG scenario, a document is usually split by paragraph or into fixed-length segments, and each segment is vectorized into its own semantic vector. With nested fields, these vectors can be written to the same OpenSearch document. A document that contains multiple vector records is returned if the query vector matches any of them.</p>
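<p>The segmentation step described above can be sketched in Python as follows. This is a minimal illustration, not a prescribed implementation: <code>split_fixed</code>, <code>toy_embed</code>, and <code>build_nested_document</code> are hypothetical helper names, and <code>toy_embed</code> is only a stand-in for a real embedding model. The field names <code>id</code>, <code>embedding</code>, <code>chunk</code>, and <code>emb</code> follow the examples in this topic.</p>

```python
import json


def split_fixed(text, size):
    """Split text into consecutive segments of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def toy_embed(segment):
    """Hypothetical stand-in for an embedding model: returns a 2-dimensional vector."""
    return [float(len(segment)), float(sum(map(ord, segment)) % 10)]


def build_nested_document(doc_id, text, size):
    """Build one document whose nested `embedding` array holds one entry per segment."""
    segments = split_fixed(text, size)
    return {
        "id": doc_id,
        "embedding": [
            {"chunk": n, "emb": toy_embed(seg)}
            for n, seg in enumerate(segments, start=1)
        ],
    }


# One source text becomes one document that carries several vector records.
doc = build_nested_document(1, "nested fields keep chunks together", 12)
print(json.dumps(doc, indent=2))
```

<p>Because every segment is stored in the same document, a match on any one segment vector is enough to return the whole document.</p>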
<div class="section" id="EN-US_TOPIC_0000002426355865__section7402052171913"><h4 class="sectiontitle">Constraints</h4><p id="EN-US_TOPIC_0000002426355865__p242820500615">Only OpenSearch 2.19.0 clusters support this feature.</p>
</div>
<div class="section" id="EN-US_TOPIC_0000002426355865__section920834941814"><h4 class="sectiontitle">Creating a Vector Index</h4><p id="EN-US_TOPIC_0000002426355865__p1814521513234">Create a vector index with nested fields. The index contains an <strong id="EN-US_TOPIC_0000002426355865__b115769801511331">id</strong> field whose type is <strong id="EN-US_TOPIC_0000002426355865__b93818467211331">keyword</strong>, and an <strong id="EN-US_TOPIC_0000002426355865__b101765592911331">embedding</strong> field whose type is <strong id="EN-US_TOPIC_0000002426355865__b139046583211331">nested</strong>. The embedding field contains two subfields: <strong id="EN-US_TOPIC_0000002426355865__b151113421711331">chunk</strong> and <strong id="EN-US_TOPIC_0000002426355865__b183726057411331">emb</strong>. The <strong id="EN-US_TOPIC_0000002426355865__b195353201111331">chunk</strong> subfield is of the <strong id="EN-US_TOPIC_0000002426355865__b100986176111331">keyword</strong> type, and the <strong id="EN-US_TOPIC_0000002426355865__b198454408211331">emb</strong> subfield is of the <strong id="EN-US_TOPIC_0000002426355865__b15600183411331">vector</strong> type.</p>
<pre class="screen" id="EN-US_TOPIC_0000002426355865__screen15862126144714">PUT my_index
{
  "settings": {
    "index.vector": true
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "embedding": {
        "type": "nested",
        "properties": {
          "chunk": {
            "type": "keyword"
          },
          "emb": {
            "type": "vector",
            "dimension": 2,
            "indexing": true,
            "algorithm": "GRAPH",
            "metric": "euclidean"
          }
        }
      }
    }
  }
}</pre>
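<p>In practice the vector dimension depends on the embedding model, so it can help to generate the request body rather than write it by hand. The following Python sketch builds the same body as the request above. The function name <code>nested_vector_index_body</code> is hypothetical, and the only setting and parameter values used are the ones shown in this topic.</p>

```python
import json


def nested_vector_index_body(dimension, algorithm="GRAPH", metric="euclidean"):
    """Return an index-creation body with a nested vector field.

    Hypothetical helper: the default values are copied from the example
    request in this topic, and only `dimension` is expected to vary.
    """
    return {
        "settings": {"index.vector": True},
        "mappings": {
            "properties": {
                "id": {"type": "keyword"},
                "embedding": {
                    "type": "nested",
                    "properties": {
                        "chunk": {"type": "keyword"},
                        "emb": {
                            "type": "vector",
                            "dimension": dimension,
                            "indexing": True,
                            "algorithm": algorithm,
                            "metric": metric,
                        },
                    },
                },
            }
        },
    }


# Body for the 2-dimensional example used in this topic.
body = nested_vector_index_body(2)
print(json.dumps(body, indent=2))
```

<p>The printed JSON can be sent as the body of the <code>PUT my_index</code> request.</p>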
</div>
<div class="section" id="EN-US_TOPIC_0000002426355865__section1069103718276"><h4 class="sectiontitle">Importing Vector Data</h4><p id="EN-US_TOPIC_0000002426355865__p107491342132810">Use the bulk API to write data, passing the nested field as an array of objects. Each document in this example contains two vector records.</p>
<pre class="screen" id="EN-US_TOPIC_0000002426355865__screen20150127125014">POST my_index/_bulk
{"index":{}}
{"id": 1, "embedding": [{"chunk":1,"emb": [1, 1]}, {"chunk":2,"emb": [2, 2]}]}
{"index":{}}
{"id": 2, "embedding": [{"chunk":1,"emb": [2, 2]}, {"chunk":2,"emb": [3, 3]}]}
{"index":{}}
{"id": 3, "embedding": [{"chunk":1,"emb": [3, 3]}, {"chunk":2,"emb": [4, 4]}]}</pre>
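<p>When the documents come from a program rather than a console, the bulk body has to be assembled as newline-delimited JSON: one action line followed by one document line, with a trailing newline after the last line. A minimal Python sketch, assuming a hypothetical helper named <code>bulk_payload</code> and the three sample documents above:</p>

```python
import json


def bulk_payload(documents):
    """Serialize documents as a bulk body: an action line, then a source line."""
    lines = []
    for doc in documents:
        lines.append(json.dumps({"index": {}}))  # action line, ID is auto-generated
        lines.append(json.dumps(doc))            # document line, must stay on one line
    # The bulk body must end with a newline character.
    return "\n".join(lines) + "\n"


documents = [
    {"id": 1, "embedding": [{"chunk": 1, "emb": [1, 1]}, {"chunk": 2, "emb": [2, 2]}]},
    {"id": 2, "embedding": [{"chunk": 1, "emb": [2, 2]}, {"chunk": 2, "emb": [3, 3]}]},
    {"id": 3, "embedding": [{"chunk": 1, "emb": [3, 3]}, {"chunk": 2, "emb": [4, 4]}]},
]

payload = bulk_payload(documents)
print(payload, end="")
```

<p>The resulting string can be sent as the body of <code>POST my_index/_bulk</code>. How the request is sent (HTTP client, authentication) is outside the scope of this sketch.</p>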
</div>
<div class="section" id="EN-US_TOPIC_0000002426355865__section20812133113112"><h4 class="sectiontitle">Vector Search</h4><p id="EN-US_TOPIC_0000002426355865__p124414255492">Nested fields must be searched with a nested query. Set the path parameter to the nested field path, and set <strong id="EN-US_TOPIC_0000002426355865__b133142972611331">score_mode</strong> to <strong id="EN-US_TOPIC_0000002426355865__b79558898111331">max</strong> so that each document is scored by the highest similarity between the query vector and any of the vectors in the document.</p>
<ul id="EN-US_TOPIC_0000002426355865__ul4274936174213"><li id="EN-US_TOPIC_0000002426355865__li62744361425">Standard query<p id="EN-US_TOPIC_0000002426355865__p1690601511614"><a name="EN-US_TOPIC_0000002426355865__li62744361425"></a><a name="li62744361425"></a>Query the top 10 documents that are most similar to vector [1, 1].</p>
<pre class="screen" id="EN-US_TOPIC_0000002426355865__screen1279552916507">GET my_index/_search
{
  "_source": {"excludes": ["embedding"]},
  "query": {
    "nested": {
      "path": "embedding",
      "score_mode": "max",
      "query": {
        "vector": {
          "embedding.emb": {
            "vector": [1, 1],
            "topk": 10
          }
        }
      }
    }
  }
}</pre>
<p id="EN-US_TOPIC_0000002426355865__p19738125816532">An example of the query result:</p>
<pre class="screen" id="EN-US_TOPIC_0000002426355865__screen103475417404">{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "Hc4Vc5QBSxCnghau22AE",
        "_score" : 1.0,
        "_source" : {
          "id" : 1
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "Hs4Vc5QBSxCnghau22AE",
        "_score" : 0.33333334,
        "_source" : {
          "id" : 2
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "H84Vc5QBSxCnghau22AE",
        "_score" : 0.11111111,
        "_source" : {
          "id" : 3
        }
      }
    ]
  }
}</pre>
</li><li id="EN-US_TOPIC_0000002426355865__li081194014213">Pre-filtering query<p id="EN-US_TOPIC_0000002426355865__p6281329155415"><a name="EN-US_TOPIC_0000002426355865__li081194014213"></a><a name="li081194014213"></a>First restrict the search to documents whose id is "2" or "3", and then return the top 10 of those documents that are most similar to the query vector [1, 1].</p>
<pre class="screen" id="EN-US_TOPIC_0000002426355865__screen1560120439117">GET my_index/_search
{
  "query": {
    "nested": {
      "path": "embedding",
      "score_mode": "max",
      "query": {
        "vector": {
          "embedding.emb": {
            "vector": [1, 1],
            "topk": 10,
            "filter": {
              "terms": {"id": ["2", "3"]}
            }
          }
        }
      }
    }
  }
}</pre>
<p id="EN-US_TOPIC_0000002426355865__p188900299012">An example of the query result:</p>
<pre class="screen" id="EN-US_TOPIC_0000002426355865__screen1352602315214">{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.33333334,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "3t0ZypcB-Tff59gMTZO2",
        "_score" : 0.33333334,
        "_source" : {
          "id" : 2,
          "embedding" : [
            {
              "chunk" : 1,
              "emb" : [
                2,
                2
              ]
            },
            {
              "chunk" : 2,
              "emb" : [
                3,
                3
              ]
            }
          ]
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "390ZypcB-Tff59gMTZO2",
        "_score" : 0.11111111,
        "_source" : {
          "id" : 3,
          "embedding" : [
            {
              "chunk" : 1,
              "emb" : [
                3,
                3
              ]
            },
            {
              "chunk" : 2,
              "emb" : [
                4,
                4
              ]
            }
          ]
        }
      }
    ]
  }
}</pre>
</li></ul>
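<p>The effect of <strong>score_mode</strong> set to <strong>max</strong> and of the pre-filter can be reproduced with a brute-force Python sketch over the three sample documents. Two points are assumptions rather than documented behavior: the scoring formula 1 / (1 + squared Euclidean distance) is inferred from the sample scores above (1.0, 0.33333334, 0.11111111), and the exhaustive loop only stands in for the GRAPH index, which is not modeled here. The function names are hypothetical.</p>

```python
def l2_squared(a, b):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def similarity(a, b):
    # Assumption inferred from the sample output: score = 1 / (1 + squared distance).
    return 1.0 / (1.0 + l2_squared(a, b))


def nested_max_search(docs, query, topk, allowed_ids=None):
    """Score each document by its best-matching nested vector (score_mode max)."""
    hits = []
    for doc in docs:
        # Pre-filter: skip documents whose id is not in the allowed set.
        if allowed_ids is not None and str(doc["id"]) not in allowed_ids:
            continue
        best = max(similarity(entry["emb"], query) for entry in doc["embedding"])
        hits.append((doc["id"], best))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:topk]


docs = [
    {"id": 1, "embedding": [{"chunk": 1, "emb": [1, 1]}, {"chunk": 2, "emb": [2, 2]}]},
    {"id": 2, "embedding": [{"chunk": 1, "emb": [2, 2]}, {"chunk": 2, "emb": [3, 3]}]},
    {"id": 3, "embedding": [{"chunk": 1, "emb": [3, 3]}, {"chunk": 2, "emb": [4, 4]}]},
]

standard = nested_max_search(docs, [1, 1], 10)
filtered = nested_max_search(docs, [1, 1], 10, allowed_ids={"2", "3"})

for doc_id, score in standard:
    print("standard", doc_id, round(score, 4))
for doc_id, score in filtered:
    print("filtered", doc_id, round(score, 4))
```

<p>For document 2, the nearest vector to [1, 1] is [2, 2], at a squared distance of 2, which gives 1 / 3. For document 3, the nearest vector is [3, 3], at a squared distance of 8, which gives 1 / 9. Both values agree with the sample results, and the filtered run returns only documents 2 and 3.</p>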
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="css_01_0101.html">Configuring Vector Search for OpenSearch Clusters</a></div>
</div>
</div>