<a name="EN-US_TOPIC_0000002152432486"></a><a name="EN-US_TOPIC_0000002152432486"></a>
<h1 class="topictitle1">Using Nested Fields for Vector Search</h1>
<div id="body0000002152432486"><p id="EN-US_TOPIC_0000002152432486__p8060118">Nested fields allow multiple vector records to be stored in a single document. For example, in a retrieval-augmented generation (RAG) scenario, a document is usually split into chunks, either by paragraph or by a fixed length, and each chunk is then converted into a semantic vector. A nested field lets all of these vectors be written into the same Elasticsearch document. When a document contains multiple vector records, it is returned if the query vector matches any one of them.</p>
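<p>The chunk-then-embed flow described above can be sketched in Python. This is a minimal illustration, not part of CSS: <strong>split_fixed_length</strong>, <strong>fake_embed</strong>, and <strong>build_document</strong> are hypothetical helpers, and <strong>fake_embed</strong> stands in for a real embedding model. It returns 2-dimensional vectors only so that the output matches the example index in this topic. The result has the same shape as the documents imported later: one <strong>id</strong> plus an <strong>embedding</strong> array with one record per chunk.</p>

```python
import json


def split_fixed_length(text, size):
    """Split text into consecutive chunks of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def fake_embed(chunk):
    """Hypothetical placeholder for an embedding model.

    Returns a 2-dimensional vector to match the example mapping (dimension 2);
    the values carry no semantic meaning.
    """
    return [float(len(chunk)), float(sum(ord(c) for c in chunk) % 97)]


def build_document(doc_id, text, size):
    """Build one document whose nested embedding field holds one record per chunk."""
    chunks = split_fixed_length(text, size)
    return {
        "id": doc_id,
        "embedding": [
            {"chunk": n, "emb": fake_embed(chunk)}
            for n, chunk in enumerate(chunks, start=1)
        ],
    }


doc = build_document(1, "Nested fields store many vectors in one document.", 20)
print(json.dumps(doc))
```

<p>In practice, <strong>fake_embed</strong> would be replaced by a call to whichever embedding model you use, and the vector length must then match the <strong>dimension</strong> configured in the index mapping.</p>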
<div class="section" id="EN-US_TOPIC_0000002152432486__section7402052171913"><h4 class="sectiontitle">Constraints</h4><p id="EN-US_TOPIC_0000002152432486__p242820500615">Only Elasticsearch 7.10.2 clusters support this feature.</p>
</div>
<div class="section" id="EN-US_TOPIC_0000002152432486__section920834941814"><h4 class="sectiontitle">Creating a Vector Index</h4><p id="EN-US_TOPIC_0000002152432486__p1814521513234">Create a vector index that contains a nested field. In the following example, the index contains an <strong id="EN-US_TOPIC_0000002152432486__b1147823312334">id</strong> field of the <strong id="EN-US_TOPIC_0000002152432486__b11727144215333">keyword</strong> type and an <strong id="EN-US_TOPIC_0000002152432486__b10774185012335">embedding</strong> field of the <strong id="EN-US_TOPIC_0000002152432486__b386116013417">nested</strong> type. The embedding field contains two subfields: <strong id="EN-US_TOPIC_0000002152432486__b922417201355">chunk</strong> and <strong id="EN-US_TOPIC_0000002152432486__b14480924123517">emb</strong>. The <strong id="EN-US_TOPIC_0000002152432486__b4416182815350">chunk</strong> subfield is of the <strong id="EN-US_TOPIC_0000002152432486__b038012119351">keyword</strong> type, and the <strong id="EN-US_TOPIC_0000002152432486__b5241534193514">emb</strong> subfield is of the <strong id="EN-US_TOPIC_0000002152432486__b1496212398351">vector</strong> type; here it stores 2-dimensional vectors that are indexed with the GRAPH algorithm and compared using the Euclidean metric. In the index settings, <strong>index.vector</strong> is set to <strong>true</strong>.</p>
<pre class="screen" id="EN-US_TOPIC_0000002152432486__screen15862126144714">PUT my_index
{
  "settings": {
    "index.vector": true
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "embedding": {
        "type": "nested",
        "properties": {
          "chunk": {
            "type": "keyword"
          },
          "emb": {
            "type": "vector",
            "dimension": 2,
            "indexing": true,
            "algorithm": "GRAPH",
            "metric": "euclidean"
          }
        }
      }
    }
  }
}</pre>
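<p>Because <strong>dimension</strong> is set to 2 in this mapping, every <strong>emb</strong> value written under <strong>embedding</strong> is expected to contain exactly two numbers. The following Python sketch is a hypothetical client-side check (<strong>check_nested_vectors</strong> is not a CSS or Elasticsearch API) that can catch malformed records before they are imported. It assumes the nested field is always written as an array of objects, as in the examples in this topic.</p>

```python
def check_nested_vectors(doc, dimension, path="embedding", field="emb"):
    """Return a list of problems found in the nested vector records of one document.

    Hypothetical helper: it only checks that each record holds a numeric array
    of the mapped dimension; it does not contact the cluster.
    """
    problems = []
    for i, record in enumerate(doc.get(path, [])):
        vec = record.get(field)
        if not isinstance(vec, list):
            problems.append(f"{path}[{i}].{field} is missing or is not an array")
        elif len(vec) != dimension:
            problems.append(f"{path}[{i}].{field} has {len(vec)} values, expected {dimension}")
        elif not all(isinstance(x, (int, float)) for x in vec):
            problems.append(f"{path}[{i}].{field} contains non-numeric values")
    return problems


good = {"id": 1, "embedding": [{"chunk": 1, "emb": [1, 1]}, {"chunk": 2, "emb": [2, 2]}]}
bad = {"id": 4, "embedding": [{"chunk": 1, "emb": [1, 1, 1]}, {"chunk": 2}]}
print(check_nested_vectors(good, 2))
print(check_nested_vectors(bad, 2))
```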
</div>
<div class="section" id="EN-US_TOPIC_0000002152432486__section1069103718276"><h4 class="sectiontitle">Importing Vector Data</h4><p id="EN-US_TOPIC_0000002152432486__p107491342132810">Use the _bulk API to write the data. The nested <strong>embedding</strong> field is written as an array of objects, and each document in this example contains two vector records.</p>
<pre class="screen" id="EN-US_TOPIC_0000002152432486__screen20150127125014">POST my_index/_bulk
{"index":{}}
{"id": 1, "embedding": [{"chunk":1,"emb": [1, 1]}, {"chunk":2,"emb": [2, 2]}]}
{"index":{}}
{"id": 2, "embedding": [{"chunk":1,"emb": [2, 2]}, {"chunk":2,"emb": [3, 3]}]}
{"index":{}}
{"id": 3, "embedding": [{"chunk":1,"emb": [3, 3]}, {"chunk":2,"emb": [4, 4]}]}</pre>
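<p>When there are many documents, the bulk body is usually generated rather than typed by hand. The following Python sketch (the helper name <strong>to_bulk_ndjson</strong> is hypothetical) serializes documents into the same newline-delimited format as the request above: an {"index": {}} action line before each document, which lets Elasticsearch generate the _id, and a trailing newline, which the _bulk API requires. The resulting body can be sent to POST my_index/_bulk by any HTTP client with the Content-Type: application/x-ndjson header.</p>

```python
import json


def to_bulk_ndjson(docs):
    """Serialize documents into a newline-delimited _bulk request body.

    Each document is preceded by an {"index": {}} action line, so
    Elasticsearch generates the _id. The body ends with a newline,
    which the _bulk API requires.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))
        lines.append(json.dumps(doc))  # one document per line, no pretty-printing
    return "\n".join(lines) + "\n"


docs = [
    {"id": 1, "embedding": [{"chunk": 1, "emb": [1, 1]}, {"chunk": 2, "emb": [2, 2]}]},
    {"id": 2, "embedding": [{"chunk": 1, "emb": [2, 2]}, {"chunk": 2, "emb": [3, 3]}]},
    {"id": 3, "embedding": [{"chunk": 1, "emb": [3, 3]}, {"chunk": 2, "emb": [4, 4]}]},
]
body = to_bulk_ndjson(docs)
print(body, end="")
```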
</div>
<div class="section" id="EN-US_TOPIC_0000002152432486__section20812133113112"><h4 class="sectiontitle">Vector Search</h4><p id="EN-US_TOPIC_0000002152432486__p124414255492">Nested fields must be queried with a nested query. Set <strong>path</strong> to the nested field (<strong>embedding</strong> in this example), and set <strong id="EN-US_TOPIC_0000002152432486__b101091310103910">score_mode</strong> to <strong id="EN-US_TOPIC_0000002152432486__b14692812103918">max</strong> so that the score of each document is the highest similarity between the query vector and any of the vectors stored in that document.</p>
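<p>Both queries in the examples below share one structure: a nested query on the path <strong>embedding</strong> whose inner vector query targets the full field name <strong>embedding.emb</strong>, optionally with a filter. The following Python sketch builds that body. <strong>nested_vector_query</strong> is a hypothetical helper that only assembles JSON from the parameters used in this topic (vector, topk, and filter); it does not add any other query options.</p>

```python
import json


def nested_vector_query(vector, topk, path="embedding", field="emb", filter_clause=None):
    """Build a nested vector query body with the structure used in this topic.

    score_mode "max" makes each document's score the best score among its
    nested vector records.
    """
    vector_params = {"vector": vector, "topk": topk}
    if filter_clause is not None:
        vector_params["filter"] = filter_clause  # pre-filter, placed next to vector/topk
    return {
        "query": {
            "nested": {
                "path": path,
                "score_mode": "max",
                "query": {"vector": {f"{path}.{field}": vector_params}},
            }
        }
    }


print(json.dumps(nested_vector_query([1, 1], 10), indent=2))
print(json.dumps(nested_vector_query([1, 1], 10, filter_clause={"terms": {"id": ["2", "3"]}}), indent=2))
```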
<ul id="EN-US_TOPIC_0000002152432486__ul4274936174213"><li id="EN-US_TOPIC_0000002152432486__li62744361425">Standard query<p id="EN-US_TOPIC_0000002152432486__p1690601511614"><a name="EN-US_TOPIC_0000002152432486__li62744361425"></a><a name="li62744361425"></a>Query the top 10 documents that are most similar to vector [1, 1].</p>
<pre class="screen" id="EN-US_TOPIC_0000002152432486__screen1279552916507">GET my_index/_search
{
  "_source": {"excludes": ["embedding"]},
  "query": {
    "nested": {
      "path": "embedding",
      "score_mode": "max",
      "query": {
        "vector": {
          "embedding.emb": {
            "vector": [1, 1],
            "topk": 10
          }
        }
      }
    }
  }
}</pre>
<p id="EN-US_TOPIC_0000002152432486__p19738125816532">An example of the query result:</p>
<pre class="screen" id="EN-US_TOPIC_0000002152432486__screen103475417404">{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "Hc4Vc5QBSxCnghau22AE",
        "_score" : 1.0,
        "_source" : {
          "id" : 1
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "Hs4Vc5QBSxCnghau22AE",
        "_score" : 0.33333334,
        "_source" : {
          "id" : 2
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "H84Vc5QBSxCnghau22AE",
        "_score" : 0.11111111,
        "_source" : {
          "id" : 3
        }
      }
    ]
  }
}</pre>
</li><li id="EN-US_TOPIC_0000002152432486__li081194014213">Pre-filtering query<p id="EN-US_TOPIC_0000002152432486__p6281329155415"><a name="EN-US_TOPIC_0000002152432486__li081194014213"></a><a name="li081194014213"></a>First filter the documents to those whose <strong>id</strong> is 2 or 3, and then return the top 10 of them that are most similar to the query vector [1, 1].</p>
<pre class="screen" id="EN-US_TOPIC_0000002152432486__screen1560120439117">GET my_index/_search
|
|
{
|
|
"query": {
|
|
"nested": {
|
|
"path": "embedding",
|
|
"score_mode": "max",
|
|
"query": {
|
|
"vector": {
|
|
"embedding.emb": {
|
|
"vector": [1, 1],
|
|
"topk": 10,
|
|
"filter": {
|
|
"terms": {"id": ["2", "3"]}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}</pre>
<p id="EN-US_TOPIC_0000002152432486__p188900299012">An example of the query result:</p>
<pre class="screen" id="EN-US_TOPIC_0000002152432486__screen1352602315214">{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.33333334,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "3t0ZypcB-Tff59gMTZO2",
        "_score" : 0.33333334,
        "_source" : {
          "id" : 2,
          "embedding" : [
            {
              "chunk" : 1,
              "emb" : [
                2,
                2
              ]
            },
            {
              "chunk" : 2,
              "emb" : [
                3,
                3
              ]
            }
          ]
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "390ZypcB-Tff59gMTZO2",
        "_score" : 0.11111111,
        "_source" : {
          "id" : 3,
          "embedding" : [
            {
              "chunk" : 1,
              "emb" : [
                3,
                3
              ]
            },
            {
              "chunk" : 2,
              "emb" : [
                4,
                4
              ]
            }
          ]
        }
      }
    ]
  }
}</pre>
</li></ul>
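<p>The scores in the two example results above are consistent with a per-vector score of 1 / (1 + d<sup>2</sup>), where d is the Euclidean distance between a stored vector and the query vector, combined with score_mode max. This formula is inferred from the example output, not taken from the product specification. The following brute-force Python sketch (hypothetical, for understanding only) applies that assumption to the three imported documents. It reproduces the scores 1.0, 1/3, and 1/9 for documents 1, 2, and 3 (the cluster reports them as 32-bit floats such as 0.33333334), and shows that pre-filtering on <strong>id</strong> restricts ranking to documents 2 and 3.</p>

```python
def similarity(stored, query):
    """Per-vector score, ASSUMED to be 1 / (1 + squared Euclidean distance).

    Inferred from the example results in this topic, not from the product docs.
    """
    squared_distance = sum((s - q) ** 2 for s, q in zip(stored, query))
    return 1.0 / (1.0 + squared_distance)


def nested_max_search(docs, query, topk, allowed_ids=None):
    """Brute-force model of a nested vector query with score_mode "max".

    If allowed_ids is given, documents are filtered first (pre-filtering) and
    only the remaining documents are ranked. Because id is a keyword field,
    ids are compared as strings, like the terms filter ["2", "3"].
    """
    scored = []
    for doc in docs:
        if allowed_ids is not None and str(doc["id"]) not in allowed_ids:
            continue
        # score_mode "max": keep the best score among the document's vectors
        best = max(similarity(record["emb"], query) for record in doc["embedding"])
        scored.append((doc["id"], best))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topk]


docs = [
    {"id": 1, "embedding": [{"chunk": 1, "emb": [1, 1]}, {"chunk": 2, "emb": [2, 2]}]},
    {"id": 2, "embedding": [{"chunk": 1, "emb": [2, 2]}, {"chunk": 2, "emb": [3, 3]}]},
    {"id": 3, "embedding": [{"chunk": 1, "emb": [3, 3]}, {"chunk": 2, "emb": [4, 4]}]},
]

print(nested_max_search(docs, [1, 1], 10))
print(nested_max_search(docs, [1, 1], 10, allowed_ids={"2", "3"}))
```

<p>This model scans every document, so it only illustrates how nested scores are combined; it does not describe how the cluster searches the vector index.</p>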
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="css_01_0117.html">Configuring Vector Search for Elasticsearch Clusters</a></div>
</div>
</div>