PV_GRAPH deeply optimizes the HNSW algorithm and supports joint vector and scalar filtering. Compared with post-filtering and Boolean queries, joint filtering can greatly improve both the result fill rate and query performance.
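The fill-rate difference can be illustrated with a toy brute-force search. The sketch below is not how PV_GRAPH works internally. It uses made-up data and hypothetical helper names, and only contrasts filtering after the top-k cut with filtering as part of the search.

```python
import math

# Toy corpus of (vector, country) pairs. Purely illustrative data.
DOCS = [
    ([1.0, 1.0], "eu"),
    ([1.0, 1.1], "eu"),
    ([1.2, 1.0], "cn"),
    ([3.0, 3.0], "cn"),
    ([4.0, 4.0], "cn"),
]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def post_filter_search(query, topk, country):
    """Take the topk nearest documents first, then drop those that fail the filter."""
    nearest = sorted(DOCS, key=lambda d: euclidean(query, d[0]))[:topk]
    return [d for d in nearest if d[1] == country]

def joint_filter_search(query, topk, country):
    """Apply the scalar condition as part of the search, so only matches compete for topk."""
    matching = [d for d in DOCS if d[1] == country]
    return sorted(matching, key=lambda d: euclidean(query, d[0]))[:topk]

post = post_filter_search([1.0, 1.0], topk=3, country="cn")
joint = joint_filter_search([1.0, 1.0], topk=3, country="cn")
print(len(post), len(joint))  # 1 3
```

With post-filtering, two of the three nearest neighbors are discarded and only one result is returned for topk=3, while the joint search fills all three slots.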
An Elasticsearch cluster of version 7.10.2 has been created by referring to Cluster Planning for Vector Retrieval.
Create an index named my_index that contains a vector field my_vector and two sub-fields country and category.
PUT my_index
{
  "settings": {
    "index": {
      "vector": true
    }
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "vector",
        "dimension": 2,
        "indexing": true,
        "algorithm": "PV_GRAPH",
        "metric": "euclidean",
        "sub_fields": ["country", "category"]
      }
    }
  }
}
For details about the parameters for creating an index, see Table 1.
The metric parameter of the PV_GRAPH index algorithm can only be set to euclidean or inner_product.
When algorithm is set to PV_GRAPH and sub_fields is specified, the following write syntax is supported. The sub_fields parameter supports only the keyword type. Multiple sub-fields can be listed, and, as the category values below show, a sub-field can carry more than one value.
# Write a single data record.
POST my_index/_doc
{
  "my_vector": {
    "data": [1.0, 1.0],
    "country": "cn",
    "category": ["1", "2"]
  }
}

# Write multiple data records in batches.
POST my_index/_bulk
{"index": {}}
{"my_vector": {"data": [1.0, 2.0], "country": "cn", "category": "1"}}
{"index": {}}
{"my_vector": {"data": [2.0, 2.0], "country": "cn", "category": ["1", "2"]}}
{"index": {}}
{"my_vector": {"data": [2.0, 3.0], "country": "eu", "category": "2"}}
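When documents are written from application code, the batch body has to be assembled as newline-delimited JSON. The following is a minimal sketch of that assembly. The helper name build_bulk_body and the record layout are assumptions made for illustration, and sending the body over HTTP is left to whichever client is in use.

```python
import json

def build_bulk_body(records):
    """Build an NDJSON _bulk body from (data, sub_field_values) pairs.

    Hypothetical helper: the field name my_vector matches the example index.
    """
    lines = []
    for data, sub_values in records:
        # Action line: index the document and let Elasticsearch assign the ID.
        lines.append(json.dumps({"index": {}}))
        # Sub-field values sit next to "data" inside the vector field object.
        lines.append(json.dumps({"my_vector": {"data": data, **sub_values}}))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"

body = build_bulk_body([
    ([1.0, 2.0], {"country": "cn", "category": "1"}),
    ([2.0, 2.0], {"country": "cn", "category": ["1", "2"]}),
    ([2.0, 3.0], {"country": "eu", "category": "2"}),
])
print(body, end="")
```

Each record produces exactly two lines, an action line followed by the document, which matches the layout of the batch request shown above.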
Based on the existing Elasticsearch APIs, a filter parameter is added to the vector query to support joint vector and scalar filtering. The values of sub_fields can be used for scalar filtering. The filter is currently written in JSON and supports the should, must, must_not, term, and terms queries, with the same syntax as Elasticsearch queries. The following restrictions apply:
Filters can currently be nested up to four layers deep.
The fields defined in sub_fields during index creation are the scalar fields available for joint filtering, and they take effect only when algorithm is set to PV_GRAPH. If a filter references a field that does not exist, the filter is invalid and the query is processed without any filtering conditions.
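Because a filter on an unknown field is dropped rather than rejected, a client-side check before sending the query can catch misspelled field names. The sketch below is one possible check, not part of the service. The helper names are hypothetical, and the way layers are counted (each must, should, or must_not clause adds one layer; term and terms add none) is an assumption, since the restriction above does not define it.

```python
BOOL_KEYS = {"must", "should", "must_not"}
LEAF_KEYS = {"term", "terms"}
MAX_LAYERS = 4  # limit from the restrictions; the counting rule below is assumed

def inspect_filter(node, depth=0):
    """Return (fields, max_depth) for a filter object.

    Assumption: each must/should/must_not clause adds one layer,
    while term/terms leaves add none.
    """
    fields, max_depth = set(), depth
    for key, value in node.items():
        if key in LEAF_KEYS:
            fields.update(value.keys())
        elif key in BOOL_KEYS:
            children = value if isinstance(value, list) else [value]
            max_depth = max(max_depth, depth + 1)
            for child in children:
                child_fields, child_depth = inspect_filter(child, depth + 1)
                fields |= child_fields
                max_depth = max(max_depth, child_depth)
        else:
            raise ValueError(f"unsupported filter clause: {key}")
    return fields, max_depth

def check_filter(filter_obj, sub_fields):
    """Raise ValueError if the filter uses unknown fields or nests too deeply."""
    fields, depth = inspect_filter(filter_obj)
    unknown = fields - set(sub_fields)
    if unknown:
        raise ValueError(f"fields not in sub_fields: {sorted(unknown)}")
    if depth > MAX_LAYERS:
        raise ValueError(f"filter nests {depth} layers, limit is {MAX_LAYERS}")
    return fields, depth

example = {"must": [{"term": {"country": "cn"}}, {"terms": {"category": ["1", "2"]}}]}
print(check_filter(example, ["country", "category"]))
```

Calling check_filter with a filter on a field such as region, which is not listed in sub_fields, raises a ValueError instead of letting the query run unfiltered.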
# Example of single-label and single-value matching query
GET my_index/_search
{
  "query": {
    "vector": {
      "my_vector": {
        "vector": [1.0, 1.0],
        "topk": 10,
        "filter": {
          "term": { "country": "cn" }
        }
      }
    }
  }
}

# Example of single-label and multi-value matching query
GET my_index/_search
{
  "query": {
    "vector": {
      "my_vector": {
        "vector": [1.0, 1.0],
        "topk": 10,
        "filter": {
          "terms": { "country": ["cn", "eu"] }
        }
      }
    }
  }
}

# Example of multi-label matching query
GET my_index/_search
{
  "query": {
    "vector": {
      "my_vector": {
        "vector": [1.0, 1.0],
        "topk": 10,
        "filter": {
          "must": [
            { "term": { "country": "cn" } },
            { "terms": { "category": ["1", "2"] } }
          ]
        }
      }
    }
  }
}

# Example of must_not matching query
GET my_index/_search
{
  "query": {
    "vector": {
      "my_vector": {
        "vector": [1.0, 1.0],
        "topk": 10,
        "filter": {
          "must_not": [
            { "term": { "country": "eu" } }
          ]
        }
      }
    }
  }
}
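The four requests above share one shape and differ only in the filter object. A small builder, sketched here under the assumption that the query is assembled in application code, makes that explicit. The function name build_vector_query is hypothetical, and the resulting dictionary would be sent as the body of GET my_index/_search by whichever HTTP client is in use.

```python
import json

def build_vector_query(field, vector, topk, filter_obj=None):
    """Return a _search body for a vector query, adding "filter" only when given."""
    params = {"vector": vector, "topk": topk}
    if filter_obj is not None:
        params["filter"] = filter_obj
    return {"query": {"vector": {field: params}}}

# Multi-label example: country must be "cn" and category one of "1" or "2".
body = build_vector_query(
    "my_vector",
    [1.0, 1.0],
    10,
    {"must": [{"term": {"country": "cn"}}, {"terms": {"category": ["1", "2"]}}]},
)
print(json.dumps(body, indent=2))
```

Omitting filter_obj yields a plain vector query with no scalar conditions.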
For details about vector query parameters, see Table 1.