Computes a similarity measure between the query and the document embeddings and uses this similarity to rank the documents.
Usage
ragnar_retrieve_vss(
  store,
  text,
  top_k = 3L,
  method = c("cosine_distance", "cosine_similarity", "euclidean_distance",
    "dot_product", "negative_dot_product")
)
Arguments
- store
A RagnarStore object or a dplyr::tbl() derived from it. When you pass a tbl, you may use the usual dplyr verbs (e.g. filter(), slice()) to restrict the rows examined before similarity scoring. Avoid dropping essential columns such as text, embedding, origin, and hash.
- text
A string to find the nearest match to.
- top_k
Integer, the maximum number of document chunks to retrieve.
- method
A string specifying the method used to compute the similarity between the query and the document chunk embeddings stored in the database.
Value
A data frame of retrieved chunks. Each row corresponds to an individual chunk in the store. It always contains a column named text that contains the chunks.
Details
The supported methods are:
cosine_distance: Measures the dissimilarity between two vectors based on the cosine of the angle between them. Defined as \(1 - \cos(\theta)\), where \(\cos(\theta)\) is the cosine similarity.
cosine_similarity: Measures the similarity between two vectors based on the cosine of the angle between them. Ranges from -1 (opposite) to 1 (identical), with 0 indicating orthogonality.
euclidean_distance: Computes the straight-line (L2) distance between two points in a multidimensional space. Defined as \(\sqrt{\sum(x_i - y_i)^2}\).
dot_product: Computes the sum of the element-wise products of two vectors.
negative_dot_product: The negation of the dot product.
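As a rough illustration of the definitions above, the five measures can be computed by hand in base R. This is only a sketch on two made-up vectors (x and y are hypothetical stand-ins for a query embedding and a chunk embedding); it is not how the package computes them against the database.

```r
# Hypothetical toy vectors standing in for a query and a chunk embedding.
x <- c(1, 2, 3)
y <- c(2, 4, 6)  # same direction as x, twice the length

# Sum of element-wise products, and its negation.
dot_product <- sum(x * y)
negative_dot_product <- -dot_product

# Cosine of the angle between x and y, and the matching distance.
cosine_similarity <- dot_product / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
cosine_distance <- 1 - cosine_similarity

# Straight-line (L2) distance between x and y.
euclidean_distance <- sqrt(sum((x - y)^2))

measures <- c(
  cosine_distance = cosine_distance,
  cosine_similarity = cosine_similarity,
  euclidean_distance = euclidean_distance,
  dot_product = dot_product,
  negative_dot_product = negative_dot_product
)
print(round(measures, 4))
```

Because y points in the same direction as x, the cosine-based measures treat the two vectors as a perfect match, while the Euclidean distance and the dot product still reflect the difference in length.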
Pre-filtering with dplyr
The store behaves like a lazy table backed by DuckDB, so row‑wise filtering is executed directly in the database. This lets you narrow the search space efficiently without pulling data into R.
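To see why pre-filtering changes the result, the logic can be mimicked in plain R on a tiny in-memory table. Everything in this sketch is hypothetical (the chunks data frame, the two-dimensional embeddings, the query vector, and the rank_chunks() helper); the real store runs the filter inside DuckDB rather than in R.

```r
# Hypothetical stand-in for a store: chunk text plus a metadata column.
chunks <- data.frame(
  text = c("ice cream", "cake", "steak", "potatoes"),
  category = c("dessert", "dessert", "meal", "meal"),
  stringsAsFactors = FALSE
)

# Made-up 2-d embeddings, one row per chunk, and a made-up query embedding.
embeddings <- rbind(
  c(0.9, 0.1),
  c(0.8, 0.3),
  c(0.1, 0.9),
  c(0.6, 0.6)
)
query <- c(1, 0)

cosine_distance <- function(a, b) {
  1 - sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Illustrative helper: score only the rows flagged in `keep`,
# then return the top_k rows with the smallest distance.
rank_chunks <- function(keep, top_k = 3L) {
  idx <- which(keep)
  d <- apply(embeddings[idx, , drop = FALSE], 1, cosine_distance, b = query)
  ord <- head(order(d), top_k)
  data.frame(text = chunks$text[idx][ord], cosine_distance = d[ord])
}

all_rows <- rank_chunks(rep(TRUE, nrow(chunks)), top_k = 2L)
meal_only <- rank_chunks(chunks$category == "meal", top_k = 2L)

print(all_rows)
print(meal_only)
```

Without the filter, the two closest chunks both come from the dessert category; restricting the candidates to meal rows first means only those rows are scored and ranked.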
See also
Other ragnar_retrieve: ragnar_retrieve(), ragnar_retrieve_bm25(), ragnar_retrieve_vss_and_bm25()
Examples
if (FALSE) { # (rlang::is_installed("dbplyr") && nzchar(Sys.getenv("OPENAI_API_KEY")))
  # Basic usage
  store <- ragnar_store_create(
    embed = \(x) ragnar::embed_openai(x, model = "text-embedding-3-small")
  )
  ragnar_store_insert(store, data.frame(text = c("foo", "bar")))
  ragnar_store_build_index(store)
  ragnar_retrieve(store, "foo")

  # More advanced: store metadata, retrieve with pre-filtering
  store <- ragnar_store_create(
    embed = \(x) ragnar::embed_openai(x, model = "text-embedding-3-small"),
    extra_cols = data.frame(category = character())
  )

  # Insert chunks tagged with a category column
  ragnar_store_insert(
    store,
    data.frame(
      category = "dessert",
      text = c("ice cream", "cake", "cookies")
    )
  )
  ragnar_store_insert(
    store,
    data.frame(
      category = "meal",
      text = c("steak", "potatoes", "salad")
    )
  )
  ragnar_store_build_index(store)

  # Simple retrieve over all rows
  ragnar_retrieve(store, "carbs")

  # Retrieve with pre-filtering: only rows in the "meal" category are scored
  dplyr::tbl(store) |>
    dplyr::filter(category == "meal") |>
    ragnar_retrieve("carbs")
}