embed_databricks() gets embeddings for text using a model hosted in a Databricks workspace. It relies on the ellmer package for managing Databricks credentials. See ellmer::chat_databricks for more on supported modes of authentication.
Usage
embed_databricks(
  x,
  workspace = databricks_workspace(),
  model = "databricks-bge-large-en",
  batch_size = 512L
)
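The signature above can be exercised in three ways, depending on what is passed as x. The following is a minimal sketch, not an official example: it assumes the ragnar package is installed, that a Databricks workspace is reachable, and that credentials are already configured for ellmer. The example strings and the batch_size value of 64L are illustrative only.

```r
library(ragnar)

# Assumption: DATABRICKS_HOST and valid credentials are set up;
# without them the embedding calls below cannot succeed.

# 1. Character vector in, matrix of embeddings out.
emb <- embed_databricks(c("first passage", "second passage"))

# 2. Data frame with a `text` column in; the same data frame comes back
#    with an additional `embedding` column.
df <- data.frame(text = c("first passage", "second passage"))
df <- embed_databricks(df)

# 3. `x` omitted: a function is returned with the other arguments
#    already filled in.
embed <- embed_databricks(model = "databricks-bge-large-en", batch_size = 64L)

# The returned function can be passed as the `embed` argument
# of ragnar_store_create().
store <- ragnar_store_create(embed = embed)
```

The third form is the one to reach for when a store needs an embedding function with non-default settings.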
Arguments
- x
x can be:
  - A character vector, in which case a matrix of embeddings is returned.
  - A data frame with a column named text, in which case the data frame is returned with an additional column named embedding.
  - Missing or NULL, in which case a function is returned that can be called to get embeddings. This is a convenient way to partially apply additional arguments like model, and is the most convenient way to produce a function that can be passed to the embed argument of ragnar_store_create().
- workspace
The URL of a Databricks workspace, e.g. "https://example.cloud.databricks.com". Will use the value of the environment variable DATABRICKS_HOST, if set.
- model
The name of a text embedding model.
- batch_size
Integer. Splits x into batches when embedding; this is the maximum number of strings to include in a single request.
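To make the effect of batch_size concrete, here is a rough base R sketch of splitting a character vector into batches of at most batch_size strings. The helper split_into_batches() is hypothetical and written only for illustration; it is not part of ragnar, and the package's own batching code may differ.

```r
# Hypothetical helper (not part of ragnar): split a character vector into
# consecutive batches holding at most `batch_size` strings each.
split_into_batches <- function(x, batch_size = 512L) {
  if (length(x) == 0L) {
    return(list())
  }
  # Elements 1..batch_size get id 1, the next batch_size get id 2, and so on.
  batch_id <- ceiling(seq_along(x) / batch_size)
  unname(split(x, batch_id))
}

# 1200 strings with the default limit of 512 strings per request.
texts <- sprintf("document %d", 1:1200)
batches <- split_into_batches(texts, batch_size = 512L)

lengths(batches)
#> [1] 512 512 176
```

Under this scheme, 1200 strings would be sent as three requests rather than one, with the original order preserved across batches.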