
How to Write the User Input Query Embedding Function
In order to retrieve information from a vector database, you must first embed your query text. The database compares your query's embedding with its stored embeddings to find and retrieve the most relevant document.
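It's important to ensure that the dimensionality of your query embedding (the number of values in the vector) matches that of the embeddings stored in the database. You achieve this by generating the query embedding with the same embedding model, and the same layer and aggregation settings, that were used for the stored documents.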
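A quick length check can catch a mismatch before the query ever reaches the database. The sketch below uses plain numeric vectors as stand-ins for real embeddings. `check_embedding_dim()` is a hypothetical helper written for illustration, not part of any package, and the length of 8 is arbitrary.

```r
# Hypothetical helper: stop early if the query embedding length differs
# from the length of the embeddings stored in the collection.
check_embedding_dim <- function(query_vec, expected_dim) {
  if (length(query_vec) != expected_dim) {
    stop(sprintf(
      "Query embedding has %d values, but the collection expects %d.",
      length(query_vec), expected_dim
    ))
  }
  invisible(TRUE)
}

# Toy stand-ins for real embeddings (illustrative length of 8)
stored_dim <- 8
good_query <- runif(8)
bad_query  <- runif(4)

# Passes silently because the lengths agree
check_embedding_dim(good_query, stored_dim)

# Capture the error message raised for the mismatched vector
dim_error <- tryCatch(
  check_embedding_dim(bad_query, stored_dim),
  error = function(e) conditionMessage(e)
)
dim_error
```

In practice, the expected length would come from the embeddings already stored in the collection rather than from a hard-coded number.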
Matching embeddings involves calculating the similarity (for example, cosine similarity) between the query and stored embeddings, identifying the closest match for effective retrieval.
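To make the matching step concrete, here is a minimal base R sketch of cosine similarity ranking. The three-value vectors and document names are made up for illustration, and `cosine_similarity()` is a helper defined here, not a function from the database client. The vector database performs an equivalent comparison internally, possibly with a different distance metric depending on how the collection is configured.

```r
# Cosine similarity between two numeric vectors of equal length
cosine_similarity <- function(a, b) {
  stopifnot(length(a) == length(b))
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Toy "stored" embeddings: one row per document (made-up values)
stored <- rbind(
  doc_a = c(1, 0, 0),
  doc_b = c(0, 1, 0),
  doc_c = c(1, 1, 0)
)

# Toy query embedding with the same length as each stored row
query_vec <- c(0.9, 0.1, 0)

# Score every stored embedding against the query
scores <- apply(stored, 1, cosine_similarity, b = query_vec)

# Keep the two closest documents, highest similarity first
top_two <- names(sort(scores, decreasing = TRUE))[1:2]
top_two
```

Here `doc_a` points in almost the same direction as the query, so it ranks first, followed by `doc_c`.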
Let’s write a function that embeds a query and then uses the resulting embedding to search the database for similar documents. Wrapping these steps in a function makes them reusable.
# Sentence embeddings function and query
# Assumes `client` (the database connection created earlier) is in scope
question <- function(sentence) {
  # Embed the query sentence
  sentence_embeddings <- textEmbed(
    sentence,
    layers = 10:11,
    aggregation_from_layers_to_tokens = "concatenate",
    aggregation_from_tokens_to_texts = "mean",
    keep_token_embeddings = FALSE
  )

  # Convert the tibble to a plain vector, then wrap it in a list
  # so it can be passed as query_embeddings
  sentence_vec_embeddings <- unlist(sentence_embeddings, use.names = FALSE)
  sentence_vec_embeddings <- list(sentence_vec_embeddings)

  # Query similar documents using embeddings
  results <- query(
    client,
    "recipes_collection",
    query_embeddings = sentence_vec_embeddings,
    n_results = 2
  )

  results
}
This chunk of code is similar to how we previously used the textEmbed() function. The query() function is added to search the vector database, specifically the recipes collection, and returns the two documents that most closely match the user’s query.
Our function thus takes a sentence as its argument, embeds it to generate a sentence embedding, queries the database with that embedding, and returns the two best-matching documents.