
How to Create Chunks
To understand why chunking matters before embedding, you first need to understand what an embedding is.
An embedding is a vector representation of a word, sentence, or document. Machines don't understand human text; they understand numbers. LLMs work by transforming text into numerical representations in order to generate answers. Generating embeddings takes a lot of computation, so breaking the data into smaller chunks before embedding keeps each piece manageable and makes the embedding step more efficient.
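To make this concrete, here is a minimal sketch of what an embedding looks like in practice. It assumes Ollama is running locally on its default port with an embedding model already pulled (the model name nomic-embed-text and the example sentence are assumptions, not part of the original pipeline) and uses the httr2 package to call Ollama's /api/embeddings endpoint. The result is simply a long numeric vector.
# A minimal sketch: get an embedding for one sentence from a local Ollama server.
# Assumes Ollama is running on localhost:11434 and that an embedding model
# (here "nomic-embed-text") has already been pulled.
library(httr2)
resp <- request("http://localhost:11434/api/embeddings") |>
  req_body_json(list(
    model  = "nomic-embed-text",
    prompt = "Preheat the oven to 180C and roast the vegetables for 30 minutes."
  )) |>
  req_perform() |>
  resp_body_json()
# The embedding is just a long numeric vector
embedding <- unlist(resp$embedding)
length(embedding)   # dimensionality of the vector, e.g. 768
head(embedding)     # first few numbers of the representation
Chunking keeps each of these computations small: instead of embedding one huge document in a single call, you embed many short pieces of text.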
Now we'll split the data frame into smaller chunks of a specified size so it can be processed and iterated over efficiently in batches.
# Define the size of each chunk (number of rows per chunk)
chunk_size <- 1
# Get the total number of rows in the dataframe
n <- nrow(cleaned_recipes_df)
# Create a vector of group numbers for chunking
# Each group number repeats for 'chunk_size' rows
# Ensure the vector matches the total number of rows
r <- rep(1:ceiling(n/chunk_size), each = chunk_size)[1:n]
# Split the dataframe into smaller chunks (subsets) based on the group numbers
chunks <- split(cleaned_recipes_df, r)
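As a quick sanity check (a short sketch, not part of the original pipeline), you can confirm how many chunks were produced and inspect what a single chunk contains before moving on to the embedding step.
# Quick sanity check on the chunking result
length(chunks)       # number of chunks, equal to ceiling(n / chunk_size)
chunks[[1]]          # the first chunk: a data frame with at most 'chunk_size' rows
nrow(chunks[[1]])    # with chunk_size <- 1, each chunk holds a single recipe row
Each of these chunks can then be embedded one at a time in the next step and stored in ChromaDB.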