Embedding - Part II - Dense Embedding in RAG

Embedding concepts.

There are two kinds of embeddings: dense embeddings and sparse embeddings.

Today we will look at dense embeddings.

Dense - used for semantic (similarity) search. Example model: Nomic Embed Text (nomic-embed-text).

A dense embedding is a vector of continuous values, e.g. [0.44, 0.66, 0.234, 0.15, 0.152, ...]. Each entry is a decimal number, and because almost every entry carries a value, the vector is called dense.

The vector has n dimensions, where n depends on the model. An entry of exactly 0 is very rare. These vectors are produced by the encoder part of a Transformer model.

Embedding model example - all-MiniLM (for example all-MiniLM-L6-v2). It is also a Transformer, but it uses only the encoder part.

Transformer - the basic Transformer has both an encoder and a decoder. Sentence Transformers use the encoder to turn whole sentences into embeddings.

Hugging Face and Ollama - you can look up model details there and download the models to use them.

Sparse embedding - based on one-hot style encoding over a vocabulary: it checks how many times each word occurs, so most entries are 0.
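To make the contrast concrete, here is a minimal sketch of a count-based sparse vector in Python. The vocabulary and sentence are made up for illustration, the tokenization is a naive whitespace split, and real sparse methods usually weight the counts rather than using raw occurrences. Compare how many zeros it contains with the dense example [0.44, 0.66, 0.234, ...].

```python
from collections import Counter


def sparse_count_vector(text: str, vocabulary: list[str]) -> list[int]:
    """Return one count per vocabulary word: how often that word occurs in text."""
    counts = Counter(text.lower().split())  # naive whitespace tokenization
    return [counts[word] for word in vocabulary]  # Counter gives 0 for unseen words


# Hypothetical 10-word vocabulary; real vocabularies have thousands of entries.
vocabulary = ["apple", "car", "cat", "dog", "mat", "road", "sat", "train", "tree", "water"]

vec = sparse_count_vector("The cat sat on the mat", vocabulary)
print(vec)  # [0, 0, 1, 0, 1, 0, 1, 0, 0, 0]
print(f"{vec.count(0)} of {len(vec)} entries are zero")  # 7 of 10 entries are zero
```

Words outside the vocabulary ("the", "on") are simply ignored, and with a realistic vocabulary the share of zeros would be far higher.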


LLMs come in two kinds:

         - Embedding LLMs - convert text into embedding values (prompt -> chunk -> embedding value).

         - Normal LLM models - generate text from a prompt.

For RAG - cosine similarity, Euclidean distance, and dot product are the mathematical formulas used for similarity search.
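A minimal sketch of the three formulas in plain Python, with no vector DB or library assumed. The two-dimensional vectors are toy values; real embeddings have hundreds of dimensions. A higher cosine similarity or dot product means more similar, while a smaller Euclidean distance means more similar.

```python
import math


def dot_product(a: list[float], b: list[float]) -> float:
    """Sum of element-wise products; larger means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product divided by both vector lengths: compares direction only, range -1..1."""
    return dot_product(a, b) / (math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b)))


def euclidean_distance(a: list[float], b: list[float]) -> float:
    """Straight-line distance between the two points; smaller means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


a = [3.0, 4.0]
b = [4.0, 3.0]
print(dot_product(a, b))         # 24.0
print(cosine_similarity(a, b))   # 0.96
print(euclidean_distance(a, b))  # 1.4142135623730951
```

If the embeddings are normalized to length 1, cosine similarity and dot product give the same value, which is why many setups use the cheaper dot product.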


Dense embedding - similarity search - each vector is a point in a multidimensional space. This space is called the latent space.

Points that are close together in the vector DB are similar, so the stored points closest to the query are returned as the similarity search results.
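A minimal brute-force sketch of that idea: score every stored vector against the query with cosine similarity and return the k closest. The document IDs and 3-dimensional vectors are invented for illustration, and the dictionary stands in for a vector DB; real vector DBs typically use approximate nearest-neighbour indexes instead of scanning every vector.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; closer to 1 means closer in latent space."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query: list[float], store: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Score every stored vector against the query and return the k most similar."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in store.items()]
    scored.sort(key=lambda item: item[1], reverse=True)  # highest similarity first
    return scored[:k]


# Hypothetical toy "vector DB": document ID -> embedding.
store = {
    "doc_cats": [0.9, 0.1, 0.0],
    "doc_dogs": [0.8, 0.3, 0.1],
    "doc_cars": [0.0, 0.2, 0.95],
}

query = [0.9, 0.2, 0.0]  # pretend embedding of a question about pets
results = top_k(query, store, k=2)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```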


Always use the same embedding model for the query and for the stored documents.
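Vectors from different models live in different latent spaces and often have different dimensions (for example 768 for nomic-embed-text and 384 for all-MiniLM-L6-v2), so comparing them gives meaningless scores. One way to enforce the rule is sketched below: a hypothetical in-memory store (not any real vector DB's API) that remembers which model filled it and rejects anything else. The toy 4-dimensional vectors stand in for real embeddings.

```python
from dataclasses import dataclass, field


@dataclass
class VectorStore:
    """Hypothetical toy store that records which embedding model produced its vectors."""
    model_name: str
    dimension: int
    vectors: dict[str, list[float]] = field(default_factory=dict)

    def add(self, doc_id: str, vector: list[float], model_name: str) -> None:
        self._check(vector, model_name)
        self.vectors[doc_id] = vector

    def check_query(self, vector: list[float], model_name: str) -> None:
        self._check(vector, model_name)

    def _check(self, vector: list[float], model_name: str) -> None:
        # Reject vectors from a different model or with the wrong number of dimensions.
        if model_name != self.model_name:
            raise ValueError(f"store was built with {self.model_name!r}, got {model_name!r}")
        if len(vector) != self.dimension:
            raise ValueError(f"expected {self.dimension} dimensions, got {len(vector)}")


store = VectorStore(model_name="nomic-embed-text", dimension=4)
store.add("doc1", [0.1, 0.2, 0.3, 0.4], model_name="nomic-embed-text")

try:
    store.check_query([0.1, 0.2, 0.3, 0.4], model_name="all-minilm")
except ValueError as err:
    print("rejected:", err)
```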


Based on the use case, choose one of the following:

Sparse embedding - Keyword search

Dense embedding - similarity search
