Day 1: Learning RAG

The famous and trending words in every discussion these days all come from the AI space. So in this blog we will look at RAG.

RAG - Retrieval-Augmented Generation

LLM - Large Language Model

SLM - Small Language Model


We will start with the LLM.

An LLM predicts the next word in your conversation. Try the gpt-oss-12b LLM - the name means 12 billion parameters are combined to come up with the next word in a sentence.
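To make "predicting the next word" concrete, here is a toy sketch (not how a real LLM works internally): a bigram model that counts which word most often follows each word in some training text. The training text here is made up for illustration; real LLMs use billions of learned parameters instead of simple counts, but the goal - pick the most likely next word - is the same.

```python
from collections import Counter, defaultdict

# Toy training text (an assumption for this sketch)
training_text = "hello world hello there hello world good morning world"
words = training_text.split()

# Count, for each word, which word follows it and how often
follows = defaultdict(Counter)
for w, nxt in zip(words, words[1:]):
    follows[w][nxt] += 1

def predict_next(word):
    # Return the most frequent follower of `word`
    return follows[word].most_common(1)[0][0]

print(predict_next("hello"))  # → world  ("world" followed "hello" twice, "there" once)
```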

How can we predict the next word? There are different options; the basic building block is the equation of a line.

If you can find the nearest line to a set of points, then in the same way the model can predict from the relevance of the data.

An LLM will give responses only from the data it was trained on.

If an LLM is trained on DOG and CAT data but the user asks about a bear, it will give an answer only if it was also trained with bear information; otherwise it will say "I don't know", just like a human would.


Basic equation of a line:

y = mx + c

x = [1, 2, 3, 4, 5]

y = [2, 4, 6, 8, 10]

m = slope

c = intercept

Based on this equation, we can predict data in the model. Ex: the billions mentioned in an LLM's name refer to the number of parameters used to calculate word predictions.
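The idea above can be shown with a minimal sketch: fit y = mx + c to the sample data from the blog using least squares, then use the fitted line to predict a new value.

```python
# Sample data from the post: y is exactly 2*x, so we expect m = 2, c = 0
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope (m) and intercept (c)
m = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
     / sum((xi - mean_x) ** 2 for xi in x))
c = mean_y - m * mean_x

def predict(new_x):
    return m * new_x + c

print(predict(6))  # → 12.0
```

A real model learns billions of such weights instead of two, but the "fit, then predict" pattern is the same.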

Weights in an LLM - they predict the next word.

Hello _____ - a programmer will automatically respond that "World" is the next word, while a normal person will say "Hello <NAME>" or some other word.

Temperature - a scale of 0 to 10

0 - gives accurate, deterministic answers.

10 - gives random words; the model is creative on its own.
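A small sketch of how temperature works under the hood: the model's raw scores for candidate next words are divided by the temperature before being turned into probabilities (softmax). The scores and candidate words here are made up for illustration. A low temperature sharpens the distribution (one word dominates, so answers are consistent); a high temperature flattens it (every word becomes almost equally likely, so answers look random/creative).

```python
import math

def softmax_with_temperature(scores, temperature):
    # Divide scores by temperature, then apply a numerically stable softmax
    scaled = [s / temperature for s in scores]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for the next word after "Hello"
candidates = ["World", "there", "everyone"]
scores = [2.0, 1.0, 0.5]

for t in (0.1, 10.0):
    probs = softmax_with_temperature(scores, t)
    print(t, dict(zip(candidates, (round(p, 3) for p in probs))))
```

At temperature 0.1 almost all probability lands on "World"; at 10.0 the three words are nearly tied.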

In ChatGPT, responses are streamed over the SSE (Server-Sent Events) protocol: it thinks for some time and then gives the answer based on the query.

It can give a response with random words as well: the LLM produces a creative response, imagining things on its own - this is called hallucination.

Finally, an LLM, in simple terms, predicts the next word.


Now we will go back to RAG and its uses.

Suppose a war between two countries is happening somewhere in the world right now, but the model was trained a year ago, so it has no details about the war. Then we need to connect to a real-time system and fetch the current data as well. That is the concept of RAG.


LLM + real-time data (user input data) = RAG
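The formula above can be sketched in a few lines. Both `fetch_realtime_data` and `call_llm` below are hypothetical placeholders (not a real API): in a real system the first would query a news feed, search engine, or database, and the second would call an actual LLM endpoint. The point is only the shape of the flow: retrieve fresh data, put it in the prompt, then generate.

```python
def fetch_realtime_data(query):
    # Placeholder: a real system would hit a news feed, search API, or database
    return "Latest headline: talks between the two countries resumed today."

def call_llm(prompt):
    # Placeholder: stands in for a real LLM call
    return f"(model answer based on a prompt of {len(prompt)} chars)"

def rag_answer(question):
    # 1. Retrieve real-time context  2. Augment the prompt  3. Generate
    context = fetch_realtime_data(question)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer using the context."
    return call_llm(prompt)

print(rag_answer("What is the status of the war?"))
```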


Not all models are good with real-time data. The ones that are properly trained and fine-tuned give good responses.

Try out the smallest.ai website; they have a small, lightweight LLM that gives good responses when predicting words.


Other services to be aware of:

Some systems take a user input and share it with several other models. Once the responses are received from all the models, the main model collates and filters the data. This pattern is used for business purposes nowadays.

Some people use this flow as well:

User input -> main model -> sub-model 1 + sub-model 2 + sub-model 3 -> collate info -> main model -> response to user.
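The flow above can be sketched as a fan-out/collate pattern. The sub-models here are simple stand-in functions (an assumption for this sketch); in practice each would be a call to a different model or service, possibly run in parallel.

```python
# Stand-ins for three sub-models; real ones would be separate model calls
def sub_model_1(q): return f"fact A about {q}"
def sub_model_2(q): return f"fact B about {q}"
def sub_model_3(q): return f"fact C about {q}"

def main_model(user_input):
    # Fan out: send the user input to every sub-model
    answers = [m(user_input) for m in (sub_model_1, sub_model_2, sub_model_3)]
    # Collate and filter: deduplicate (preserving order) and join
    collated = "; ".join(dict.fromkeys(answers))
    return f"Final answer: {collated}"

print(main_model("bears"))
```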

Next Topic

LLM + Document => RAG

How can we achieve that?

The document will be stored in a vector DB: we split the data into chunks and store them in the vector DB.

Because it has magnitude and direction, we call it a vector.


From one point to another, the vector DB calculates the distance and retrieves the information.
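Here is a toy sketch of that distance-based retrieval: chunk a document, turn each chunk into a vector, and return the chunk whose vector is closest to the query vector (using cosine similarity). The "embedding" here is just a crude word-count vector, purely to show the distance idea - real systems use learned embeddings from a model.

```python
import math

def chunk(text, size=5):
    # Split the document into chunks of `size` words
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text, vocab):
    # Crude "embedding": count how often each vocabulary word appears
    words = text.lower().split()
    return [words.count(v) for v in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

doc = "the doctor said an apple a day keeps the doctor away and cats chase dogs"
vocab = sorted(set(doc.lower().split()))
db = [(c, embed(c, vocab)) for c in chunk(doc)]  # our tiny "vector DB"

# Retrieve the chunk closest to the query
query = embed("doctor apple", vocab)
best = max(db, key=lambda item: cosine(query, item[1]))
print(best[0])  # → the doctor said an apple
```

The chunk mentioning both "doctor" and "apple" has the smallest angle to the query vector, so it is retrieved - the same nearest-point idea a real vector DB uses at scale.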

Who defines the relation between data - e.g., that "doctor" and "apple" are related?

The LLM gives a point location in the vector DB, identifies the closest point, and responds with the relation.

Why do we need a vector DB?

Ex: the Spotify and Amazon websites give you suggestions, right? Those suggestions are derived from the ANN (Approximate Nearest Neighbour) algorithm. A vector DB works in a similar way.


LLM - how do we decide which model is a good one?

If the training data is from health care, the model will be used mainly for health care information.

If the training data is from the IT sector, you will get responses oriented toward development.

Ex: a model trained on health care data will expect "HELLO <patient name>", whereas IT-sector programmers will say "HELLO WORLD".


The context between words is what is mainly used when creating chunks.


Thanks to Syed for explaining all this.
