
Building a RAG System with HERE AND NOW AI
Welcome to this hands-on RAG System Workshop, where you’ll learn to build a RAG (Retrieval-Augmented Generation) system using open-source LLMs and Gradio. By the end of this workshop, you’ll have a fully functional RAG system that can retrieve knowledge from PDFs and websites and use vector-based search to enhance its responses.
Real-world Applications:
Chatbots that answer based on company documents
AI tutors pulling data from textbooks
Context-aware virtual assistants
Setting Up the Environment for the RAG System Workshop
1. Prerequisites
Python (>=3.8)
Pip
VS Code or any IDE
2. Clone the Repository & Install Dependencies
# Clone the project
git clone https://github.com/hereandnowai/rag-workshop.git
cd rag-workshop
# Create virtual environment
python -m venv rag_env
# Activate virtual environment
# On Windows
.\rag_env\Scripts\activate
# On Mac/Linux
source rag_env/bin/activate
# Install dependencies
pip install -r requirements.txt
Requirements:
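The exact requirements.txt pins are not reproduced here, but based on the imports used throughout the workshop scripts, a minimal version would look roughly like this (standard PyPI package names, versions left unpinned as an assumption):

openai
gradio
PyPDF2
numpy
scikit-learn
requests
beautifulsoup4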
RAG System Architecture
The system consists of the following layers:
LLM Communicator: Connects with the language model (e.g., LLaMA or OpenAI).
RAG Modules:
Raw Text RAG: Uses full PDFs.
Vector-based RAG: Uses embeddings and cosine similarity.
Web-based RAG: Scrapes content from websites.
Gradio UI: A simple chatbot interface.
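Roughly, the files walked through below map onto those layers (this layout is inferred from the walkthrough, not copied from the repository):

rag-workshop/
  llm_communicator.py    -> LLM Communicator (chat completions + embeddings)
  chatbot.py             -> memory-enabled chat, no retrieval
  rag_rawtext.py         -> Raw Text RAG (whole PDF as context)
  rag_vectortext.py      -> Vector-based RAG (embeddings + cosine similarity)
  rag_web.py             -> Web-based RAG (scraped site content)
  app.py                 -> Gradio UI
  temp/HereandNow_AI.pdf -> sample document used by the PDF examples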
Breaking Down the Code
1. LLM Communicator (llm_communicator.py)
This script is designed for a RAG System Workshop, demonstrating how to interact with Large Language Models (LLMs) using an OpenAI-compatible API. It allows seamless communication with both open-source (e.g., Llama3) and closed-source models, whether hosted locally or on a server.
# Import OpenAI - a common, standard API client
# used to connect with ANY LLM, open-source or closed-source
from openai import OpenAI

# Client declaration and initialisation
client = OpenAI(
    # Base URL of the server hosting the LLM (here, a local Ollama server)
    base_url='http://localhost:11434/v1',
    # The API key for authentication ('ollama' is a placeholder for local models)
    api_key='ollama',
)

# The communicator, i.e. the chat-completion function - for chatting
"""
Input  -> messages: a list of dicts holding the system prompt, user questions, and assistant answers
Output -> a stream of response chunks from the LLM
"""
def Get_StreamedResponse(messages):
    return client.chat.completions.create(
        model="llama3.2:latest", temperature=0, messages=messages, stream=True)
"""
model       -> the model name as a string
temperature -> float in the range 0 - 1; 0 for consistent results, 0.1 - 1 for more varied results
messages    -> list of chat-history dictionaries
stream      -> bool, True to ask the LLM to stream data
"""

# Non-streamed response
def Get_NonStreamedResponse(messages):
    return client.chat.completions.create(
        model="llama3.2:latest", temperature=0, messages=messages)

# ==============================================================================
# Testing Get_StreamedResponse
""" Testmessages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi, I am Balaji."},
]
content = Get_StreamedResponse(Testmessages)
for chunk in content:
    print(chunk.choices[0].delta.content) """

# Function to get embeddings from the OpenAI-compatible embeddings endpoint
def Get_embeddings(text):
    response = client.embeddings.create(
        input=text,
        model="nomic-embed-text"
    )
    embedding = response.data[0].embedding
    return embedding

# Testing embeddings
# print(Get_embeddings("test"))
- Setting Up the Connection
The script initializes an API client with a base URL pointing to the LLM's location and an API key for authentication (in this case, 'ollama' for local models).
- Chat Completion (Generating Responses)
Two functions handle text-based interactions with the LLM:
Streaming Response: Receives responses in real-time, making interactions feel more natural.
Non-Streaming Response: Waits for the full response before returning it.
Both methods allow customization using temperature settings, where a lower value produces more consistent answers, and a higher value introduces variability.
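As a quick sketch of how to consume each helper (this assumes llm_communicator.py is importable and an Ollama server is running on localhost:11434 with llama3.2 pulled):

from llm_communicator import Get_StreamedResponse, Get_NonStreamedResponse

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain RAG in one sentence."},
]

# Streaming: print tokens as they arrive
for chunk in Get_StreamedResponse(messages):
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)

# Non-streaming: wait for the complete reply, then read it in one go
reply = Get_NonStreamedResponse(messages)
print(reply.choices[0].message.content)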
- Generating Text Embeddings
The script also includes functionality to convert text into numerical vector representations (embeddings). These embeddings enhance semantic search, similarity comparison, and retrieval-augmented generation (RAG) systems.
- Testing the Functions
The script contains test cases for interacting with the LLM and retrieving embeddings. These tests help verify that the RAG System Workshop setup is functioning correctly.
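For example, a quick check of the embedding helper might look like this (the vector length depends on the nomic-embed-text model served by Ollama, so the exact size is not guaranteed):

from llm_communicator import Get_embeddings

vector = Get_embeddings("HERE AND NOW AI runs hands-on AI workshops.")
print(type(vector), len(vector))  # a plain Python list of floats
print(vector[:5])                 # first few dimensions of the embedding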
Summary
✅ Showcases LLM integration in a RAG workshop.
✅ Enables both real-time interaction and batch response processing.
✅ Utilizes embeddings for enhanced AI functionalities.
✅ Compatible with both local and cloud-based AI models.
2. Memory-Enabled Chatbot (chatbot.py)
This script is designed to create a chatbot with memory, allowing it to retain previous interactions for more contextual and coherent conversations. It integrates with an LLM (Large Language Model) through a custom communicator module (llm_communicator).
#Hereandnow AI
#Chatbot - Chat with memory
# import our llm communicator
from llm_communicator import Get_StreamedResponse,Get_NonStreamedResponse
#Function to chat with llm
def chat_with_LLM(message, history):
    messages = [{"role": "system", "content": "Your name is Caramel AI, You are an AI Teacher working for 'HERE AND NOW AI'."}]
    # Add history messages
    for h in history:
        messages.append({"role": "user", "content": h[0]})
        if h[1]:  # Only add the assistant message if it exists
            messages.append({"role": "assistant", "content": h[1]})
    # Add the current message
    messages.append({"role": "user", "content": message})
    # Call Get_StreamedResponse and get the reply
    completion = Get_StreamedResponse(messages)
    # Stream the response
    response = ""  # Initialize an empty string to accumulate the streamed response
    for chunk in completion:
        if chunk.choices[0].delta.content is not None:
            content = chunk.choices[0].delta.content
            response += content
            yield response

"""
Test for non-streaming content:
print(Get_NonStreamedResponse(messages).choices[0].message.content)
"""
"""
model       -> the model name as a string
temperature -> float in the range 0 - 1; 0 for consistent results, 0.1 - 1 for more varied results
messages    -> list of chat-history dictionaries
stream      -> bool, True to ask the LLM to stream data
"""

# Testing: a sample history already converted to the messages format
""" Testmessages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi, I am Balaji."},
    {"role": "assistant", "content": "Hi, nice to meet you Balaji."},
    {"role": "user", "content": "What is my name?"}
]
content = Get_StreamedResponse(Testmessages)
for chunk in content:
    print(chunk.choices[0].delta.content) """
- Importing the Communicator
The chatbot imports two functions:
Get_StreamedResponse – Retrieves the AI's response in real time, chunk by chunk.
Get_NonStreamedResponse – Retrieves the AI's response all at once (not actively used in the main function).
- Defining the Chat Function
The function chat_with_LLM(message, history) is responsible for interacting with the AI model while maintaining a conversation history.
How it Works:
System Message Setup:
The chatbot is given a persona:
“Caramel AI, an AI Teacher working for HERE AND NOW AI.”
This system message ensures the chatbot maintains a consistent identity.
Adding Chat History:
The script loops through past messages (history), adding user inputs and AI responses to maintain context.
This allows the chatbot to remember previous conversations, making it feel more natural and intelligent.
Appending the User's Current Message:
The new user message is added to the conversation history before querying the AI.
Generating a Response:
The function calls Get_StreamedResponse(messages), requesting a streamed response from the LLM.
The response is streamed in chunks, allowing for a real-time chat experience.
Streaming the Response:
The chatbot iterates through each chunk of data received from the AI.
If there is valid content, it is appended to response and yielded in real-time, making the chatbot feel dynamic.
- Test Cases (Commented Out)
A non-streaming test case is provided, showing how to retrieve a response without streaming.
A sample conversation test case demonstrates how the chatbot remembers past messages and provides responses accordingly.
Key Features of This Chatbot
✅ Maintains conversation memory for more intelligent interactions.
✅ Streams responses in real time for a smoother chat experience.
✅ Uses a predefined AI persona (Caramel AI, an AI Teacher at HERE AND NOW AI).
✅ Flexible communication with an LLM via an external API module.
This setup ensures that the chatbot can hold meaningful, context-aware conversations, making it ideal for an AI-powered teaching assistant.
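To make the history format concrete, here is a minimal sketch of driving chat_with_LLM outside Gradio, assuming history uses the same [user, assistant] pairs the UI passes in:

from chatbot import chat_with_LLM

history = [
    ["Hi, I am Balaji.", "Hi Balaji, nice to meet you!"],
]

# chat_with_LLM is a generator: each yield is the answer accumulated so far
final_answer = ""
for partial in chat_with_LLM("What is my name?", history):
    final_answer = partial
print(final_answer)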
3. Creating UI using Gradio (app.py)
This Python script creates a web-based chatbot interface using Gradio, a simple UI framework for machine learning applications. The chatbot uses vector-based retrieval-augmented generation (RAG) to provide intelligent responses.
#HERE AND NOW AI
#UI Interface for Chat
#import gradio for UI creation
import gradio as gr
#from chatbot import chat_with_LLM
#from rag_rawtext import chat_with_rawtext
#from rag_web import chat_with_web
from rag_vectortext import chat_with_vectortext
# Create Gradio interface with Chatbot
with gr.Blocks(title="HERE AND NOW AI") as demo:
    # Add a logo at the top
    logo_path = "https://hereandnowai.com/wp-content/uploads/2025/02/2-removebg-preview-1-250x28.png"
    gr.Image(logo_path, elem_id="logo", show_label=False, height=100, width=600)
    # The chat interface Gradio will show
    chatbot = gr.Chatbot(label="What Can I Help With")
    # The user input box
    msg = gr.Textbox(placeholder="Ask Anything ...")
    with gr.Row():
        clear = gr.Button("Clear")
        submit_button = gr.Button(value="Submit", variant="primary")

    # user function to handle user input and history
    def user(user_message, history):
        return "", history + [[user_message, None]]

    # bot function to generate a response based on the conversation history
    def bot(history):
        history[-1][1] = ""
        for chunk in chat_with_vectortext(history[-1][0], history[:-1]):
            history[-1][1] = chunk
            yield history

    # Pressing Enter in the textbox or clicking the Submit button triggers the conversation
    msg.submit(user, [msg, chatbot], [msg, chatbot], queue=False).then(bot, chatbot, chatbot)
    submit_button.click(user, [msg, chatbot], [msg, chatbot], queue=False).then(bot, chatbot, chatbot)
    # Clear the screen
    clear.click(lambda: None, None, chatbot, queue=False)

# Launch the Gradio app
if __name__ == "__main__":
    demo.launch(favicon_path="images/favicon.ico")
This Python script builds an interactive chatbot UI using Gradio, a framework that allows for the creation of web-based machine learning applications. The chatbot is designed to interact with users and provide responses using a Retrieval-Augmented Generation (RAG) system. The responses are generated based on a vector search model from the rag_vectortext module.
1. Importing Dependencies
- import gradio as gr – Imports the Gradio library used to create the chatbot interface.
- Commented-out imports (chat_with_LLM, chat_with_rawtext, chat_with_web) – These indicate that other response methods were considered but are not used here.
- from rag_vectortext import chat_with_vectortext – Imports the function responsible for generating AI-based responses using vectorized text search.
2. Setting Up the Chatbot UI
A Gradio Blocks UI is created with various components:
Adding a Logo
- A logo is displayed at the top using gr.Image().
Chatbot Display
- gr.Chatbot(label="What Can I Help With") initializes the chatbot interface where the conversation appears.
User Input Box
- gr.Textbox(placeholder="Ask Anything ...") allows users to enter their messages.
Action Buttons
- Submit Button (submit_button) – Sends the user's message to the chatbot.
- Clear Button (clear) – Resets the conversation history.
3. Defining Chatbot Functions
- User Function (user)
- Takes the user's input and appends it to the conversation history.
- Returns an empty string (to clear the input box) and updates the chat.
- Bot Function (bot)
- Calls chat_with_vectortext() to process the latest user query.
- Streams the response back to the chatbot UI.
4. Connecting User Input to Chatbot Response
msg.submit(user, [msg, chatbot], [msg, chatbot], queue=False).then(bot, chatbot, chatbot)
- Pressing Enter submits the user's message, updates the chat history, and triggers a response.
submit_button.click(user, [msg, chatbot], [msg, chatbot], queue=False).then(bot, chatbot, chatbot)
- Clicking the Submit button triggers the same flow as pressing Enter.
Clearing Chat
- Clicking Clear resets the chatbot interface.
5. Launching the Chat Interface
demo.launch(favicon_path="images/favicon.ico") starts the chatbot and makes it accessible via a web-based interface.
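If you want the app reachable from other machines, demo.launch() also accepts standard Gradio options; a hedged example (these parameters override Gradio defaults and are not part of the workshop code):

demo.launch(
    favicon_path="images/favicon.ico",
    server_name="0.0.0.0",  # listen on all network interfaces instead of just localhost
    server_port=7860,       # the default Gradio port used later in this guide
)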
Summary
✅ Builds a chatbot UI using Gradio
✅ Uses a RAG-based vector search model for responses
✅ Includes a chat display, input box, and action buttons
✅ Streams responses dynamically
✅ Can be accessed as a web application
4. RAG with Raw Text (rag_rawtext.py)
This Python script builds a PDF-based AI chatbot that allows users to ask questions about a specific PDF document. It uses an LLM (Large Language Model) communicator to generate responses based on the extracted text from the PDF. The chatbot works by reading the document, retrieving relevant information, and generating answers dynamically.
# import our llm communicator
from llm_communicator import Get_StreamedResponse
import PyPDF2
# The path of the PDF file we are going to ask questions about
pdf_path = "temp/HereandNow_AI.pdf"

# PDF extractor - PDF to text converter
# Function to extract text from a PDF
def extract_pdf_text(pdf_path):
    # Open the PDF file, read all pages in a loop,
    # extract the text from each page, and return the combined text
    with open(pdf_path, 'rb') as file:
        reader = PyPDF2.PdfReader(file)
        text = ""
        for page in reader.pages:
            text += page.extract_text()
        return text

# Test the PDF to text converter
# print(extract_pdf_text(pdf_path))

# Function to chat with the LLM
def chat_with_rawtext(message, history):
    # Initialize an empty string for the streaming response
    response = ""
    # System prompt in messages format:
    # it tells the LLM what role to play, how to treat the input,
    # and what kind of answer to give
    messages = [
        {"role": "system", "content": "You are a helpful assistant that helps answer questions based on PDF content."}
    ]
    # Add history messages
    for h in history:
        messages.append({"role": "user", "content": h[0]})
        if h[1]:  # Only add the assistant message if it exists
            messages.append({"role": "assistant", "content": h[1]})

    pdf_messages = messages
    # Extract text from the PDF
    pdf_extract = extract_pdf_text(pdf_path)
    prompt = f"Context: {pdf_extract}\n\nQuestion: {message}\nAnswer:"
    # Add the PDF context + question to the messages sent to the LLM
    pdf_messages.append({"role": "user", "content": f"{prompt}"})
    # Add the plain user message for the UI history
    messages.append({"role": "user", "content": message})
    # Call Get_StreamedResponse and get the reply
    completion = Get_StreamedResponse(pdf_messages)
    # Stream the response
    for chunk in completion:
        if chunk.choices[0].delta.content is not None:
            content = chunk.choices[0].delta.content
            response += content
            yield response
- Importing Dependencies
from llm_communicator import Get_StreamedResponse – Imports the function responsible for generating responses from the LLM.
import PyPDF2 – Imports PyPDF2, a library used for extracting text from PDF documents.
- Defining the PDF Path
pdf_path = "temp/HereandNow_AI.pdf" – Specifies the location of the PDF file from which content will be extracted.
- Function to Extract Text from a PDF (extract_pdf_text)
This function reads the entire PDF and converts its content into text.
Opens the PDF file in binary mode ('rb').
Uses PyPDF2.PdfReader() to read the file.
Iterates through each page and extracts text.
Returns the extracted text as a string.
This is crucial because LLMs cannot read PDFs directly; they require text input.
- Chatbot Function (chat_with_rawtext)
This function processes user messages, retrieves relevant content from the PDF, and generates AI-based responses.
Step 1: Initializing Conversation Context
Creates a system prompt – The chatbot is instructed to behave as an AI assistant that answers questions based on the PDF.
Maintains chat history – Past interactions are stored to provide a more contextualized response.
Step 2: Extracting PDF Content
Calls extract_pdf_text(pdf_path) to get the full text from the PDF.
Formats the prompt to include extracted content and the user’s query:
Context: (Extracted PDF Text)
Question: (User's Question)
Answer:
This ensures the LLM has access to the document's information when generating an answer.
Step 3: Passing Data to the LLM
- Calls Get_StreamedResponse(pdf_messages), which sends the formatted messages to the LLM.
- Streams the response dynamically by iterating through the chunks of generated text and yielding them as output.
This allows for a real-time response instead of waiting for the entire answer to be generated.
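One practical caveat the workshop script does not handle: a long PDF can exceed the model's context window, because the entire extracted text is placed in the prompt. A hedged guard, simply truncating the extracted text to an assumed character budget, could look like this inside chat_with_rawtext:

MAX_CONTEXT_CHARS = 12000  # hypothetical budget; tune it to your model's context window

pdf_extract = extract_pdf_text(pdf_path)
if len(pdf_extract) > MAX_CONTEXT_CHARS:
    pdf_extract = pdf_extract[:MAX_CONTEXT_CHARS]  # keep only the beginning of the document

prompt = f"Context: {pdf_extract}\n\nQuestion: {message}\nAnswer:"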
Key Features of This Chatbot
✅ Reads and understands PDF documents.
✅ Remembers previous chat history for better responses.
✅ Uses streaming output for faster response delivery.
✅ Provides accurate answers by structuring input in a meaningful way.
This setup is ideal for applications where users need AI-powered assistance in understanding or summarizing documents, reports, research papers, or manuals.
5. RAG with Vector Search (rag_vectortext.py)
This script is designed to create an AI-powered chatbot that can answer questions based on the content of a PDF document. It does this by extracting text from the PDF, dividing it into smaller chunks, finding the most relevant chunk for a user's query using vector-based search, and generating a response using a Large Language Model (LLM).
# import our llm communicator
from llm_communicator import Get_StreamedResponse, Get_embeddings
import PyPDF2
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
# The path of the PDF file we are going to ask questions about
pdf_path = "temp/HereandNow_AI.pdf"

# PDF extractor - PDF to text converter
# Function to extract text from a PDF
def extract_pdf_text(pdf_path):
    # Open the PDF file, read all pages in a loop,
    # extract the text from each page, and return the combined text
    with open(pdf_path, 'rb') as file:
        reader = PyPDF2.PdfReader(file)
        text = ""
        for page in reader.pages:
            text += page.extract_text()
        return text

# Test the PDF to text converter
# print(extract_pdf_text(pdf_path))

# Function to chunk text into smaller pieces
def chunk_text(text, chunk_size=500):
    # Split the text into smaller chunks that are around `chunk_size` characters long
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    return chunks

# Function to find the most relevant chunk using cosine similarity
def find_most_relevant_chunk(question, chunks):
    question_embedding = Get_embeddings(question)
    chunk_embeddings = [Get_embeddings(chunk) for chunk in chunks]
    # Compute the cosine similarity between the question embedding and each chunk embedding
    similarities = [cosine_similarity([question_embedding], [chunk_embedding])[0][0] for chunk_embedding in chunk_embeddings]
    # Find the chunk with the highest similarity score
    most_relevant_chunk_index = np.argmax(similarities)
    return chunks[most_relevant_chunk_index]

# Function to chat with the LLM
def chat_with_vectortext(message, history):
    # Initialize an empty string for the streaming response
    response = ""
    # System prompt in messages format:
    # it tells the LLM what role to play, how to treat the input,
    # and what kind of answer to give
    messages = [
        {"role": "system", "content": "You are a helpful assistant that helps answer questions based on PDF content."}
    ]
    # Add history messages
    for h in history:
        messages.append({"role": "user", "content": h[0]})
        if h[1]:  # Only add the assistant message if it exists
            messages.append({"role": "assistant", "content": h[1]})

    pdf_messages = messages
    # Step 1: Extract text from the PDF
    pdf_extract = extract_pdf_text(pdf_path)
    # Step 2: Chunk the PDF text into smaller pieces
    chunks = chunk_text(pdf_extract)
    # Step 3: Find the most relevant chunk based on the question
    relevant_chunk = find_most_relevant_chunk(message, chunks)
    # print(relevant_chunk)
    prompt = f"Context: {relevant_chunk}\n\nQuestion: {message}\nAnswer:"
    # Add the retrieved context + question to the messages sent to the LLM
    pdf_messages.append({"role": "user", "content": f"{prompt}"})
    # Add the plain user message for the UI history
    messages.append({"role": "user", "content": message})
    # Call Get_StreamedResponse and get the reply
    completion = Get_StreamedResponse(pdf_messages)
    # Stream the response
    for chunk in completion:
        if chunk.choices[0].delta.content is not None:
            content = chunk.choices[0].delta.content
            response += content
            yield response
1. Importing Necessary Libraries
The script imports:
- PyPDF2 – for extracting text from PDF files.
- numpy – for handling numerical computations.
- sklearn.metrics.pairwise.cosine_similarity – for calculating the similarity between text embeddings.
- llm_communicator – which contains Get_StreamedResponse (for getting responses from the AI model) and Get_embeddings (for generating vector representations of text).
2. Extracting Text from the PDF
The function extract_pdf_text(pdf_path) reads the PDF file, extracts the text from each page, and returns the complete text as a string. This is necessary because LLMs cannot process PDFs directly; they require plain text input.
3. Chunking the Extracted Text
Since PDFs can contain large amounts of text, the function chunk_text(text, chunk_size=500) splits the extracted text into smaller pieces of around 500 characters each. This ensures that responses remain relevant and manageable when searching for specific information.
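For example, the split is purely positional, so chunks can cut sentences in half (a trade-off worth knowing about). A quick check, assuming rag_vectortext.py is importable:

from rag_vectortext import chunk_text

sample = "HERE AND NOW AI " * 100            # 1600 characters of toy text
chunks = chunk_text(sample, chunk_size=500)
print(len(chunks))      # 4 chunks: three of 500 characters and one of 100
print(len(chunks[0]))   # 500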
4. Finding the Most Relevant Chunk
When a user asks a question, the function find_most_relevant_chunk(question, chunks) identifies the best-matching text segment using cosine similarity (see the toy example after this list):
- It converts the user's question into a numerical representation (embedding) using Get_embeddings(question).
- It does the same for each chunk of text extracted from the PDF.
- It calculates cosine similarity scores between the question and each chunk.
- The chunk with the highest similarity score is then used as the context for answering the user's question.
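Here is a toy illustration of the scoring step, using 3-dimensional vectors instead of real embeddings:

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

question_vec = [0.2, 0.8, 0.1]
chunk_vecs = [
    [0.1, 0.9, 0.0],  # points in a similar direction -> high similarity
    [0.9, 0.0, 0.4],  # points in a different direction -> low similarity
]

scores = [cosine_similarity([question_vec], [c])[0][0] for c in chunk_vecs]
print(scores)                  # roughly [0.98, 0.27]
print(int(np.argmax(scores)))  # 0 -> the first chunk would be used as context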
5. Chatbot Interaction Process
The chat_with_vectortext(message, history) function powers the chatbot's responses. It follows these steps:
- Sets up the system prompt – The chatbot is instructed to answer questions based on PDF content.
- Maintains conversation history – Previous messages from the user and chatbot are stored to provide contextual responses.
- Extracts and chunks text from the PDF – The text is processed into smaller parts for better searchability.
- Finds the most relevant chunk – Using vector-based search, the chatbot selects the most relevant information to answer the query.
- Constructs the final prompt – The retrieved chunk is formatted as: "Context: (Relevant PDF Text) \n\n Question: (User's Question) \n Answer:"
- Generates the response – The chatbot streams its answer in real time using Get_StreamedResponse(pdf_messages), ensuring faster and more interactive replies.
6. Streaming AI-Generated Responses
The chatbot streams responses chunk by chunk, ensuring a smooth user experience instead of waiting for the entire response at once. The assistant dynamically updates the answer as the AI generates text, making it feel more natural and real-time.
Summary
This script implements Retrieval-Augmented Generation (RAG) by combining:
✅ PDF text extraction (to process documents)
✅ Text chunking (to break content into searchable segments)
✅ Vector search using embeddings & cosine similarity (to find the most relevant text)
✅ AI-powered response generation (to provide accurate answers in real time)
By leveraging vector-based search and LLM-generated responses, this chatbot enhances accuracy and relevance when answering user queries based on PDF content.
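One efficiency note: chat_with_vectortext re-extracts and re-embeds the whole PDF on every question, which means one embedding call per chunk per message. A hedged sketch of caching the chunk embeddings once and reusing them per query (the function name below is illustrative, not part of the repo):

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from llm_communicator import Get_embeddings
from rag_vectortext import extract_pdf_text, chunk_text, pdf_path

# Computed once at import time instead of on every chat turn
_chunks = chunk_text(extract_pdf_text(pdf_path))
_chunk_embeddings = [Get_embeddings(chunk) for chunk in _chunks]

def find_most_relevant_chunk_cached(question):
    question_embedding = Get_embeddings(question)
    scores = [cosine_similarity([question_embedding], [emb])[0][0] for emb in _chunk_embeddings]
    return _chunks[int(np.argmax(scores))]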
6. Web Scraping + LLM Chatbot Integration (rag_web.py)
This script is designed to create an AI-powered chatbot that can answer user queries based on content extracted from a website. It achieves this by scraping text from a given URL, formatting it into a prompt, and generating responses using a Large Language Model (LLM).
# import our llm communicator
from llm_communicator import Get_StreamedResponse
# import website communicator
from bs4 import BeautifulSoup
# to get the website content
import requests
#url we are going to extract
url = 'https://hereandnowai.com/contact/'
def scrape_website(url):
    # Send an HTTP request to the website
    response = requests.get(url)
    # If the request is successful, parse the content
    if response.status_code == 200:
        soup = BeautifulSoup(response.text, 'html.parser')
        # Extract relevant text from the webpage (for simplicity, we take all text in <p> tags)
        paragraphs = soup.find_all('p')
        text_content = ' '.join([para.get_text() for para in paragraphs])
        return text_content
    else:
        return "Failed to retrieve content from the website."

# Test the website content
# print(scrape_website(url))

# Function to chat with the LLM
def chat_with_web(message, history):
    # Initialize an empty string for the streaming response
    response = ""
    # System prompt in messages format:
    # it tells the LLM what role to play, how to treat the input,
    # and what kind of answer to give
    messages = [
        {"role": "system", "content": "You are a helpful assistant that helps answer questions based on web content."}
    ]
    # Add history messages
    for h in history:
        messages.append({"role": "user", "content": h[0]})
        if h[1]:  # Only add the assistant message if it exists
            messages.append({"role": "assistant", "content": h[1]})

    web_messages = messages
    # Extract text from the website
    web_extract = scrape_website(url)
    prompt = f"Context: {web_extract}\n\nQuestion: {message}\nAnswer:"
    # Add the web context + question to the messages sent to the LLM
    web_messages.append({"role": "user", "content": f"{prompt}"})
    # Add the plain user message for the UI history
    messages.append({"role": "user", "content": message})
    # Call Get_StreamedResponse and get the reply
    completion = Get_StreamedResponse(web_messages)
    # Stream the response
    for chunk in completion:
        if chunk.choices[0].delta.content is not None:
            content = chunk.choices[0].delta.content
            response += content
            yield response
1. Importing Necessary Libraries
The script imports:
- requests – to send HTTP requests and retrieve webpage content.
- BeautifulSoup (from bs4) – to parse and extract text from HTML elements.
- llm_communicator – which contains Get_StreamedResponse (for getting responses from the AI model).
2. Scraping the Website Content
The function scrape_website(url) extracts text from the specified webpage:
- It sends an HTTP request using requests.get(url).
- If the request is successful (status_code == 200), it parses the HTML content using BeautifulSoup.
- It extracts all text inside <p> tags (paragraphs) and combines them into a single string.
- If the request fails, it returns "Failed to retrieve content from the website.".
This function allows the chatbot to fetch the latest website content dynamically instead of relying on pre-stored data.
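A quick, hedged check of the scraper (the exact text depends on the live page; the timeout below is an extra precaution that is not in the workshop code):

import requests
from bs4 import BeautifulSoup

def scrape_website_with_timeout(url, timeout=10):
    # Same idea as scrape_website(), but a slow site cannot hang the chatbot indefinitely
    response = requests.get(url, timeout=timeout)
    if response.status_code == 200:
        soup = BeautifulSoup(response.text, 'html.parser')
        return ' '.join(p.get_text() for p in soup.find_all('p'))
    return "Failed to retrieve content from the website."

print(scrape_website_with_timeout('https://hereandnowai.com/contact/')[:200])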
3. Chatbot Interaction Process
The function chat_with_web(message, history) handles the chat process:
- Sets up the system prompt – The chatbot is instructed to answer questions based on web content.
- Maintains conversation history – Previous user inputs and AI responses are stored for context.
- Extracts website content – Calls scrape_website(url) to retrieve relevant text from the webpage.
- Formats the prompt – The retrieved content is structured as: "Context: (Extracted Web Content) \n\n Question: (User's Question) \n Answer:"
- Sends the prompt to the LLM – The chatbot generates a response using Get_StreamedResponse(web_messages).
- Streams the AI-generated response – Instead of waiting for the full response, the assistant dynamically updates the answer in real time, improving the user experience.
4. Streaming AI-Generated Responses
- The chatbot streams responses chunk by chunk.
- If the model detects a partial response, it appends new words dynamically rather than waiting for the entire answer.
- This ensures faster, interactive responses.
Final Summary
This chatbot uses Retrieval-Augmented Generation (RAG) by combining:
✅ Web scraping (to dynamically fetch website content)
✅ LLM-powered responses (to generate human-like answers)
✅ Streaming output (to provide real-time responses)
By extracting live website data, the chatbot can answer questions based on the latest information from the website, making it more relevant, up-to-date, and context-aware.
Running the Full RAG System
Activate the virtual environment:
On Windows
.\rag_env\Scripts\activate
On Mac/Linux
source rag_env/bin/activate
Run Gradio App:
python app.py
Access the app in your browser:
http://localhost:7860/
You now have a fully functional RAG system!
Next Steps
Enhance the RAG system using larger vector databases.
Integrate with LangChain for more complex workflows.
Add authentication and deploy the system online.
Final Thoughts
You've successfully built an end-to-end RAG system capable of pulling information from PDFs and websites with vector-based search, all wrapped in a neat Gradio UI.
Stay tuned for more AI workshops from HERE AND NOW AI!