RAG & Knowledge Integration
Master Retrieval-Augmented Generation and advanced knowledge integration patterns in AutoGen. Build intelligent agents that leverage external knowledge sources and memory systems.
RAG & Knowledge Integration Overview
Building intelligent agents with external knowledge access
What You'll Learn
- Core Memory protocol and its implementation patterns
- Vector database integration with ChromaDB and Redis
- LlamaIndex integration for advanced RAG capabilities
- Document indexing and chunking strategies
- Production-ready RAG system architecture
Why RAG Matters
- Access to external knowledge beyond training data
- Dynamic information retrieval from documents and databases
- Contextual memory for personalized interactions
- Scalable knowledge management across conversations
- Integration with existing enterprise knowledge bases
RAG Process Flow
The two distinct phases of Retrieval-Augmented Generation
Phase 1: Indexing
- Load documents from various sources
- Chunk documents into manageable pieces
- Generate embeddings for each chunk
- Store in vector database for retrieval
Phase 2: Retrieval & Generation
- Process user query and generate embedding
- Search vector database for relevant chunks
- Retrieve top-k most similar documents
- Augment prompt with retrieved context
(A framework-agnostic sketch of both phases follows.)
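In the sketch below, the bag-of-words "embedding", the sample documents, and the top-k value of 2 are toy stand-ins chosen purely for illustration; real systems use a neural embedding model and a vector database such as ChromaDB.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': bag-of-words counts (real systems use an embedding model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Phase 1: Indexing - load, chunk, embed, store
documents = [
    "AutoGen agents can be given memory backends.",
    "ChromaDB stores embeddings and supports similarity search.",
    "Redis can also serve as a vector store.",
]
index = [(chunk, embed(chunk)) for chunk in documents]

# Phase 2: Retrieval & Generation - embed the query, search, take top-k, augment the prompt
query = "Which databases store embeddings?"
query_vec = embed(query)
top_k = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)[:2]
augmented_prompt = "Context:\n" + "\n".join(chunk for chunk, _ in top_k) + f"\n\nQuestion: {query}"
print(augmented_prompt)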
AutoGen Memory Protocol
AutoGen provides a standardized Memory protocol that serves as the foundation for all RAG implementations in the framework.
Core Memory Methods
- add(): Store new information in memory
- query(): Retrieve relevant information
- update_context(): Augment agent's model context
- clear(): Remove all memory entries
- close(): Clean up resources
- dump_component(): Serialize for persistence
Standard Memory Protocol Flow
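A minimal sketch of that flow, exercising each protocol method with ListMemory; the demo function name and example strings are illustrative.
import asyncio
from autogen_core.memory import ListMemory, MemoryContent, MemoryMimeType
from autogen_core.model_context import UnboundedChatCompletionContext

async def demo_memory_protocol() -> None:
    memory = ListMemory()

    # add(): store a new entry
    await memory.add(MemoryContent(
        content="User prefers metric units",
        mime_type=MemoryMimeType.TEXT
    ))

    # query(): retrieve relevant entries
    # (ListMemory returns everything; vector memories rank by similarity)
    results = await memory.query("What units does the user prefer?")
    print([item.content for item in results.results])

    # update_context(): inject stored memories into an agent's model context
    context = UnboundedChatCompletionContext()
    await memory.update_context(context)

    # clear() and close(): remove all entries and release resources
    await memory.clear()
    await memory.close()

asyncio.run(demo_memory_protocol())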
Memory Types
- ListMemory: Simple chronological storage
- ChromaDBVectorMemory: Vector similarity search
- RedisMemory: Redis-based vector storage
- Mem0Memory: Cloud/local hybrid memory
Basic Memory Implementation
This simple example shows how to add memory to an AutoGen agent in just a few steps. Start with ListMemory for learning, then upgrade to vector databases for production.
Simple Start - ListMemory
# Step 1: Import required modules
from autogen_agentchat.agents import AssistantAgent
from autogen_core.memory import ListMemory, MemoryContent, MemoryMimeType

# Step 2: Create simple memory
memory = ListMemory()

# Step 3: Create agent with memory
agent = AssistantAgent(
    name="MemoryBot",
    model_client=your_model_client,
    memory=[memory],
    description="I remember our conversations!"
)

# Step 4: Add some facts to memory
await memory.add(MemoryContent(
    content="User prefers morning meetings",
    mime_type=MemoryMimeType.TEXT
))
await memory.add(MemoryContent(
    content="User is learning AutoGen framework",
    mime_type=MemoryMimeType.TEXT
))

# Step 5: Chat with memory-enabled agent
result = await agent.run(task="What do you know about me?")
Advanced - Vector Memory
# For production: Use ChromaDB for semantic search
from autogen_ext.memory.chromadb import ChromaDBVectorMemory, PersistentChromaDBVectorMemoryConfig
from autogen_core.memory import MemoryContent, MemoryMimeType

# Simple ChromaDB setup
vector_memory = ChromaDBVectorMemory(
    config=PersistentChromaDBVectorMemoryConfig(
        collection_name="my_knowledge"
    )
)

# Create agent with vector memory
smart_agent = AssistantAgent(
    name="SmartBot",
    model_client=your_model_client,
    memory=[vector_memory],
    description="I can find similar information!"
)

# Add knowledge that can be searched semantically
await vector_memory.add(MemoryContent(
    content="Python is a programming language",
    mime_type=MemoryMimeType.TEXT,
    metadata={"topic": "programming"}
))
await vector_memory.add(MemoryContent(
    content="JavaScript runs in web browsers",
    mime_type=MemoryMimeType.TEXT,
    metadata={"topic": "programming"}
))

# Agent automatically finds relevant info
result = await smart_agent.run(task="Tell me about coding languages")
Key Difference: ListMemory injects every stored entry into the agent's context in the order it was added, while vector memory runs a semantic similarity search and injects only the entries most relevant to the current query.
Vector Database Integration
ChromaDB
Open-source vector database with embedding functions and persistence support.
Redis Vector
High-performance in-memory database with vector search capabilities.
Mem0 Memory
Cloud-native memory service with both cloud and local backend support.
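Each backend is configured through its own config object. The sketch below shows a persistent ChromaDB setup; the collection name, path, and tuning values are illustrative. Here k controls how many entries are injected per query and score_threshold drops weak matches. In recent autogen-ext releases, the Redis and Mem0 backends follow the same pattern with their own config classes.
import os
from pathlib import Path
from autogen_ext.memory.chromadb import ChromaDBVectorMemory, PersistentChromaDBVectorMemoryConfig

knowledge_memory = ChromaDBVectorMemory(
    config=PersistentChromaDBVectorMemoryConfig(
        collection_name="enterprise_kb",                                       # where entries live
        persistence_path=os.path.join(str(Path.home()), ".chromadb_autogen"),  # on-disk storage location
        k=3,                                                                   # return the 3 most similar entries
        score_threshold=0.4,                                                   # ignore weaker matches
    )
)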
Document Indexing System
This example shows how to upload a PDF file to a vector database and then use it with an AI agent. We'll split this into two parts: uploading the document and using it in conversations.
Part 1: Upload PDF to Vector Database
Step 0: Prepare Your PDF
Before running this demo, you'll need a PDF file:
- Download your LinkedIn profile as PDF (or use any PDF file)
- Create a folder named documents in your project directory
- Save the PDF as Profile.pdf in the documents folder
# Step 1: Import required libraries
from autogen_ext.memory.chromadb import ChromaDBVectorMemory, PersistentChromaDBVectorMemoryConfig
from autogen_core.memory import MemoryContent, MemoryMimeType
import PyPDF2 # For reading PDF files
import asyncio
# Step 2: Create vector database connection
vector_memory = ChromaDBVectorMemory(
    config=PersistentChromaDBVectorMemoryConfig(
        collection_name="my_documents"  # Simple collection name
    )
)
# Step 3: Read PDF file
def read_pdf(file_path: str) -> str:
    """Read text content from a PDF file"""
    text = ""
    with open(file_path, 'rb') as file:
        pdf_reader = PyPDF2.PdfReader(file)
        # Extract text from all pages
        for page in pdf_reader.pages:
            text += page.extract_text() + "\n"
    return text
# Step 4: Split text into chunks
def create_chunks(text: str, chunk_size: int = 1000) -> list:
    """Split text into smaller chunks for better search"""
    chunks = []
    # Split text into chunks of specified size
    for i in range(0, len(text), chunk_size):
        chunk = text[i:i + chunk_size].strip()
        if chunk:  # Only add non-empty chunks
            chunks.append(chunk)
    return chunks
# Step 5: Upload PDF to vector database
async def upload_pdf_to_vector_db(pdf_path: str):
    """Complete process to upload PDF to vector database"""
    # Read the PDF file
    print(f"Reading PDF: {pdf_path}")
    pdf_text = read_pdf(pdf_path)

    # Split into manageable chunks
    print("Creating text chunks...")
    chunks = create_chunks(pdf_text, chunk_size=1000)

    # Upload each chunk to vector database
    print(f"Uploading {len(chunks)} chunks to vector database...")
    for i, chunk in enumerate(chunks):
        # Create memory content with metadata
        memory_content = MemoryContent(
            content=chunk,
            mime_type=MemoryMimeType.TEXT,
            metadata={
                "source": pdf_path,
                "chunk_number": i + 1,
                "total_chunks": len(chunks)
            }
        )
        # Add to vector database
        await vector_memory.add(memory_content)

    print(f"✅ Successfully uploaded {len(chunks)} chunks from {pdf_path}")
# Step 6: Run the upload
if __name__ == "__main__":
    asyncio.run(upload_pdf_to_vector_db("./documents/Profile.pdf"))
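The create_chunks() helper above splits on fixed character counts, which can cut a sentence in half exactly where the answer lives. A common refinement is to let neighboring chunks overlap so boundary context appears in both; the variation below is a small sketch, and the 200-character overlap is only a starting point.
def create_chunks_with_overlap(text: str, chunk_size: int = 1000, overlap: int = 200) -> list:
    """Split text into chunks that share 'overlap' characters with the previous chunk"""
    chunks = []
    step = chunk_size - overlap
    for i in range(0, len(text), step):
        chunk = text[i:i + chunk_size].strip()
        if chunk:
            chunks.append(chunk)
    return chunks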
Part 2: Use PDF Knowledge in AI Agent
# Step 1: Create AI agent with vector memory
from autogen_agentchat.agents import AssistantAgent
from autogen_ext.memory.chromadb import ChromaDBVectorMemory, PersistentChromaDBVectorMemoryConfig
from autogen_ext.models.openai import OpenAIChatCompletionClient
import asyncio
# Connect to the same vector database from Part 1
vector_memory = ChromaDBVectorMemory(
    config=PersistentChromaDBVectorMemoryConfig(
        collection_name="my_documents"  # Must match the collection name used in Part 1
    )
)
# Step 2: Create agent with access to PDF knowledge
pdf_agent = AssistantAgent(
    name="PDFExpert",
    model_client=OpenAIChatCompletionClient(model="gpt-4o"),  # Your OpenAI/other model client
    memory=[vector_memory],  # Connect the vector database
    system_message="""You are an AI assistant with access to uploaded PDF documents.
    When users ask questions, search through the PDF content to provide accurate answers.
    Always mention which document section your information comes from."""
)
# Step 3: Query the PDF content through the agent
async def ask_pdf_question(question: str):
    """Ask a question about the uploaded PDF content"""
    print(f"🤔 Question: {question}")
    # The agent will automatically:
    # 1. Search the vector database for relevant PDF chunks
    # 2. Use that information to answer the question
    # 3. Provide a response based on the PDF content
    result = await pdf_agent.run(task=question)
    return result
# Step 4: Interactive questioning
async def interactive_pdf_chat():
    """Start an interactive chat with your PDF content"""
    print("🚀 Start chatting with your PDF! (type 'quit' to exit)")
    while True:
        user_question = input("\n❓ Your question: ")
        if user_question.lower() == 'quit':
            print("👋 Goodbye!")
            break
        # Get answer from PDF content
        result = await pdf_agent.run(task=user_question)
        # Extract the final answer from the last message
        final_answer = result.messages[-1].content.replace(" TERMINATE", "")
        print(f"🤖 PDF Agent: {final_answer}")

# Start interactive chat
if __name__ == "__main__":
    asyncio.run(interactive_pdf_chat())
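Because the agent and your own code share the same Memory protocol, you can also call query() on the vector database directly to see which PDF chunks would be retrieved for a question. This is a handy debugging step when answers look off; the helper below is our own addition rather than part of the walkthrough above, and the example question is illustrative.
async def preview_retrieval(question: str) -> None:
    """Print the PDF chunks the vector database returns for a question"""
    results = await vector_memory.query(question)
    for item in results.results:
        meta = item.metadata or {}
        print(f"chunk {meta.get('chunk_number')}: {str(item.content)[:80]}...")

# Example: asyncio.run(preview_retrieval("What skills are listed in the profile?"))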
How It Works:
- Read PDF file using PyPDF2
- Split text into 1000-character chunks
- Convert chunks to vector embeddings
- Store in ChromaDB vector database
- Create agent with vector memory access
- Ask questions in natural language
- Agent searches PDF chunks automatically
- Returns answers based on PDF content
RAG & Knowledge Integration Wrap-up
Let's recap what you've learned
Memory Protocol
Learned how to add, query, and manage memory using AutoGen's standardized protocol with ListMemory and vector databases.
Vector Databases
Mastered ChromaDB, Redis, and Mem0 integration for semantic search and intelligent information retrieval.
Document Indexing
Practiced PDF upload workflows, text chunking strategies, and building chat interfaces over documents.
RAG Agents
Built AI agents that leverage external knowledge, maintain context, and provide informed responses with citations.
What's Next?
You now have the foundation to build sophisticated RAG systems. Here are recommended next steps:
- Practice: Build a RAG system with your own documents
- Explore: Experiment with different vector databases and embedding models
- Scale: Learn about production patterns, chunking strategies, and performance optimization
- Integrate: Combine RAG with tools, multi-agent systems, and advanced frameworks like LlamaIndex (see the sketch below)
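As one example of that last step, the sketch below wraps a LlamaIndex query engine in a plain function and registers it as a tool on an AssistantAgent. It assumes the llama-index-core package is installed and an OpenAI API key is available for LlamaIndex's default embeddings; the directory path, tool name, and agent name are illustrative.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient

# Build a LlamaIndex index and query engine over local documents
documents = SimpleDirectoryReader("./documents").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

def search_documents(question: str) -> str:
    """Answer a question from the indexed documents using LlamaIndex"""
    return str(query_engine.query(question))

# Register the query engine as a tool the agent can call
llamaindex_agent = AssistantAgent(
    name="LlamaIndexRAG",
    model_client=OpenAIChatCompletionClient(model="gpt-4o"),
    tools=[search_documents],
    system_message="Use the search_documents tool to answer questions about the document collection."
)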
Pro Tip
Start with simple ListMemory for learning, then upgrade to vector databases like ChromaDB for production. Always test with small datasets before scaling up!