RAG & Knowledge Integration

Master Retrieval-Augmented Generation and advanced knowledge integration patterns in AutoGen. Build intelligent agents that leverage external knowledge sources and memory systems.

Topics: Memory Protocol • Vector Databases • LlamaIndex • Knowledge Management

RAG & Knowledge Integration Overview

Building intelligent agents with external knowledge access

What You'll Learn
  • Core Memory protocol and its implementation patterns
  • Vector database integration with ChromaDB and Redis
  • LlamaIndex integration for advanced RAG capabilities
  • Document indexing and chunking strategies
  • Production-ready RAG system architecture
Why RAG Matters
  • Access to external knowledge beyond training data
  • Dynamic information retrieval from documents and databases
  • Contextual memory for personalized interactions
  • Scalable knowledge management across conversations
  • Integration with existing enterprise knowledge bases

RAG Process Flow

The two distinct phases of Retrieval-Augmented Generation (see the code sketch after the lists below)

1. INDEXING PHASE
  • Load documents from various sources
  • Chunk documents into manageable pieces
  • Generate embeddings for each chunk
  • Store in vector database for retrieval
2. RETRIEVAL PHASE
  • Process user query and generate embedding
  • Search vector database for relevant chunks
  • Retrieve top-k most similar documents
  • Augment prompt with retrieved context
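
Before diving into AutoGen's APIs, here is a toy, dependency-free illustration of both phases. The embed function is a deliberate stand-in (word overlap instead of a learned embedding model), so treat this as a conceptual sketch rather than a retriever you would ship:

# Toy sketch of both RAG phases; word overlap stands in for vector similarity
import re

def embed(text: str) -> set:
    """Stand-in for an embedding model: a bag of lowercase words"""
    return set(re.findall(r"[a-z]+", text.lower()))

# --- Indexing phase: collect/chunk documents and store their "embeddings" ---
documents = [
    "Python is a programming language.",
    "Paris is the capital of France.",
]
index = [(embed(doc), doc) for doc in documents]

# --- Retrieval phase: embed the query, rank stored chunks, augment prompt ---
query = "What kind of language is Python?"
query_vec = embed(query)
top_chunk = max(index, key=lambda item: len(item[0] & query_vec))[1]
prompt = f"Context: {top_chunk}\n\nQuestion: {query}"
print(prompt)  # this augmented prompt is what the LLM would receive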

AutoGen Memory Protocol

AutoGen provides a standardized Memory protocol that serves as the foundation for all RAG implementations in the framework.

Core Memory Methods
  • add(): Store new information in memory
  • query(): Retrieve relevant information
  • update_context(): Augment agent's model context
  • clear(): Remove all memory entries
  • close(): Clean up resources
  • dump_component(): Serialize for persistence
Standard Memory Protocol Flow: Memory.add() → Memory.query() → Memory.update_context()
Memory Types
  • ListMemory: Simple chronological storage
  • ChromaDBVectorMemory: Vector similarity search
  • RedisMemory: Redis-based vector storage
  • Mem0Memory: Cloud/local hybrid memory
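
If none of the built-in types fit your backend, you can implement the protocol yourself. Below is a minimal sketch of a custom implementation; the method signatures approximate those in autogen_core.memory and may differ slightly across AutoGen versions, and the component-serialization hooks (dump_component) are omitted for brevity:

from autogen_core.memory import (
    Memory,
    MemoryContent,
    MemoryQueryResult,
    UpdateContextResult,
)

class KeywordMemory(Memory):
    """Illustrative in-process backend; signatures are approximate"""

    def __init__(self):
        self._items: list[MemoryContent] = []

    async def add(self, content: MemoryContent, cancellation_token=None) -> None:
        self._items.append(content)  # store new information

    async def query(self, query, cancellation_token=None, **kwargs) -> MemoryQueryResult:
        # Naive relevance: substring match; real backends rank by similarity
        hits = [m for m in self._items if str(query).lower() in str(m.content).lower()]
        return MemoryQueryResult(results=hits)

    async def update_context(self, model_context) -> UpdateContextResult:
        # Called by the agent before inference to inject relevant memories
        retrieved = MemoryQueryResult(results=list(self._items))
        # ...append retrieved.results to model_context as messages here...
        return UpdateContextResult(memories=retrieved)

    async def clear(self) -> None:
        self._items.clear()  # remove all memory entries

    async def close(self) -> None:
        pass  # nothing to release for an in-process store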

Basic Memory Implementation

This simple example shows how to add memory to an AutoGen agent in just a few steps. Start with ListMemory for learning, then upgrade to vector databases for production.

Simple Start - ListMemory
# Step 1: Import required modules
from autogen_agentchat.agents import AssistantAgent
from autogen_core.memory import ListMemory

# Step 2: Create simple memory
memory = ListMemory()

# Step 3: Create agent with memory
agent = AssistantAgent(
    name="MemoryBot",
    model_client=your_model_client,
    memory=[memory],
    description="I remember our conversations!"
)

# Step 4: Add some facts to memory
from autogen_core.memory import MemoryContent, MemoryMimeType

await memory.add(MemoryContent(
    content="User prefers morning meetings",
    mime_type=MemoryMimeType.TEXT
))

await memory.add(MemoryContent(
    content="User is learning AutoGen framework",
    mime_type=MemoryMimeType.TEXT
))

# Step 5: Chat with memory-enabled agent
result = await agent.run(task="What do you know about me?")
Advanced - Vector Memory
# For production: Use ChromaDB for semantic search
from autogen_ext.memory.chromadb import ChromaDBVectorMemory, PersistentChromaDBVectorMemoryConfig

# Simple ChromaDB setup
vector_memory = ChromaDBVectorMemory(
    config=PersistentChromaDBVectorMemoryConfig(
        collection_name="my_knowledge"
    )
)

# Create agent with vector memory
smart_agent = AssistantAgent(
    name="SmartBot",
    model_client=your_model_client,
    memory=[vector_memory],
    description="I can find similar information!"
)

# Add knowledge that can be searched semantically
await vector_memory.add(MemoryContent(
    content="Python is a programming language",
    mime_type=MemoryMimeType.TEXT,
    metadata={"topic": "programming"}
))

await vector_memory.add(MemoryContent(
    content="JavaScript runs in web browsers",
    mime_type=MemoryMimeType.TEXT,
    metadata={"topic": "programming"}
))

# Agent automatically finds relevant info
result = await smart_agent.run(task="Tell me about coding languages")
Key Difference:
  • ListMemory: Stores everything in order; good for conversation history
  • Vector Memory: Finds semantically similar content via embeddings; ideal for knowledge search
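
You can also call a memory's query() method directly, outside any agent, to inspect what would be retrieved for a given question. A quick debugging check, reusing the vector_memory instance from the example above:

# Inspect retrieval directly (handy when tuning relevance)
results = await vector_memory.query("web development")
for item in results.results:
    print(item.content, item.metadata)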

Vector Database Integration

ChromaDB

Open-source vector database with embedding functions and persistence support.

Features: • Local & persistent storage • Multiple embedding models • Metadata filtering
Redis Vector

High-performance in-memory database with vector search capabilities.

Features: • In-memory performance • Distributed scaling • Real-time updates
Mem0 Memory

Cloud-native memory service with both cloud and local backend support.

Features: • Cloud & local options • Managed scaling • Enterprise features
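
In practice, you usually tune a few retrieval knobs when configuring a vector memory. The sketch below shows a ChromaDB setup with commonly exposed options; the field names follow recent AutoGen releases, so verify them against your installed version:

from autogen_ext.memory.chromadb import ChromaDBVectorMemory, PersistentChromaDBVectorMemoryConfig

tuned_memory = ChromaDBVectorMemory(
    config=PersistentChromaDBVectorMemoryConfig(
        collection_name="my_knowledge",
        persistence_path="./chroma_db",  # where ChromaDB persists data on disk
        k=3,                             # return the top 3 most similar chunks
        score_threshold=0.4,             # drop weakly matching results
    )
)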

Document Indexing System

This example shows how to upload a PDF file to a vector database and then use it with an AI agent. We'll split this into two parts: uploading the document and using it in conversations.

Part 1: Upload PDF to Vector Database
Step 0: Prepare Your PDF

Before running this demo, you'll need a PDF file:

  1. Download your LinkedIn profile as PDF (or use any PDF file)
  2. Create a folder named documents in your project directory
  3. Save the PDF as Profile.pdf in the documents folder
# Step 1: Import required libraries
from autogen_ext.memory.chromadb import ChromaDBVectorMemory, PersistentChromaDBVectorMemoryConfig
from autogen_core.memory import MemoryContent, MemoryMimeType
import PyPDF2  # For reading PDF files
import asyncio
# Step 2: Create vector database connection
vector_memory = ChromaDBVectorMemory(
    config=PersistentChromaDBVectorMemoryConfig(
        collection_name="my_documents"  # Simple collection name
    )
)

# Step 3: Read PDF file
def read_pdf(file_path: str) -> str:
    """Read text content from a PDF file"""
    text = ""
    with open(file_path, 'rb') as file:
        pdf_reader = PyPDF2.PdfReader(file)
        # Extract text from all pages
        for page in pdf_reader.pages:
            text += (page.extract_text() or "") + "\n"  # extract_text() can return None
    return text

# Step 4: Split text into chunks
def create_chunks(text: str, chunk_size: int = 1000) -> list:
    """Split text into smaller chunks for better search"""
    chunks = []
    # Split text into chunks of specified size
    for i in range(0, len(text), chunk_size):
        chunk = text[i:i + chunk_size].strip()
        if chunk:  # Only add non-empty chunks
            chunks.append(chunk)
    return chunks
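
# Optional variation: overlapping chunks. Fixed-size splits can cut a sentence
# at a chunk boundary; sharing some characters between neighboring chunks keeps
# that boundary context retrievable. (This helper is an illustrative extra,
# not required for the walkthrough.)
def create_overlapping_chunks(text: str, chunk_size: int = 1000, overlap: int = 200) -> list:
    """Split text into chunks that share `overlap` characters with their neighbors"""
    chunks = []
    step = chunk_size - overlap
    for i in range(0, len(text), step):
        chunk = text[i:i + chunk_size].strip()
        if chunk:  # Only add non-empty chunks
            chunks.append(chunk)
    return chunks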

# Step 5: Upload PDF to vector database
async def upload_pdf_to_vector_db(pdf_path: str):
    """Complete process to upload PDF to vector database"""

    # Read the PDF file
    print(f"Reading PDF: {pdf_path}")
    pdf_text = read_pdf(pdf_path)

    # Split into manageable chunks
    print("Creating text chunks...")
    chunks = create_chunks(pdf_text, chunk_size=1000)

    # Upload each chunk to vector database
    print(f"Uploading {len(chunks)} chunks to vector database...")
    for i, chunk in enumerate(chunks):
        # Create memory content with metadata
        memory_content = MemoryContent(
            content=chunk,
            mime_type=MemoryMimeType.TEXT,
            metadata={
                "source": pdf_path,
                "chunk_number": i + 1,
                "total_chunks": len(chunks)
            }
        )
        # Add to vector database
        await vector_memory.add(memory_content)

    print(f"✅ Successfully uploaded {len(chunks)} chunks from {pdf_path}")

# Step 6: Run the upload
if __name__ == "__main__":
    asyncio.run(upload_pdf_to_vector_db("./documents/Profile.pdf"))
Part 2: Use PDF Knowledge in AI Agent
# Step 1: Create AI agent with vector memory
from autogen_agentchat.agents import AssistantAgent
from autogen_ext.memory.chromadb import ChromaDBVectorMemory, PersistentChromaDBVectorMemoryConfig
from autogen_ext.models.openai import OpenAIChatCompletionClient
import asyncio

# Connect to the same vector database from Part 1
vector_memory = ChromaDBVectorMemory(
    config=PersistentChromaDBVectorMemoryConfig(
        collection_name="my_documents"  # Same collection name as Part 1
    )
)

# Step 2: Create agent with access to PDF knowledge
pdf_agent = AssistantAgent(
    name="PDFExpert",
    model_client=OpenAIChatCompletionClient(model="gpt-4o"),  # Your OpenAI/other model client
    memory=[vector_memory],  # Connect the vector database
    system_message="""You are an AI assistant with access to uploaded PDF documents.
    When users ask questions, search through the PDF content to provide accurate answers.
    Always mention which document section your information comes from."""
)

# Step 3: Query the PDF content through the agent
async def ask_pdf_question(question: str):
    """Ask a question about the uploaded PDF content"""

    print(f"🤔 Question: {question}")

    # The agent will automatically:
    # 1. Search the vector database for relevant PDF chunks
    # 2. Use that information to answer the question
    # 3. Provide a response based on the PDF content

    result = await pdf_agent.run(task=question)
    return result

# Step 4: Interactive questioning
async def interactive_pdf_chat():
    """Start an interactive chat with your PDF content"""
    print("🚀 Start chatting with your PDF! (type 'quit' to exit)")

    while True:
        user_question = input("\n❓ Your question: ")

        if user_question.lower() == 'quit':
            print("👋 Goodbye!")
            break

        # Get answer from PDF content
        result = await pdf_agent.run(task=user_question)
        # Extract the final answer from the last message
        final_answer = result.messages[-1].content.replace(" TERMINATE", "")
        print(f"🤖 PDF Agent: {final_answer}")

# Start interactive chat
if __name__ == "__main__":
    asyncio.run(interactive_pdf_chat())
How It Works:
Part 1 (Upload):
  1. Read PDF file using PyPDF2
  2. Split text into 1000-character chunks
  3. Convert chunks to vector embeddings
  4. Store in ChromaDB vector database
Part 2 (Query):
  1. Create agent with vector memory access
  2. Ask questions in natural language
  3. Agent searches PDF chunks automatically
  4. Returns answers based on PDF content

RAG & Knowledge Integration Wrap-up

Let's recap what you've learned

Memory Protocol

Learned how to add, query, and manage memory using AutoGen's standardized protocol with ListMemory and vector databases.

Vector Databases

Explored ChromaDB, Redis, and Mem0 integration for semantic search and intelligent information retrieval.

Document Indexing

Practiced PDF upload workflows, text chunking strategies, and building chat interfaces over documents.

RAG Agents

Built AI agents that leverage external knowledge, maintain context, and provide informed responses with citations.

What's Next?

You now have the foundation to build sophisticated RAG systems. Here are recommended next steps:

  • Practice: Build a RAG system with your own documents
  • Explore: Experiment with different vector databases and embedding models
  • Scale: Learn about production patterns, chunking strategies, and performance optimization
  • Integrate: Combine RAG with tools, multi-agent systems, and advanced frameworks like LlamaIndex
Pro Tip

Start with simple ListMemory for learning, then upgrade to vector databases like ChromaDB for production. Always test with small datasets before scaling up!