Agents & Models

Master the building blocks of intelligent multi-agent systems

Learn about agent types, model integration, and best practices for production deployments

What You'll Master

  • AssistantAgent and UserProxyAgent usage
  • Tool integration and workbenches
  • Multi-modal input handling
  • Structured output and streaming
  • Model client configuration
  • OpenAI, Azure, Anthropic, and local models
  • Authentication and security
  • Production deployment patterns

Agent Types & Capabilities

AutoGen AgentChat provides preset agents, each with unique behaviors and capabilities

AssistantAgent

AI-powered agent with LLM capabilities

Key Features:
  • Powered by language models (GPT-4, Claude, etc.)
  • Can use tools and function calls
  • Generates responses autonomously
  • Supports system message configuration
  • Built-in reasoning and planning
Best For:
  • Content generation and analysis
  • Code writing and debugging
  • Research and information processing
  • Complex problem-solving tasks
Note: AssistantAgent is a "kitchen sink" agent for prototyping. For production, consider implementing custom agents.

UserProxyAgent

Human-in-the-loop interface agent

Key Features:
  • Represents human users in conversations
  • Requests human input when the team needs it
  • Relays human responses back to the other agents
  • Customizable via an input function (console, web UI, etc.)
  • No LLM - acts as a proxy for a real person
Best For:
  • Human-in-the-loop workflows
  • Approval and review checkpoints
  • Steering agents mid-conversation
  • Collecting requirements and feedback
Key Insight: Works best in combination with AssistantAgent - the AI plans and acts, the human reviews and steers. For autonomous code execution, pair AssistantAgent with CodeExecutorAgent (covered below) instead.

Practical Examples

Example 1: Single AssistantAgent
from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_agentchat.messages import TextMessage

# Create an AI assistant
assistant = AssistantAgent(
    name="assistant",
    model_client=OpenAIChatCompletionClient(model="gpt-4o"),
    system_message="You are a helpful coding assistant."
)

# Use the assistant directly
task = TextMessage(content="Write a Python function to calculate Fibonacci numbers", source="user")
result = await assistant.run(task=task)
print(result.messages[-1].content)
Example 2: AssistantAgent + UserProxyAgent Collaboration
from autogen_agentchat.agents import AssistantAgent, UserProxyAgent
from autogen_agentchat.conditions import TextMentionTermination
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.messages import TextMessage

# Create AI assistant (thinks and plans)
assistant = AssistantAgent(
    name="coder",
    model_client=OpenAIChatCompletionClient(model="gpt-4o"),
    system_message="You are an expert Python developer."
)

# Create user proxy (relays human feedback)
user_proxy = UserProxyAgent(
    name="user_proxy",
    input_func=input  # Prompts for human input on the console
)

# Create a team; the run stops once the human types "APPROVE"
team = RoundRobinGroupChat(
    [assistant, user_proxy],
    termination_condition=TextMentionTermination("APPROVE")
)

# Run a task; the human reviews the result and approves it
task = TextMessage(
    content="Write a function that finds prime numbers up to 100",
    source="user"
)
result = await team.run(task=task)

Tool vs UserProxyAgent: Key Differences

Common Question: When should I use a Tool versus a separate agent such as UserProxyAgent or CodeExecutorAgent?
Tools
What They Are:

Functions that extend an agent's capabilities, executed directly within the agent's run() method.

Best For:
  • Simple, stateless operations
  • API calls and data retrieval
  • Quick calculations or transformations
  • When you want the AI to autonomously decide tool usage
Characteristics:
  • Executed inline during conversation
  • No separate agent identity
  • Results returned immediately
  • No conversation participation
async def calculator(expression: str) -> str:
    """Evaluate a mathematical expression."""
    # Demo only: eval() is unsafe on untrusted input
    return str(eval(expression))

agent = AssistantAgent(
    name="math_helper",
    model_client=model_client,
    tools=[calculator]  # Tool attached to agent
)
UserProxyAgent
What It Is:

A separate agent that participates in conversations on behalf of a human, pausing the team to collect real user input.

Best For:
  • Human-in-the-loop interactions
  • Approval gates before irreversible actions
  • Multi-step workflows that need human steering
  • When a human should see the conversation context
Characteristics:
  • Participates as a team member
  • Has its own conversation identity
  • Can pause the team while waiting for input
  • Provides human feedback and decisions in chat
user_proxy = UserProxyAgent(
    name="user_proxy",
    input_func=input  # Console input; pass a custom async function for UIs
)

# Part of a team conversation
team = RoundRobinGroupChat([ai_agent, user_proxy])
Decision Matrix: When to Use Which?
Scenario                  | Recommendation    | Reason
--------------------------|-------------------|--------------------------------------
Simple API calls          | Tool              | Lightweight, inline execution
Code generation + testing | CodeExecutorAgent | Runs generated code as a team member
Data retrieval            | Tool              | Stateless operation
Human approval workflows  | UserProxyAgent    | Designed for human interaction
Mathematical calculations | Tool              | Quick, deterministic results

Advanced Agent Features

Multi-Modal Input

Handle text, images, and files in conversations:

from autogen_agentchat.messages import MultiModalMessage
from autogen_core import Image
import PIL.Image

# Create multi-modal message
pil_image = PIL.Image.open("image.jpg")
img = Image(pil_image)
message = MultiModalMessage(
    content=["Describe this image", img],
    source="user"
)

result = await assistant.run(task=message)
Structured Output

Get structured JSON responses with validation:

from pydantic import BaseModel
from typing import Literal

class AgentResponse(BaseModel):
    thoughts: str
    response: Literal["happy", "sad", "neutral"]

agent = AssistantAgent(
    "assistant",
    model_client=model_client,
    output_content_type=AgentResponse
)

result = await agent.run(task="I am happy.")
print(result.messages[-1].content.response)  # "happy"
Tools Integration

Extend agent capabilities with custom tools:

async def web_search(query: str) -> str:
    """Search the web for information."""
    # Replace with a real search API call
    return f"Results for: {query}"

agent = AssistantAgent(
    name="researcher",
    model_client=model_client,
    tools=[web_search],
    system_message="Use tools to solve tasks."
)
Tools are executed directly within the agent's run() method
Streaming Responses

Stream responses as they're generated:

from autogen_agentchat.ui import Console

# Stream individual messages
async for message in agent.run_stream(task="Write a story"):
    print(message)

# Or use Console for formatted output
await Console(
    agent.run_stream(task="Write a story"),
    output_stats=True
)
Useful for long-running tasks and a better user experience

Other Preset Agents

CodeExecutorAgent

Specialized agent for code execution tasks

MultimodalWebSurfer

Searches the web and visits pages for information

FileSurfer

Searches and browses local files

VideoSurfer

Watches and analyzes video content

OpenAIAssistantAgent

Backed by the OpenAI Assistant API
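
CodeExecutorAgent pairs naturally with AssistantAgent when generated code must actually run. A minimal sketch, assuming autogen-ext provides the local executor; the work_dir path is illustrative:

from autogen_agentchat.agents import CodeExecutorAgent
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_ext.code_executors.local import LocalCommandLineCodeExecutor

# Runs code blocks produced by other agents in a local working directory
code_executor = CodeExecutorAgent(
    name="code_executor",
    code_executor=LocalCommandLineCodeExecutor(work_dir="coding")
)

# Reuse the `assistant` from Example 1: it writes fenced code blocks,
# and the executor runs them and reports the output back to the team
team = RoundRobinGroupChat([assistant, code_executor])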

Model Clients & Integration

Connect your agents to various LLM providers through standardized model clients

OpenAI

Direct access to GPT-4, GPT-3.5, and other OpenAI models

# Install: pip install "autogen-ext[openai]"
from autogen_ext.models.openai import OpenAIChatCompletionClient

client = OpenAIChatCompletionClient(
    model="gpt-4o-2024-08-06",
    # api_key="sk-...",  # Or set OPENAI_API_KEY env var
)

# Test the client
from autogen_core.models import UserMessage
result = await client.create([
    UserMessage(content="What is AutoGen?", source="user")
])
print(result.content)
await client.close()
Azure OpenAI

Enterprise-grade OpenAI models hosted on Azure

# Install: pip install "autogen-ext[openai,azure]"
from autogen_ext.models.openai import AzureOpenAIChatCompletionClient
from autogen_ext.auth.azure import AzureTokenProvider
from azure.identity import DefaultAzureCredential

# With AAD authentication
token_provider = AzureTokenProvider(
    DefaultAzureCredential(),
    "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAIChatCompletionClient(
    azure_deployment="your-deployment",
    model="gpt-4o",
    api_version="2024-06-01",
    azure_endpoint="https://your-endpoint.openai.azure.com/",
    azure_ad_token_provider=token_provider
)
Anthropic Claude

Access to Claude models (experimental support)

# Install: pip install "autogen-ext[anthropic]"
from autogen_ext.models.anthropic import AnthropicChatCompletionClient
from autogen_core.models import UserMessage

client = AnthropicChatCompletionClient(
    model="claude-3-7-sonnet-20250219"
    # api_key will be read from ANTHROPIC_API_KEY env var
)

result = await client.create([
    UserMessage(content="Explain quantum computing", source="user")
])
print(result.content)
await client.close()
Local Models (Ollama)

Run models locally on your machine

# Install: pip install "autogen-ext[ollama]"
from autogen_ext.models.ollama import OllamaChatCompletionClient
from autogen_core.models import UserMessage

# Assuming Ollama server running on localhost:11434
client = OllamaChatCompletionClient(model="llama3.2")

result = await client.create([
    UserMessage(content="Hello, local model!", source="user")
])
print(result.content)
await client.close()
Local models may have limited capabilities compared to cloud models

Advanced Model Configuration

Model Context

Control conversation history sent to models:

from autogen_core.model_context import BufferedChatCompletionContext

# Limit context to last 5 messages
agent = AssistantAgent(
    name="assistant",
    model_client=model_client,
    model_context=BufferedChatCompletionContext(
        buffer_size=5
    )
)
  • UnboundedChatCompletionContext: Full history (default)
  • BufferedChatCompletionContext: Last N messages
  • TokenLimitedChatCompletionContext: Token-based limit
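For the token-based variant, a sketch (the 4000-token budget is illustrative; the context delegates token counting to the model client, so check the signature in your installed version):

from autogen_core.model_context import TokenLimitedChatCompletionContext

# Keep as much recent history as fits within the token budget
agent = AssistantAgent(
    name="assistant",
    model_client=model_client,
    model_context=TokenLimitedChatCompletionContext(
        model_client,
        token_limit=4000  # illustrative budget
    )
)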
Response Caching

Cache responses to reduce costs and latency:

from autogen_ext.models.cache import ChatCompletionCache, CHAT_CACHE_VALUE_TYPE
from autogen_ext.cache_store.diskcache import DiskCacheStore
from diskcache import Cache

# Wrap your model client with a disk-backed cache
cache_store = DiskCacheStore[CHAT_CACHE_VALUE_TYPE](Cache("./model_cache"))
cached_client = ChatCompletionCache(model_client, cache_store)

agent = AssistantAgent(
    name="assistant",
    model_client=cached_client
)
Especially useful for development and testing
Model Call Logging

Log all model calls for debugging and monitoring:

import logging
from autogen_core import EVENT_LOGGER_NAME

# Configure logging
logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger(EVENT_LOGGER_NAME)
logger.addHandler(logging.StreamHandler())
logger.setLevel(logging.INFO)

# Now all model calls will be logged
Event type is 'LLMCall' for model interactions
Token Streaming

Stream tokens as they're generated:

from autogen_agentchat.messages import ModelClientStreamingChunkEvent

streaming_assistant = AssistantAgent(
    name="assistant",
    model_client=model_client,
    model_client_stream=True  # Enable token streaming
)

# Stream tokens as chunk events arrive
async for message in streaming_assistant.run_stream(
    task="Write a long story"
):
    if isinstance(message, ModelClientStreamingChunkEvent):
        print(message.content, end='', flush=True)
Requires the model provider to support streaming

Model Provider Comparison

Provider       | Setup Complexity | Performance | Cost     | Privacy    | Best For
---------------|------------------|-------------|----------|------------|-------------------------------
OpenAI         | Easy             | Excellent   | Moderate | Cloud      | General-purpose, prototyping
Azure OpenAI   | Moderate         | Excellent   | Moderate | Enterprise | Enterprise deployments
Anthropic      | Easy             | Excellent   | Moderate | Cloud      | Long conversations, reasoning
Local (Ollama) | Complex          | Variable    | Free     | Private    | Privacy-sensitive, offline use
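
Because every client implements the same ChatCompletionClient interface, switching providers is a one-line change. A sketch of runtime provider selection (the MODEL_PROVIDER variable and model names are illustrative):

import os
from autogen_core.models import ChatCompletionClient

def make_client(provider: str) -> ChatCompletionClient:
    """Pick a provider at runtime; downstream code is unchanged."""
    if provider == "anthropic":
        from autogen_ext.models.anthropic import AnthropicChatCompletionClient
        return AnthropicChatCompletionClient(model="claude-3-7-sonnet-20250219")
    if provider == "ollama":
        from autogen_ext.models.ollama import OllamaChatCompletionClient
        return OllamaChatCompletionClient(model="llama3.2")
    from autogen_ext.models.openai import OpenAIChatCompletionClient
    return OpenAIChatCompletionClient(model="gpt-4o")

client = make_client(os.getenv("MODEL_PROVIDER", "openai"))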

Best Practices

Do's
  • Use environment variables for API keys (see the sketch after these lists)
  • Implement proper error handling and retries
  • Cache responses in development
  • Monitor token usage and costs
  • Use appropriate model context limits
  • Test with different model providers
Don'ts
  • Don't hardcode API keys in source code
  • Avoid sending sensitive data to cloud models
  • Don't ignore model context limits
  • Avoid blocking calls without timeout
  • Don't forget to close model clients
  • Avoid parallel tool calls with stateful agents
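
A short sketch tying several of these practices together (the model and prompt are illustrative):

import asyncio
import os
from autogen_core.models import UserMessage
from autogen_ext.models.openai import OpenAIChatCompletionClient

# Key from the environment, a timeout on the call, a guaranteed close
client = OpenAIChatCompletionClient(
    model="gpt-4o",
    api_key=os.environ["OPENAI_API_KEY"]  # never hardcode keys
)
try:
    result = await asyncio.wait_for(
        client.create([UserMessage(content="ping", source="user")]),
        timeout=30  # avoid blocking calls without a timeout
    )
    print(result.content)
finally:
    await client.close()  # always release the client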
Previous: Fundamentals
Great Progress! You've mastered agents and models!
Next: Teams & Workflows