Agents & Models
Master the building blocks of intelligent multi-agent systems
Learn about agent types, model integration, and best practices for production deployments
What You'll Master
- AssistantAgent and UserProxyAgent usage
- Tool integration and workbenches
- Multi-modal input handling
- Structured output and streaming
- Model client configuration
- OpenAI, Azure, Anthropic, and local models
- Authentication and security
- Production deployment patterns
Agent Types & Capabilities
AutoGen AgentChat provides preset agents, each with unique behaviors and capabilities
AssistantAgent
AI-powered agent with LLM capabilities
Key Features:
- Powered by language models (GPT-4, Claude, etc.)
- Can use tools and function calls
- Generates responses autonomously
- Supports system message configuration
- Built-in reasoning and planning
Best For:
- Content generation and analysis
- Code writing and debugging
- Research and information processing
- Complex problem-solving tasks
UserProxyAgent
Human-in-the-loop interface agent
Key Features:
- Represents human users in conversations
- Requests human input via a pluggable input function
- Pauses the team while it waits for a response
- No LLM - acts as a proxy for a human
- Code and command execution is handled by the separate CodeExecutorAgent (sketch below)
Best For:
- Human-in-the-loop workflows
- Review and approval steps
- Steering or correcting a running team
- Tasks that require human judgment
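In current AgentChat versions, code execution is the job of CodeExecutorAgent rather than UserProxyAgent. A minimal sketch (assuming a local ./coding working directory; running model-written code outside a sandbox is risky, so prefer the Docker executor in production):

```python
from autogen_agentchat.agents import CodeExecutorAgent
from autogen_ext.code_executors.local import LocalCommandLineCodeExecutor

# Runs code blocks it receives in chat, inside ./coding on the local machine.
# For isolation, swap in DockerCommandLineCodeExecutor from
# autogen_ext.code_executors.docker.
executor = CodeExecutorAgent(
    "executor",
    code_executor=LocalCommandLineCodeExecutor(work_dir="coding"),
)
```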
Practical Examples
Example 1: Single AssistantAgent
```python
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.messages import TextMessage
from autogen_ext.models.openai import OpenAIChatCompletionClient

# Create an AI assistant
assistant = AssistantAgent(
    name="assistant",
    model_client=OpenAIChatCompletionClient(model="gpt-4o"),
    system_message="You are a helpful coding assistant.",
)

# Use the assistant directly (a plain string also works as the task)
task = TextMessage(
    content="Write a Python function to calculate fibonacci numbers",
    source="user",
)
result = await assistant.run(task=task)
print(result.messages[-1].content)
```
Example 2: AssistantAgent + UserProxyAgent Collaboration
```python
from autogen_agentchat.agents import AssistantAgent, UserProxyAgent
from autogen_agentchat.conditions import TextMentionTermination
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_ext.models.openai import OpenAIChatCompletionClient

# Create AI assistant (writes and explains the code)
assistant = AssistantAgent(
    name="coder",
    model_client=OpenAIChatCompletionClient(model="gpt-4o"),
    system_message="You are an expert Python developer.",
)

# Create user proxy (brings a human into the loop; prompts on the console by default)
user_proxy = UserProxyAgent(name="user_proxy", input_func=input)

# Create a team for collaboration; the run ends when the human replies APPROVE
termination = TextMentionTermination("APPROVE")
team = RoundRobinGroupChat([assistant, user_proxy], termination_condition=termination)

# Run a task requiring both generation and review
# (add the CodeExecutorAgent from earlier to the team to actually execute the code)
result = await team.run(
    task="Create and test a function that finds prime numbers up to 100"
)
```
Tool vs UserProxyAgent: Key Differences
Tools
What They Are:
Functions that extend an agent's capabilities, executed directly within the agent's run() method.
Best For:
- Simple, stateless operations
- API calls and data retrieval
- Quick calculations or transformations
- When you want the AI to autonomously decide tool usage
Characteristics:
- Executed inline during conversation
- No separate agent identity
- Results returned immediately
- No conversation participation
```python
async def calculator(expression: str) -> str:
    """Calculate mathematical expressions."""
    # WARNING: eval() on model-supplied input is unsafe; see the safer sketch below.
    return str(eval(expression))

agent = AssistantAgent(
    name="math_helper",
    model_client=model_client,
    tools=[calculator],  # Tool attached to the agent
)
```
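Because eval() on model-supplied strings is a code-injection risk, here is an illustrative safer variant that accepts only arithmetic. It uses the standard-library ast and operator modules and is not part of AutoGen:

```python
import ast
import operator

# Map supported AST operator types to their implementations.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}

def _safe_eval(node: ast.AST) -> float:
    # Numeric literals
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    # Binary arithmetic, e.g. 2 + 3 * 4
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_safe_eval(node.left), _safe_eval(node.right))
    # Unary minus, e.g. -5
    if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_safe_eval(node.operand))
    raise ValueError("unsupported expression")

async def calculator(expression: str) -> str:
    """Calculate arithmetic expressions without eval()."""
    return str(_safe_eval(ast.parse(expression, mode="eval").body))
```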
UserProxyAgent
What It Is:
A separate agent that participates in the conversation with its own identity. In current AgentChat it routes human input into the chat; pair it with CodeExecutorAgent when the stateful work is code execution.
Best For:
- Human-in-the-loop interactions
- Approval gates in multi-step workflows
- When you need conversation context around a decision
- Delegating execution to a dedicated team member (via CodeExecutorAgent)
Characteristics:
- Participates as a team member
- Has its own conversation identity
- Can maintain state across interactions
- Provides feedback and results in chat
```python
# UserProxyAgent asks a human for input when it is this agent's turn
user_proxy = UserProxyAgent(
    name="user",
    input_func=input,  # any callable (or async callable) returning a string
)

# Part of a team conversation
team = RoundRobinGroupChat([ai_agent, user_proxy])
```
Decision Matrix: When to Use Which?
| Scenario | Recommendation | Reason |
|---|---|---|
| Simple API calls | Tool | Lightweight, inline execution |
| Code generation + testing | Separate agent (CodeExecutorAgent) | Needs conversation context and execution feedback |
| Data retrieval | Tool | Stateless operation |
| Human approval workflows | UserProxyAgent | Designed for human interaction |
| Mathematical calculations | Tool | Quick, deterministic results |
Advanced Agent Features
Multi-Modal Input
Handle text, images, and files in conversations:
```python
import PIL.Image
from autogen_agentchat.messages import MultiModalMessage
from autogen_core import Image

# Wrap a PIL image in AutoGen's Image type (requires a vision-capable model)
pil_image = PIL.Image.open("image.jpg")
img = Image(pil_image)

message = MultiModalMessage(
    content=["Describe this image", img],
    source="user",
)
result = await assistant.run(task=message)
```
Structured Output
Get structured JSON responses with validation:
```python
from typing import Literal
from pydantic import BaseModel

class AgentResponse(BaseModel):
    thoughts: str
    response: Literal["happy", "sad", "neutral"]

agent = AssistantAgent(
    "assistant",
    model_client=model_client,
    output_content_type=AgentResponse,
)

result = await agent.run(task="I am happy.")
print(result.messages[-1].content.response)  # "happy"
```
Tools Integration
Extend agent capabilities with custom tools:
```python
async def web_search(query: str) -> str:
    """Search the web for information."""
    # Implementation here (call your search API of choice)
    return "results for: " + query

agent = AssistantAgent(
    name="researcher",
    model_client=model_client,
    tools=[web_search],
    system_message="Use tools to solve tasks.",
)
```
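Plain functions are wrapped automatically, but if you want to control the tool's advertised name or description you can wrap it yourself with autogen_core's FunctionTool. A minimal sketch, reusing the web_search function above:

```python
from autogen_core.tools import FunctionTool

# Explicit wrapper: the description is what the model sees when choosing tools.
search_tool = FunctionTool(
    web_search,
    description="Search the web and return a short summary of the results.",
)

agent = AssistantAgent(
    name="researcher",
    model_client=model_client,
    tools=[search_tool],
)
```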
Streaming Responses
Stream responses as they're generated:
```python
from autogen_agentchat.ui import Console

# Stream individual messages as they are produced
async for message in agent.run_stream(task="Write a story"):
    print(message)

# Or use Console for formatted output
await Console(
    agent.run_stream(task="Write a story"),
    output_stats=True,
)
```
Other Preset Agents
CodeExecutorAgent
Specialized agent for code execution tasks
MultimodalWebSurfer
Search web and visit pages for information
FileSurfer
Search and browse local files
VideoSurfer
Watch and analyze video content
OpenAIAssistantAgent
Backed by OpenAI Assistant API
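These preset agents ship in autogen-ext behind optional extras. As one example, a web surfer can join a team like any other agent; a sketch, assuming pip install "autogen-ext[web-surfer]" plus a Playwright browser install:

```python
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_ext.agents.web_surfer import MultimodalWebSurfer
from autogen_ext.models.openai import OpenAIChatCompletionClient

model_client = OpenAIChatCompletionClient(model="gpt-4o")

# Browses pages with a headless browser and reports findings in chat
web_surfer = MultimodalWebSurfer("web_surfer", model_client=model_client)

team = RoundRobinGroupChat([web_surfer], max_turns=3)
result = await team.run(task="Find the latest AutoGen release notes")
await model_client.close()
```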
Model Clients & Integration
Connect your agents to various LLM providers through standardized model clients
OpenAI
Direct access to GPT-4, GPT-3.5, and other OpenAI models
```python
# Install: pip install "autogen-ext[openai]"
from autogen_core.models import UserMessage
from autogen_ext.models.openai import OpenAIChatCompletionClient

client = OpenAIChatCompletionClient(
    model="gpt-4o-2024-08-06",
    # api_key="sk-...",  # Or set the OPENAI_API_KEY env var
)

# Test the client
result = await client.create([
    UserMessage(content="What is AutoGen?", source="user")
])
print(result.content)
await client.close()
```
Azure OpenAI
Enterprise-grade OpenAI models hosted on Azure
```python
# Install: pip install "autogen-ext[openai,azure]"
from autogen_ext.models.openai import AzureOpenAIChatCompletionClient
from autogen_ext.auth.azure import AzureTokenProvider
from azure.identity import DefaultAzureCredential

# With AAD authentication
token_provider = AzureTokenProvider(
    DefaultAzureCredential(),
    "https://cognitiveservices.azure.com/.default",
)

client = AzureOpenAIChatCompletionClient(
    azure_deployment="your-deployment",
    model="gpt-4o",
    api_version="2024-06-01",
    azure_endpoint="https://your-endpoint.openai.azure.com/",
    azure_ad_token_provider=token_provider,
)
```
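If AAD is not available, the client also accepts a plain API key. A sketch, assuming the key is stored in an AZURE_OPENAI_API_KEY environment variable:

```python
import os

client = AzureOpenAIChatCompletionClient(
    azure_deployment="your-deployment",
    model="gpt-4o",
    api_version="2024-06-01",
    azure_endpoint="https://your-endpoint.openai.azure.com/",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],  # key from the environment
)
```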
Anthropic Claude
Access to Claude models (experimental support)
```python
# Install: pip install "autogen-ext[anthropic]"
from autogen_core.models import UserMessage
from autogen_ext.models.anthropic import AnthropicChatCompletionClient

client = AnthropicChatCompletionClient(
    model="claude-3-7-sonnet-20250219",
    # api_key will be read from the ANTHROPIC_API_KEY env var
)

result = await client.create([
    UserMessage(content="Explain quantum computing", source="user")
])
print(result.content)
await client.close()
```
Local Models (Ollama)
Run models locally on your machine
```python
# Install: pip install "autogen-ext[ollama]"
from autogen_core.models import UserMessage
from autogen_ext.models.ollama import OllamaChatCompletionClient

# Assumes an Ollama server running on localhost:11434
client = OllamaChatCompletionClient(model="llama3.2")

result = await client.create([
    UserMessage(content="Hello, local model!", source="user")
])
print(result.content)
await client.close()
```
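For model tags the client has no built-in metadata for, you can describe the model's capabilities yourself. A sketch, assuming model_info accepts autogen_core.models.ModelInfo and using a hypothetical my-custom-model tag:

```python
from autogen_core.models import ModelInfo
from autogen_ext.models.ollama import OllamaChatCompletionClient

# Hypothetical tag: tell the client what the model can and cannot do
client = OllamaChatCompletionClient(
    model="my-custom-model",
    model_info=ModelInfo(
        vision=False,
        function_calling=True,
        json_output=True,
        structured_output=True,
        family="unknown",
    ),
)
```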
Advanced Model Configuration
Model Context
Control conversation history sent to models:
```python
from autogen_core.model_context import BufferedChatCompletionContext

# Limit context to the last 5 messages
agent = AssistantAgent(
    name="assistant",
    model_client=model_client,
    model_context=BufferedChatCompletionContext(buffer_size=5),
)
```
- UnboundedChatCompletionContext: Full history (default)
- BufferedChatCompletionContext: Last N messages
- TokenLimitedChatCompletionContext: Token-based limit
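For the token-based variant, a hedged sketch, assuming the current signature in which the context takes the model client (used for token counting) plus a token_limit:

```python
from autogen_core.model_context import TokenLimitedChatCompletionContext

# Trim oldest messages first so the prompt stays under ~4096 tokens
agent = AssistantAgent(
    name="assistant",
    model_client=model_client,
    model_context=TokenLimitedChatCompletionContext(
        model_client=model_client,
        token_limit=4096,
    ),
)
```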
Response Caching
Cache responses to reduce costs and latency:
```python
from autogen_ext.models.cache import ChatCompletionCache, CHAT_CACHE_VALUE_TYPE
from autogen_ext.cache_store.diskcache import DiskCacheStore
from diskcache import Cache

# Wrap your model client with a disk-backed cache
# (pip install "autogen-ext[diskcache]"; an in-memory store is used if omitted)
cache_store = DiskCacheStore[CHAT_CACHE_VALUE_TYPE](Cache("./model_cache"))
cached_client = ChatCompletionCache(model_client, cache_store)

agent = AssistantAgent(
    name="assistant",
    model_client=cached_client,
)
```
Model Call Logging
Log all model calls for debugging and monitoring:
```python
import logging
from autogen_core import EVENT_LOGGER_NAME

# Configure logging
logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger(EVENT_LOGGER_NAME)
logger.addHandler(logging.StreamHandler())
logger.setLevel(logging.INFO)
# Now all model calls will be logged
```
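To keep a persistent record of calls, attach a file handler as well (plain standard-library logging, nothing AutoGen-specific):

```python
# Append events to a log file in addition to the console
file_handler = logging.FileHandler("model_calls.log")
logger.addHandler(file_handler)
```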
Token Streaming
Stream tokens as they're generated:
```python
from autogen_agentchat.messages import ModelClientStreamingChunkEvent

streaming_assistant = AssistantAgent(
    name="assistant",
    model_client=model_client,
    model_client_stream=True,  # Enable token streaming
)

# Print chunks as they arrive; the stream also yields full messages
# and a final TaskResult, which we skip here
async for message in streaming_assistant.run_stream(task="Write a long story"):
    if isinstance(message, ModelClientStreamingChunkEvent):
        print(message.content, end="", flush=True)
```
Model Provider Comparison
| Provider | Setup Complexity | Performance | Cost | Data Location | Best For |
|---|---|---|---|---|---|
| OpenAI | Easy | Excellent | Moderate | Cloud | General-purpose, prototyping |
| Azure OpenAI | Moderate | Excellent | Moderate | Enterprise | Enterprise deployments |
| Anthropic | Easy | Excellent | Moderate | Cloud | Long conversations, reasoning |
| Local (Ollama) | Complex | Variable | Free | Private | Privacy-sensitive, offline use |
Best Practices
Do's
- Use environment variables for API keys
- Implement proper error handling and retries
- Cache responses in development
- Monitor token usage and costs
- Use appropriate model context limits
- Test with different model providers
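A sketch tying several of these together: key from the environment, a timeout around the call, and explicit cleanup (asyncio.wait_for is one generic way to bound a call; adjust to your stack):

```python
import asyncio
import os

from autogen_core.models import UserMessage
from autogen_ext.models.openai import OpenAIChatCompletionClient

client = OpenAIChatCompletionClient(
    model="gpt-4o",
    api_key=os.environ["OPENAI_API_KEY"],  # from the environment, never hardcoded
)
try:
    # Bound the call so a hung request cannot block the app indefinitely
    result = await asyncio.wait_for(
        client.create([UserMessage(content="ping", source="user")]),
        timeout=60,
    )
    print(result.usage)  # monitor token usage
finally:
    await client.close()  # always release the underlying HTTP client
```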
Don'ts
- Don't hardcode API keys in source code
- Avoid sending sensitive data to cloud models
- Don't ignore model context limits
- Avoid blocking calls without timeout
- Don't forget to close model clients
- Avoid parallel tool calls with stateful agents