Context Management API¶
PatchPal's context management system handles token estimation, context window limits, and automatic compaction.
TokenEstimator¶
patchpal.context.TokenEstimator(model_id)
¶
Estimate tokens in messages for context management.
Uses character-based estimation (~3 chars per token) as a fallback when actual token counts from API responses are not available. This works reliably for all models without requiring network access or external dependencies.
Source code in patchpal/context.py
estimate_tokens(text)
¶
Estimate tokens in text using character-based heuristic.
Uses ~3 chars per token which is accurate for code-heavy content and works reliably without requiring network access for tokenizer data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
text
|
str
|
Text to estimate tokens for |
required |
Returns:
| Type | Description |
|---|---|
int
|
Estimated token count |
Source code in patchpal/context.py
estimate_message_tokens(message)
¶
Estimate tokens in a single message.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
message
|
Dict[str, Any]
|
Message dict with role, content, tool_calls, etc. |
required |
Returns:
| Type | Description |
|---|---|
int
|
Estimated token count |
Source code in patchpal/context.py
estimate_messages_tokens(messages)
¶
Estimate tokens in a list of messages.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
messages
|
List[Dict[str, Any]]
|
List of message dicts |
required |
Returns:
| Type | Description |
|---|---|
int
|
Total estimated token count |
Source code in patchpal/context.py
ContextManager¶
patchpal.context.ContextManager(model_id, system_prompt)
¶
Manage context window with auto-compaction and pruning.
Initialize context manager.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
model_id
|
str
|
LiteLLM model identifier |
required |
system_prompt
|
str
|
System prompt text |
required |
Source code in patchpal/context.py
needs_compaction(messages, actual_prompt_tokens=None)
¶
Check if context window needs compaction.
ALWAYS estimates current messages to avoid staleness issues when predicting whether the NEXT API call will overflow. Using actual_prompt_tokens from a previous call can cause false negatives when large messages are added between the last API call and the compaction check.
Example of staleness bug (fixed): - Previous API call: 120K tokens (60% usage) - User pastes huge changelog: +90K tokens - Total: 210K tokens (exceeds 200K limit) - Bug: If we used actual_prompt_tokens=120K, we'd think we're at 60% - Fix: Always re-estimate to see the 210K total
The actual_prompt_tokens parameter is kept for API compatibility but ignored for compaction decisions. Use get_usage_stats() for display purposes where actual tokens are appropriate (staleness OK for showing recent stats).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
messages
|
List[Dict[str, Any]]
|
Current message history |
required |
actual_prompt_tokens
|
int
|
IGNORED - kept for API compatibility only |
None
|
Returns:
| Type | Description |
|---|---|
bool
|
True if compaction is needed |
Source code in patchpal/context.py
get_usage_stats(messages, actual_prompt_tokens=None)
¶
Get current context usage statistics.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
messages
|
List[Dict[str, Any]]
|
Current message history |
required |
actual_prompt_tokens
|
int
|
Optional actual prompt tokens from latest API response (includes cache operations) |
None
|
Returns:
| Type | Description |
|---|---|
Dict[str, Any]
|
Dict with usage statistics |
Source code in patchpal/context.py
Usage Example¶
from patchpal.agent import create_agent
agent = create_agent()
# Check context usage
stats = agent.context_manager.get_usage_stats(agent.messages)
print(f"Token usage: {stats['total_tokens']:,} / {stats['context_limit']:,}")
print(f"Usage: {stats['usage_percent']}%")
print(f"Output budget remaining: {stats['output_budget_remaining']:,} tokens")
# Check if compaction is needed
if agent.context_manager.needs_compaction(agent.messages):
print("Context window getting full - compaction will trigger soon")
# Manually trigger compaction (usually automatic)
agent._perform_auto_compaction()
How Context Management Works¶
- Token Estimation: Uses tiktoken (or fallback character estimation) to estimate message tokens
- Context Limits: Tracks model-specific context window sizes (e.g., 200K for Claude Sonnet)
- Automatic Compaction: When context reaches 70% full, summarizes old messages to free space
- Output Budget: Reserves tokens for model output based on context window size
Context Limits by Model Family¶
The context manager automatically detects limits for common models:
- Claude 3.5 Sonnet: 200,000 tokens
- Claude 3 Opus: 200,000 tokens
- GPT-4 Turbo: 128,000 tokens
- GPT-4: 8,192 tokens
- GPT-3.5: 16,385 tokens
For unknown models, falls back to 128,000 tokens.
Related¶
- Context Management Guide - Overview of context management
- Agent API - Using the agent with automatic context management