🏗️ Architecture Overview
System Architecture
This Hugging Face Space implements a comparative agent system with three reasoning modes. Here's how everything works together:
┌──────────────────────────────────────────────────────────────┐
│                       Gradio UI Layer                        │
│  - Question Input                                            │
│  - Mode Selection (Think/Act/ReAct/All)                      │
│  - Three Output Panels (side-by-side comparison)             │
└─────────────────────────────┬────────────────────────────────┘
                              │
                              ▼
┌──────────────────────────────────────────────────────────────┐
│                       Agent Controller                       │
│  run_comparison() - Routes to appropriate mode handler       │
└─────────────────────────────┬────────────────────────────────┘
                              │
          ┌───────────────────┼───────────────────┐
          ▼                   ▼                   ▼
   ┌──────────────┐    ┌──────────────┐    ┌──────────────┐
   │  Think-Only  │    │   Act-Only   │    │    ReAct     │
   │     Mode     │    │     Mode     │    │     Mode     │
   └──────┬───────┘    └──────┬───────┘    └──────┬───────┘
          │                   │                   │
          ▼                   ▼                   ▼
┌──────────────────────────────────────────────────────────────┐
│                        LLM Interface                         │
│  call_llm() - Communicates with openai/gpt-oss-20b           │
└─────────────────────────────┬────────────────────────────────┘
                              │
                              ▼  (Act-Only & ReAct modes only)
┌──────────────────────────────────────────────────────────────┐
│                        Tool Executor                         │
│  - parse_action()                                            │
│  - call_tool()                                               │
└─────────────────────────────┬────────────────────────────────┘
                              │
      ┌──────────────┬────────┴───┬─────────┬──────────┐
      ▼              ▼            ▼         ▼          ▼
┌────────────┐ ┌───────────┐ ┌─────────┐ ┌──────┐ ┌──────────┐
│ DuckDuckGo │ │ Wikipedia │ │ Weather │ │ Calc │ │  Python  │
│   Search   │ │  Search   │ │   API   │ │      │ │   REPL   │
└────────────┘ └───────────┘ └─────────┘ └──────┘ └──────────┘
Component Details
1. Tool Layer
Each tool is wrapped in a Tool class with:
- name: Identifier for the LLM to reference
- description: Instructions for when/how to use the tool
- func: The actual implementation
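A minimal sketch of this wrapper, assuming a plain dataclass (the actual class definition may differ):

from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str                   # identifier the LLM references in Action: lines
    description: str            # usage guidance injected into the prompt
    func: Callable[[str], str]  # implementation: string in, string out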
Tool Implementations:
- duckduckgo_search(): Uses DuckDuckGo's JSON API
- wikipedia_search(): Uses the Wikipedia Python library
- get_weather(): Queries the wttr.in API for weather data
- calculate(): Safe AST-based math expression evaluator
- python_repl(): Sandboxed Python execution with whitelisted builtins
2. Agent Modes
Think-Only Mode (think_only_mode)
User Question → System Prompt → LLM → Thoughts → Answer
- Single LLM call with CoT prompt
- No tool access
- Shows reasoning steps
- Best for knowledge-based questions
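A minimal sketch of this mode, assuming a hypothetical THINK_PROMPT template and the call_llm() helper described below:

def think_only_mode(question):
    # Single chain-of-thought call; no tools involved
    prompt = THINK_PROMPT.format(question=question)  # THINK_PROMPT is hypothetical here
    yield "Thinking..."
    yield call_llm([{"role": "user", "content": prompt}])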
Act-Only Mode (act_only_mode)
User Question → System Prompt → LLM → Action
                     ↓
       Execute Tool → Observation
                     ↓
            LLM → Action/Answer
                     ↓
                    ...
- Iterative loop: Action → Observation
- No explicit "Thought" step
- Maximum 5 iterations
- Best for information gathering
ReAct Mode (react_mode)
User Question → System Prompt → LLM → Thought → Action
                     ↓
       Execute Tool → Observation
                     ↓
     LLM → Thought → Action/Answer
                     ↓
                    ...
- Full Thought-Action-Observation cycle
- Most comprehensive reasoning
- Maximum 5 iterations
- Best for complex multi-step problems
3. LLM Interface
call_llm() Function:
- Uses Hugging Face Inference API
- Model: openai/gpt-oss-20b
- Supports chat format (messages list)
- Configurable temperature and max_tokens
Authentication:
- Requires the HF_TOKEN environment variable
- Set in Space secrets (secure)
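A plausible sketch of call_llm() built on huggingface_hub (the Space's actual implementation and default parameters may differ):

import os
from huggingface_hub import InferenceClient

client = InferenceClient(model="openai/gpt-oss-20b", token=os.environ["HF_TOKEN"])

def call_llm(messages, temperature=0.7, max_tokens=1024):
    # messages uses the chat format: [{"role": "user", "content": "..."}]
    response = client.chat_completion(
        messages=messages,
        temperature=temperature,
        max_tokens=max_tokens,
    )
    return response.choices[0].message.content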
4. Parsing & Control Flow
parse_action() Function:
- Extracts Action: and Action Input: from the LLM response
- Uses regex to handle various formats
- Returns (action_name, action_input) tuple
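A sketch of that extraction, assuming simple line-oriented patterns (the real regexes may be more permissive):

import re

def parse_action(text):
    # Capture "Action: <name>" and "Action Input: <value>" from the response
    action = re.search(r"Action:\s*(.+)", text)
    action_input = re.search(r"Action Input:\s*(.+)", text)
    if action and action_input:
        return action.group(1).strip(), action_input.group(1).strip()
    return None, None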
Iteration Control:
- Max 5 iterations per mode to prevent infinite loops
- Early termination when "Answer:" detected
- Error handling for malformed responses
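Taken together, the Act-Only and ReAct handlers plausibly share a loop shaped like this sketch (MAX_ITERATIONS, format_tools(), and the prompt handling are assumptions):

MAX_ITERATIONS = 5

def agent_loop(question):
    history = REACT_PROMPT.format(question=question, tools=format_tools())  # hypothetical helpers
    for _ in range(MAX_ITERATIONS):
        response = call_llm([{"role": "user", "content": history}])
        yield response
        if "Answer:" in response:  # early termination on a final answer
            return
        action, action_input = parse_action(response)
        if action is None:  # malformed response: report and stop
            yield "Error: could not parse an action from the response."
            return
        observation = call_tool(action, action_input)
        history += f"\n{response}\nObservation: {observation}\n"
    yield "Stopped after reaching the iteration limit."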
5. UI Layer (Gradio)
Components:
- Input Section: Question textbox + mode dropdown
- Example Buttons: Pre-filled question templates
- Output Panels: Three side-by-side Markdown displays
- Streaming: Generator functions for real-time updates
User Flow:
- User enters question or clicks example
- Selects mode (or "All" for comparison)
- Clicks "Run"
- Sees real-time updates in output panel(s)
- Views final answer and complete reasoning trace
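A condensed sketch of the wiring, assuming run_comparison() is a generator that yields a tuple of three Markdown strings:

import gradio as gr

with gr.Blocks() as demo:
    question = gr.Textbox(label="Question")
    mode = gr.Dropdown(["Think-Only", "Act-Only", "ReAct", "All"], value="All", label="Mode")
    run = gr.Button("Run")
    with gr.Row():
        think_out, act_out, react_out = gr.Markdown(), gr.Markdown(), gr.Markdown()
    # Because run_comparison is a generator, Gradio streams each yield to the panels
    run.click(run_comparison, inputs=[question, mode],
              outputs=[think_out, act_out, react_out])

demo.launch()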
Data Flow Example
Example: "What's the weather in Paris?"
Mode: ReAct
- User submits question
- react_mode() called with the question
- Prompt formatted with question + tool descriptions
- First LLM call returns:
  Thought: I need to check the current weather in Paris
  Action: get_weather
  Action Input: Paris
- parse_action() extracts the tool call
- call_tool("get_weather", "Paris") executes
- Observation: "Weather in Paris: Cloudy, 15°C..."
- Second LLM call with the observation
- LLM responds:
  Thought: I have the weather information
  Answer: The current weather in Paris is...
- Generator yields formatted output to the UI
- User sees the complete trace in the ReAct panel
Key Design Patterns
1. Generator Pattern for Streaming
from typing import Generator

def mode(question: str) -> Generator[str, None, None]:
    yield "Step 1..."
    # process
    yield "Step 2..."
    # etc.
Enables real-time UI updates without blocking
2. Tool Registry Pattern
TOOLS = [Tool(name, description, func), ...]
Easy to add new tools - just append to the list
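Dispatch is then a name lookup over the registry; a sketch:

def call_tool(name, tool_input):
    for tool in TOOLS:
        if tool.name == name:
            try:
                return tool.func(tool_input)
            except Exception as e:
                return f"Error running {name}: {e}"
    return f"Unknown tool: {name}"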
3. Prompt Templates
PROMPT = """...""".format(question=q, tools=t)
Modular prompts for each mode
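For instance, a ReAct-style template might read as follows (illustrative wording, not the Space's actual prompt):

REACT_PROMPT = """Answer the question using the available tools.

Tools:
{tools}

Use exactly this format:
Thought: your reasoning
Action: tool name
Action Input: input for the tool
Observation: (filled in by the tool executor)
... repeat as needed ...
Answer: your final answer

Question: {question}"""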
4. Safe Execution
- AST parsing for the calculator (no eval())
- Whitelisted builtins for the Python REPL
- Timeout limits on API calls
- Error handling with fallback messages
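A sketch of the AST-based evaluator, assuming only basic arithmetic operators are whitelisted:

import ast
import operator

ALLOWED_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def calculate(expression):
    def eval_node(node):
        # Only numeric literals and whitelisted operators are evaluated
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in ALLOWED_OPS:
            return ALLOWED_OPS[type(node.op)](eval_node(node.left), eval_node(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in ALLOWED_OPS:
            return ALLOWED_OPS[type(node.op)](eval_node(node.operand))
        raise ValueError("unsupported expression")

    try:
        return str(eval_node(ast.parse(expression, mode="eval").body))
    except Exception as e:
        return f"Error: {e}"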
Extensibility
Adding a New Tool
def my_tool(query: str) -> str:
    # Implementation goes here
    result = f"Processed: {query}"
    return result

TOOLS.append(Tool(
    name="my_tool",
    description="When to use this tool...",
    func=my_tool
))
Adding a New Mode
def hybrid_mode(question: str) -> Generator[str, None, None]:
    # Custom logic mixing elements
    yield "Starting hybrid mode..."
    # ...

# Add to run_comparison() and UI dropdown
Customizing Prompts
Edit the *_PROMPT constants to change agent behavior:
- Add constraints
- Change format
- Provide examples
- Adjust tone
Performance Considerations
- API Latency: Model calls take 2-5 seconds
- Tool Latency: External APIs add 1-2 seconds per call
- Iteration Count: 5 iterations max = ~30 seconds worst case
- "All" Mode: Runs the three modes sequentially, not in parallel
Security Notes
- API Keys: Never commit HF_TOKEN to the repo
- Python REPL: Sandboxed with limited builtins
- User Input: Sanitized before tool execution
- Rate Limits: Consider adding rate limiting for production
Testing Strategy
- Unit Tests: Test individual tool functions
- Integration Tests: Test mode handlers end-to-end
- Prompt Tests: Verify LLM responses parse correctly
- UI Tests: Test Gradio interface components
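For example, pytest-style unit tests for the tool layer might look like this (assertions are illustrative; exact output formats may differ):

def test_calculate_precedence():
    assert calculate("2 + 3 * 4") == "14"

def test_calculate_rejects_code():
    assert calculate("__import__('os')").startswith("Error")

def test_parse_action():
    text = "Thought: check weather\nAction: get_weather\nAction Input: Paris"
    assert parse_action(text) == ("get_weather", "Paris")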
Future Enhancements
- Add memory/conversation history
- Implement parallel tool calling
- Add caching layer for repeated queries
- Support custom user tools
- Add performance metrics/timing
- Implement token counting/cost tracking
- Add export functionality for reasoning traces