ReACT / ARCHITECTURE.md

πŸ—οΈ Architecture Overview

System Architecture

This Hugging Face Space implements a comparative agent system with three reasoning modes. Here's how everything works together:

┌─────────────────────────────────────────────────────────────┐
│                    Gradio UI Layer                          │
│  - Question Input                                           │
│  - Mode Selection (Think/Act/ReAct/All)                     │
│  - Three Output Panels (side-by-side comparison)            │
└──────────────────┬──────────────────────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────────────────────┐
│                   Agent Controller                          │
│  run_comparison() - Routes to appropriate mode handler      │
└──────────────────┬──────────────────────────────────────────┘
                   │
        ┌──────────┴──────────┬──────────────┐
        ▼                     ▼              ▼
┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│  Think-Only  │    │   Act-Only   │    │    ReAct     │
│    Mode      │    │     Mode     │    │     Mode     │
└──────┬───────┘    └──────┬───────┘    └──────┬───────┘
       │                   │                   │
       ▼                   ▼                   ▼
┌─────────────────────────────────────────────────────────────┐
│                    LLM Interface                            │
│  call_llm() - Communicates with openai/gpt-oss-20b          │
└──────────────────┬──────────────────────────────────────────┘
                   │
                   ▼ (Act-Only & ReAct modes only)
┌─────────────────────────────────────────────────────────────┐
│                    Tool Executor                            │
│  - parse_action()                                           │
│  - call_tool()                                              │
└──────────────────┬──────────────────────────────────────────┘
                   │
       ┌───────────┴───────────┬───────────┬───────────┬──────┐
       ▼                       ▼           ▼           ▼      ▼
┌────────────┐  ┌────────────┐  ┌───────┐  ┌────┐  ┌─────────┐
│ DuckDuckGo │  │ Wikipedia  │  │Weather│  │Calc│  │ Python  │
│   Search   │  │   Search   │  │  API  │  │    │  │  REPL   │
└────────────┘  └────────────┘  └───────┘  └────┘  └─────────┘

Component Details

1. Tool Layer

Each tool is wrapped in a Tool class with:

  • name: Identifier for the LLM to reference
  • description: Instructions for when/how to use the tool
  • func: The actual implementation
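
A minimal sketch of this wrapper and the registry-based dispatch (field names follow the description above; the Space's actual class may differ in detail):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    """Wraps a callable so the LLM can reference it by name."""
    name: str                    # identifier the LLM writes after "Action:"
    description: str             # tells the LLM when/how to use the tool
    func: Callable[[str], str]   # takes the "Action Input:" string, returns an observation

TOOLS: list[Tool] = []

def call_tool(name: str, action_input: str) -> str:
    """Dispatch an action to the matching registered tool, with a fallback message."""
    for tool in TOOLS:
        if tool.name == name:
            return tool.func(action_input)
    return f"Error: unknown tool '{name}'"
```

The fallback string (rather than an exception) keeps the agent loop alive when the model hallucinates a tool name.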

Tool Implementations:

  • duckduckgo_search(): Uses DuckDuckGo's JSON API
  • wikipedia_search(): Uses the Wikipedia Python library
  • get_weather(): Queries wttr.in API for weather data
  • calculate(): Safe AST-based math expression evaluator
  • python_repl(): Sandboxed Python execution with whitelisted builtins
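
The calculator's AST-based approach can be sketched as follows (illustrative, not the Space's exact code; only a few arithmetic operators are whitelisted here):

```python
import ast
import operator

# Whitelisted operators -- anything else in the expression raises.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def calculate(expression: str) -> str:
    """Evaluate a math expression by walking its AST -- no eval()."""
    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")
    try:
        return str(_eval(ast.parse(expression, mode="eval").body))
    except Exception as e:
        return f"Error: {e}"
```

Because only numeric constants and whitelisted operators are accepted, attribute access, names, and function calls (the usual injection vectors) are rejected outright.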

2. Agent Modes

Think-Only Mode (think_only_mode)

User Question → System Prompt → LLM → Thoughts → Answer
  • Single LLM call with CoT prompt
  • No tool access
  • Shows reasoning steps
  • Best for knowledge-based questions

Act-Only Mode (act_only_mode)

User Question → System Prompt → LLM → Action
                                   ↓
                            Execute Tool → Observation
                                   ↓
                                  LLM → Action/Answer
                                   ↓
                                  ...
  • Iterative loop: Action → Observation
  • No explicit "Thought" step
  • Maximum 5 iterations
  • Best for information gathering

ReAct Mode (react_mode)

User Question → System Prompt → LLM → Thought → Action
                                         ↓
                                  Execute Tool → Observation
                                         ↓
                                       LLM → Thought → Action/Answer
                                         ↓
                                        ...
  • Full Thought-Action-Observation cycle
  • Most comprehensive reasoning
  • Maximum 5 iterations
  • Best for complex multi-step problems

3. LLM Interface

call_llm() Function:

  • Uses Hugging Face Inference API
  • Model: openai/gpt-oss-20b
  • Supports chat format (messages list)
  • Configurable temperature and max_tokens

Authentication:

  • Requires HF_TOKEN environment variable
  • Set in Space secrets (secure)
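
A sketch of how call_llm() might be built on huggingface_hub's InferenceClient; the build_messages helper and the error handling are illustrative, not the Space's exact code:

```python
import os

MODEL_ID = "openai/gpt-oss-20b"

def build_messages(system_prompt: str, user_content: str) -> list[dict]:
    """Assemble the chat-format messages list the Inference API expects."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_content},
    ]

def call_llm(messages: list[dict], temperature: float = 0.7, max_tokens: int = 1024) -> str:
    """Send a chat completion request via the Hugging Face Inference API."""
    from huggingface_hub import InferenceClient  # lazy import; requires huggingface_hub
    token = os.environ.get("HF_TOKEN")
    if not token:
        return "Error: HF_TOKEN is not set (add it in the Space secrets)."
    client = InferenceClient(token=token)
    response = client.chat_completion(
        messages, model=MODEL_ID, temperature=temperature, max_tokens=max_tokens
    )
    return response.choices[0].message.content
```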

4. Parsing & Control Flow

parse_action() Function:

  • Extracts Action: and Action Input: from LLM response
  • Uses regex to handle various formats
  • Returns (action_name, action_input) tuple

Iteration Control:

  • Max 5 iterations per mode to prevent infinite loops
  • Early termination when "Answer:" detected
  • Error handling for malformed responses

5. UI Layer (Gradio)

Components:

  • Input Section: Question textbox + mode dropdown
  • Example Buttons: Pre-filled question templates
  • Output Panels: Three side-by-side Markdown displays
  • Streaming: Generator functions for real-time updates

User Flow:

  1. User enters question or clicks example
  2. Selects mode (or "All" for comparison)
  3. Clicks "Run"
  4. Sees real-time updates in output panel(s)
  5. Views final answer and complete reasoning trace

Data Flow Example

Example: "What's the weather in Paris?"

Mode: ReAct

  1. User submits question
  2. react_mode() called with question
  3. Prompt formatted with question + tool descriptions
  4. First LLM call:
    Thought: I need to check the current weather in Paris
    Action: get_weather
    Action Input: Paris
    
  5. parse_action() extracts tool call
  6. call_tool("get_weather", "Paris") executes
  7. Observation: "Weather in Paris: Cloudy, 15°C..."
  8. Second LLM call with observation
  9. LLM responds:
    Thought: I have the weather information
    Answer: The current weather in Paris is...
    
  10. Generator yields formatted output to UI
  11. User sees complete trace in ReAct panel

Key Design Patterns

1. Generator Pattern for Streaming

from typing import Generator

def mode(question: str) -> Generator[str, None, None]:
    yield "Step 1..."
    # process
    yield "Step 2..."
    # etc.

Enables real-time UI updates without blocking.

2. Tool Registry Pattern

TOOLS = [Tool(name, description, func), ...]

Easy to add new tools: just append to the list.

3. Prompt Templates

PROMPT = """...""".format(question=q, tools=t)

Modular prompts for each mode

4. Safe Execution

  • AST parsing for calculator (no eval())
  • Whitelisted builtins for Python REPL
  • Timeout limits on API calls
  • Error handling with fallback messages
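
The REPL sandboxing idea can be sketched like this (the whitelist below is illustrative; the Space's actual list of allowed builtins may differ):

```python
import io
import contextlib

# Illustrative whitelist of safe builtins.
SAFE_BUILTINS = {
    "print": print, "len": len, "range": range, "sum": sum,
    "min": min, "max": max, "abs": abs, "round": round,
}

def python_repl(code: str) -> str:
    """Run code with only whitelisted builtins and capture its stdout."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {"__builtins__": SAFE_BUILTINS})
    except Exception as e:
        return f"Error: {e}"
    return buffer.getvalue() or "(no output)"
```

Because `__import__` is absent from the builtins dict, `import` statements fail inside the sandbox, which blocks the most common escape route.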

Extensibility

Adding a New Tool

def my_tool(query: str) -> str:
    # Implementation
    return result

TOOLS.append(Tool(
    name="my_tool",
    description="When to use this tool...",
    func=my_tool
))

Adding a New Mode

def hybrid_mode(question: str) -> Generator[str, None, None]:
    # Custom logic mixing elements
    yield "Starting hybrid mode..."
    # ...
    
# Add to run_comparison() and UI dropdown

Customizing Prompts

Edit the *_PROMPT constants to change agent behavior:

  • Add constraints
  • Change format
  • Provide examples
  • Adjust tone

Performance Considerations

  1. API Latency: Model calls take 2-5 seconds
  2. Tool Latency: External APIs add 1-2 seconds per call
  3. Iteration Count: with the 5-iteration cap, worst case is roughly 5 × (5s model + 2s tool) ≈ 35 seconds
  4. Parallel Modes: "All" mode runs sequentially (not parallel)

Security Notes

  1. API Keys: Never commit HF_TOKEN to repo
  2. Python REPL: Sandboxed with limited builtins
  3. User Input: Sanitized before tool execution
  4. Rate Limits: Consider adding rate limiting for production

Testing Strategy

  1. Unit Tests: Test individual tool functions
  2. Integration Tests: Test mode handlers end-to-end
  3. Prompt Tests: Verify LLM responses parse correctly
  4. UI Tests: Test Gradio interface components

Future Enhancements

  • Add memory/conversation history
  • Implement parallel tool calling
  • Add caching layer for repeated queries
  • Support custom user tools
  • Add performance metrics/timing
  • Implement token counting/cost tracking
  • Add export functionality for reasoning traces