🏗️ Architecture Overview
System Architecture
This Hugging Face Space implements a comparative agent system with three reasoning modes. Here's how everything works together:
┌──────────────────────────────────────────────────────────────┐
│                       Gradio UI Layer                        │
│  - Question Input                                            │
│  - Mode Selection (Think/Act/ReAct/All)                      │
│  - Three Output Panels (side-by-side comparison)             │
└─────────────────────────────┬────────────────────────────────┘
                              │
                              ▼
┌──────────────────────────────────────────────────────────────┐
│                       Agent Controller                       │
│  run_comparison() - Routes to appropriate mode handler       │
└─────────────────────────────┬────────────────────────────────┘
                              │
          ┌───────────────────┼───────────────────┐
          ▼                   ▼                   ▼
   ┌──────────────┐    ┌──────────────┐    ┌──────────────┐
   │  Think-Only  │    │   Act-Only   │    │    ReAct     │
   │     Mode     │    │     Mode     │    │     Mode     │
   └──────┬───────┘    └──────┬───────┘    └──────┬───────┘
          │                   │                   │
          ▼                   ▼                   ▼
┌──────────────────────────────────────────────────────────────┐
│                        LLM Interface                         │
│  call_llm() - Communicates with openai/gpt-oss-20b           │
└─────────────────────────────┬────────────────────────────────┘
                              │
                              ▼  (Act-Only & ReAct modes only)
┌──────────────────────────────────────────────────────────────┐
│                        Tool Executor                         │
│  - parse_action()                                            │
│  - call_tool()                                               │
└─────────────────────────────┬────────────────────────────────┘
                              │
      ┌──────────────┬────────┴───┬─────────┬──────────┐
      ▼              ▼            ▼         ▼          ▼
┌────────────┐ ┌───────────┐ ┌─────────┐ ┌──────┐ ┌──────────┐
│ DuckDuckGo │ │ Wikipedia │ │ Weather │ │ Calc │ │  Python  │
│   Search   │ │  Search   │ │   API   │ │      │ │   REPL   │
└────────────┘ └───────────┘ └─────────┘ └──────┘ └──────────┘
Component Details
1. Tool Layer
Each tool is wrapped in a Tool class with:
- name: Identifier for the LLM to reference
- description: Instructions for when/how to use the tool
- func: The actual implementation
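A minimal sketch of this wrapper, assuming a plain dataclass (the actual class definition may differ):

from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str                   # identifier the LLM references in Action: lines
    description: str            # usage guidance injected into the prompt
    func: Callable[[str], str]  # implementation: string in, string out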
Tool Implementations:
- duckduckgo_search(): Uses DuckDuckGo's JSON API
- wikipedia_search(): Uses the Wikipedia Python library
- get_weather(): Queries the wttr.in API for weather data
- calculate(): Safe AST-based math expression evaluator
- python_repl(): Sandboxed Python execution with whitelisted builtins
2. Agent Modes
Think-Only Mode (think_only_mode)
User Question → System Prompt → LLM → Thoughts → Answer
- Single LLM call with CoT prompt
- No tool access
- Shows reasoning steps
- Best for knowledge-based questions
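A minimal sketch of this mode, assuming a hypothetical THINK_PROMPT template and the call_llm() helper described below:

def think_only_mode(question):
    # Single chain-of-thought call; no tools involved
    prompt = THINK_PROMPT.format(question=question)  # THINK_PROMPT is hypothetical here
    yield "Thinking..."
    yield call_llm([{"role": "user", "content": prompt}])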
Act-Only Mode (act_only_mode)
User Question → System Prompt → LLM → Action
                     ↓
       Execute Tool → Observation
                     ↓
            LLM → Action/Answer
                     ↓
                    ...
- Iterative loop: Action → Observation
- No explicit "Thought" step
- Maximum 5 iterations
- Best for information gathering
ReAct Mode (react_mode)
User Question → System Prompt → LLM → Thought → Action
                     ↓
       Execute Tool → Observation
                     ↓
     LLM → Thought → Action/Answer
                     ↓
                    ...
- Full Thought-Action-Observation cycle
- Most comprehensive reasoning
- Maximum 5 iterations
- Best for complex multi-step problems
3. LLM Interface
call_llm() Function:
- Uses Hugging Face Inference API
- Model: openai/gpt-oss-20b
- Supports chat format (messages list)
- Configurable temperature and max_tokens
Authentication:
- Requires the HF_TOKEN environment variable
- Set in Space secrets (secure)
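A plausible sketch of call_llm() built on huggingface_hub (the Space's actual implementation and default parameters may differ):

import os
from huggingface_hub import InferenceClient

client = InferenceClient(model="openai/gpt-oss-20b", token=os.environ["HF_TOKEN"])

def call_llm(messages, temperature=0.7, max_tokens=1024):
    # messages uses the chat format: [{"role": "user", "content": "..."}]
    response = client.chat_completion(
        messages=messages,
        temperature=temperature,
        max_tokens=max_tokens,
    )
    return response.choices[0].message.content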
4. Parsing & Control Flow
parse_action() Function:
- Extracts Action: and Action Input: from the LLM response
- Uses regex to handle various formats
- Returns (action_name, action_input) tuple
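A sketch of that extraction, assuming simple line-oriented patterns (the real regexes may be more permissive):

import re

def parse_action(text):
    # Capture "Action: <name>" and "Action Input: <value>" from the response
    action = re.search(r"Action:\s*(.+)", text)
    action_input = re.search(r"Action Input:\s*(.+)", text)
    if action and action_input:
        return action.group(1).strip(), action_input.group(1).strip()
    return None, None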
Iteration Control:
- Max 5 iterations per mode to prevent infinite loops
- Early termination when "Answer:" detected
- Error handling for malformed responses
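Taken together, the Act-Only and ReAct handlers plausibly share a loop shaped like this sketch (MAX_ITERATIONS, format_tools(), and the prompt handling are assumptions):

MAX_ITERATIONS = 5

def agent_loop(question):
    history = REACT_PROMPT.format(question=question, tools=format_tools())  # hypothetical helpers
    for _ in range(MAX_ITERATIONS):
        response = call_llm([{"role": "user", "content": history}])
        yield response
        if "Answer:" in response:  # early termination on a final answer
            return
        action, action_input = parse_action(response)
        if action is None:  # malformed response: report and stop
            yield "Error: could not parse an action from the response."
            return
        observation = call_tool(action, action_input)
        history += f"\n{response}\nObservation: {observation}\n"
    yield "Stopped after reaching the iteration limit."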
5. UI Layer (Gradio)
Components:
- Input Section: Question textbox + mode dropdown
- Example Buttons: Pre-filled question templates
- Output Panels: Three side-by-side Markdown displays
- Streaming: Generator functions for real-time updates
User Flow:
- User enters question or clicks example
- Selects mode (or "All" for comparison)
- Clicks "Run"
- Sees real-time updates in output panel(s)
- Views final answer and complete reasoning trace
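A condensed sketch of the wiring, assuming run_comparison() is a generator that yields a tuple of three Markdown strings:

import gradio as gr

with gr.Blocks() as demo:
    question = gr.Textbox(label="Question")
    mode = gr.Dropdown(["Think-Only", "Act-Only", "ReAct", "All"], value="All", label="Mode")
    run = gr.Button("Run")
    with gr.Row():
        think_out, act_out, react_out = gr.Markdown(), gr.Markdown(), gr.Markdown()
    # Because run_comparison is a generator, Gradio streams each yield to the panels
    run.click(run_comparison, inputs=[question, mode],
              outputs=[think_out, act_out, react_out])

demo.launch()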
Data Flow Example
Example: "What's the weather in Paris?"
Mode: ReAct
- User submits question
- react_mode() called with the question
- Prompt formatted with question + tool descriptions
- First LLM call returns:
  Thought: I need to check the current weather in Paris
  Action: get_weather
  Action Input: Paris
- parse_action() extracts the tool call
- call_tool("get_weather", "Paris") executes
- Observation: "Weather in Paris: Cloudy, 15°C..."
- Second LLM call with the observation
- LLM responds:
  Thought: I have the weather information
  Answer: The current weather in Paris is...
- Generator yields formatted output to the UI
- User sees the complete trace in the ReAct panel
Key Design Patterns
1. Generator Pattern for Streaming
from typing import Generator

def mode(question: str) -> Generator[str, None, None]:
    yield "Step 1..."
    # process
    yield "Step 2..."
    # etc.
Enables real-time UI updates without blocking
2. Tool Registry Pattern
TOOLS = [Tool(name, description, func), ...]
Easy to add new tools - just append to the list
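Dispatch is then a name lookup over the registry; a sketch:

def call_tool(name, tool_input):
    for tool in TOOLS:
        if tool.name == name:
            try:
                return tool.func(tool_input)
            except Exception as e:
                return f"Error running {name}: {e}"
    return f"Unknown tool: {name}"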
3. Prompt Templates
PROMPT = """...""".format(question=q, tools=t)
Modular prompts for each mode
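For instance, a ReAct-style template might read as follows (illustrative wording, not the Space's actual prompt):

REACT_PROMPT = """Answer the question using the available tools.

Tools:
{tools}

Use exactly this format:
Thought: your reasoning
Action: tool name
Action Input: input for the tool
Observation: (filled in by the tool executor)
... repeat as needed ...
Answer: your final answer

Question: {question}"""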
4. Safe Execution
- AST parsing for the calculator (no eval())
- Whitelisted builtins for the Python REPL
- Timeout limits on API calls
- Error handling with fallback messages
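A sketch of the AST-based evaluator, assuming only basic arithmetic operators are whitelisted:

import ast
import operator

ALLOWED_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def calculate(expression):
    def eval_node(node):
        # Only numeric literals and whitelisted operators are evaluated
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in ALLOWED_OPS:
            return ALLOWED_OPS[type(node.op)](eval_node(node.left), eval_node(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in ALLOWED_OPS:
            return ALLOWED_OPS[type(node.op)](eval_node(node.operand))
        raise ValueError("unsupported expression")

    try:
        return str(eval_node(ast.parse(expression, mode="eval").body))
    except Exception as e:
        return f"Error: {e}"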
Extensibility
Adding a New Tool
def my_tool(query: str) -> str:
    # Implementation goes here
    result = f"Processed: {query}"
    return result

TOOLS.append(Tool(
    name="my_tool",
    description="When to use this tool...",
    func=my_tool
))
Adding a New Mode
def hybrid_mode(question: str) -> Generator[str, None, None]:
    # Custom logic mixing elements
    yield "Starting hybrid mode..."
    # ...

# Add to run_comparison() and UI dropdown
Customizing Prompts
Edit the *_PROMPT constants to change agent behavior:
- Add constraints
- Change format
- Provide examples
- Adjust tone
Performance Considerations
- API Latency: Model calls take 2-5 seconds
- Tool Latency: External APIs add 1-2 seconds per call
- Iteration Count: 5 iterations max = ~30 seconds worst case
- "All" Mode: Runs the three modes sequentially, not in parallel
Security Notes
- API Keys: Never commit HF_TOKEN to the repo
- Python REPL: Sandboxed with limited builtins
- User Input: Sanitized before tool execution
- Rate Limits: Consider adding rate limiting for production
Testing Strategy
- Unit Tests: Test individual tool functions
- Integration Tests: Test mode handlers end-to-end
- Prompt Tests: Verify LLM responses parse correctly
- UI Tests: Test Gradio interface components
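For example, pytest-style unit tests for the tool layer might look like this (assertions are illustrative; exact output formats may differ):

def test_calculate_precedence():
    assert calculate("2 + 3 * 4") == "14"

def test_calculate_rejects_code():
    assert calculate("__import__('os')").startswith("Error")

def test_parse_action():
    text = "Thought: check weather\nAction: get_weather\nAction Input: Paris"
    assert parse_action(text) == ("get_weather", "Paris")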
Future Enhancements
- Add memory/conversation history
- Implement parallel tool calling
- Add caching layer for repeated queries
- Support custom user tools
- Add performance metrics/timing
- Implement token counting/cost tracking
- Add export functionality for reasoning traces