# 🏗️ Architecture Overview
## System Architecture
This Hugging Face Space implements a comparative agent system with three reasoning modes. Here's how everything works together:
```
┌──────────────────────────────────────────────────────────────┐
│                       Gradio UI Layer                        │
│  - Question Input                                            │
│  - Mode Selection (Think/Act/ReAct/All)                      │
│  - Three Output Panels (side-by-side comparison)             │
└──────────────────┬───────────────────────────────────────────┘
                   │
                   ▼
┌──────────────────────────────────────────────────────────────┐
│                       Agent Controller                       │
│  run_comparison() - Routes to appropriate mode handler       │
└──────────────────┬───────────────────────────────────────────┘
                   │
        ┌──────────┴──────────┬──────────────┐
        ▼                     ▼              ▼
  ┌──────────────┐     ┌──────────────┐ ┌──────────────┐
  │  Think-Only  │     │   Act-Only   │ │    ReAct     │
  │     Mode     │     │     Mode     │ │     Mode     │
  └──────┬───────┘     └──────┬───────┘ └──────┬───────┘
         │                    │                │
         ▼                    ▼                ▼
┌──────────────────────────────────────────────────────────────┐
│                        LLM Interface                         │
│  call_llm() - Communicates with openai/gpt-oss-20b           │
└──────────────────┬───────────────────────────────────────────┘
                   │
                   ▼  (Act-Only & ReAct modes only)
┌──────────────────────────────────────────────────────────────┐
│                        Tool Executor                         │
│  - parse_action()                                            │
│  - call_tool()                                               │
└──────────────────┬───────────────────────────────────────────┘
                   │
      ┌────────────┴─┬─────────────┬─────────┬──────────┐
      ▼              ▼             ▼         ▼          ▼
┌────────────┐ ┌────────────┐ ┌─────────┐ ┌──────┐ ┌──────────┐
│ DuckDuckGo │ │ Wikipedia  │ │ Weather │ │ Calc │ │  Python  │
│   Search   │ │   Search   │ │   API   │ │      │ │   REPL   │
└────────────┘ └────────────┘ └─────────┘ └──────┘ └──────────┘
```
## Component Details
### 1. **Tool Layer**
Each tool is wrapped in a `Tool` class (sketched below) with:
- **name**: Identifier for the LLM to reference
- **description**: Instructions for when/how to use the tool
- **func**: The actual implementation
**Tool Implementations:**
- `duckduckgo_search()`: Uses DuckDuckGo's JSON API
- `wikipedia_search()`: Uses the Wikipedia Python library
- `get_weather()`: Queries wttr.in API for weather data
- `calculate()`: Safe AST-based math expression evaluator
- `python_repl()`: Sandboxed Python execution with whitelisted builtins
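A minimal sketch of the wrapper and dispatch, assuming a plain dataclass (the Space's actual `Tool` class and error text may differ):
```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str                    # identifier the LLM writes after "Action:"
    description: str             # injected into the prompt so the LLM knows when to use it
    func: Callable[[str], str]   # takes the raw Action Input, returns an Observation

TOOLS: list[Tool] = []           # populated with the five tools at startup

def call_tool(name: str, tool_input: str) -> str:
    # Hypothetical dispatch by name; unknown tools become an error Observation
    for tool in TOOLS:
        if tool.name == name:
            return tool.func(tool_input)
    return f"Error: unknown tool '{name}'"
```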
### 2. **Agent Modes**
#### Think-Only Mode (`think_only_mode`)
```
User Question → System Prompt → LLM → Thoughts → Answer
```
- Single LLM call with CoT prompt
- No tool access
- Shows reasoning steps
- Best for knowledge-based questions
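In code, the whole mode reduces to a single call; a sketch assuming a `THINK_PROMPT` constant (see Customizing Prompts below) and the `call_llm()` helper:
```python
def think_only_mode(question: str):
    # One chain-of-thought call: no tools, no loop. THINK_PROMPT is assumed
    # to ask for numbered reasoning steps ending in a final "Answer:" line.
    messages = [{"role": "user", "content": THINK_PROMPT.format(question=question)}]
    yield call_llm(messages)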
#### Act-Only Mode (`act_only_mode`)
```
User Question → System Prompt → LLM → Action
                                        ↓
                          Execute Tool → Observation
                                        ↓
                                LLM → Action/Answer
                                        ↓
                                       ...
```
- Iterative loop: Action → Observation
- No explicit "Thought" step
- Maximum 5 iterations
- Best for information gathering
#### ReAct Mode (`react_mode`)
```
User Question → System Prompt → LLM → Thought → Action
                                                  ↓
                                    Execute Tool → Observation
                                                  ↓
                                   LLM → Thought → Action/Answer
                                                  ↓
                                                 ...
```
- Full Thought-Action-Observation cycle
- Most comprehensive reasoning
- Maximum 5 iterations
- Best for complex multi-step problems
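Both tool-using modes share essentially the same loop; Act-Only just omits the Thought line from its prompt. A sketch under assumed helper signatures (`call_llm`, `parse_action`, and `call_tool` as described elsewhere in this document; `tool_descriptions()` is hypothetical):
```python
def react_mode(question: str):
    # Build the initial prompt from the template plus tool descriptions
    messages = [{"role": "user",
                 "content": REACT_PROMPT.format(question=question,
                                                tools=tool_descriptions())}]
    for _ in range(5):  # max 5 iterations to prevent infinite loops
        response = call_llm(messages)
        yield response
        if "Answer:" in response:  # early termination on a final answer
            break
        parsed = parse_action(response)
        if parsed is None:  # malformed response; bail out gracefully
            yield "Could not parse an action; stopping."
            break
        name, tool_input = parsed
        observation = call_tool(name, tool_input)
        messages.append({"role": "assistant", "content": response})
        messages.append({"role": "user", "content": f"Observation: {observation}"})
        yield f"Observation: {observation}"
```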
### 3. **LLM Interface**
**`call_llm()` Function:**
- Uses Hugging Face Inference API
- Model: openai/gpt-oss-20b
- Supports chat format (messages list)
- Configurable temperature and max_tokens
**Authentication:**
- Requires `HF_TOKEN` environment variable
- Set in Space secrets (secure)
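A plausible implementation on top of `huggingface_hub` (the defaults here are assumptions):
```python
import os
from huggingface_hub import InferenceClient

client = InferenceClient(token=os.environ["HF_TOKEN"])

def call_llm(messages: list, temperature: float = 0.7, max_tokens: int = 1024) -> str:
    # Chat-format completion against the hosted model
    response = client.chat_completion(
        messages,
        model="openai/gpt-oss-20b",
        temperature=temperature,
        max_tokens=max_tokens,
    )
    return response.choices[0].message.content
```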
### 4. **Parsing & Control Flow**
**`parse_action()` Function:**
- Extracts `Action:` and `Action Input:` from LLM response
- Uses regex to handle various formats
- Returns an (action_name, action_input) tuple (sketch below)
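A hedged sketch of the extraction (the real regexes may be more permissive):
```python
import re

def parse_action(text: str):
    # Match "Action: <name>" and "Action Input: <input>" lines
    action = re.search(r"Action:\s*(.+)", text)
    action_input = re.search(r"Action Input:\s*(.+)", text)
    if not action or not action_input:
        return None  # malformed response; caller falls back to error handling
    return action.group(1).strip(), action_input.group(1).strip()
```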
**Iteration Control:**
- Max 5 iterations per mode to prevent infinite loops
- Early termination when "Answer:" detected
- Error handling for malformed responses
### 5. **UI Layer (Gradio)**
**Components:**
- **Input Section**: Question textbox + mode dropdown
- **Example Buttons**: Pre-filled question templates
- **Output Panels**: Three side-by-side Markdown displays
- **Streaming**: Generator functions for real-time updates
**User Flow:**
1. User enters question or clicks example
2. Selects mode (or "All" for comparison)
3. Clicks "Run"
4. Sees real-time updates in output panel(s)
5. Views final answer and complete reasoning trace
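A stripped-down sketch of the wiring (component names are assumptions; the actual Space adds example buttons and styling):
```python
import gradio as gr

with gr.Blocks() as demo:
    question = gr.Textbox(label="Question")
    mode = gr.Dropdown(["Think-Only", "Act-Only", "ReAct", "All"],
                       value="All", label="Mode")
    run = gr.Button("Run")
    with gr.Row():  # three side-by-side panels
        think_out, act_out, react_out = gr.Markdown(), gr.Markdown(), gr.Markdown()
    # run_comparison is a generator yielding (think, act, react) tuples,
    # so Gradio streams partial updates into the panels
    run.click(run_comparison, inputs=[question, mode],
              outputs=[think_out, act_out, react_out])

demo.launch()
```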
## Data Flow Example
### Example: "What's the weather in Paris?"
**Mode: ReAct**
1. User submits question
2. `react_mode()` called with question
3. Prompt formatted with question + tool descriptions
4. First LLM call:
```
Thought: I need to check the current weather in Paris
Action: get_weather
Action Input: Paris
```
5. `parse_action()` extracts tool call
6. `call_tool("get_weather", "Paris")` executes
7. Observation: "Weather in Paris: Cloudy, 15°C..."
8. Second LLM call with observation
9. LLM responds:
```
Thought: I have the weather information
Answer: The current weather in Paris is...
```
10. Generator yields formatted output to UI
11. User sees complete trace in ReAct panel
## Key Design Patterns
### 1. **Generator Pattern for Streaming**
```python
from typing import Generator

def mode(question: str) -> Generator[str, None, None]:
yield "Step 1..."
# process
yield "Step 2..."
# etc
```
Enables real-time UI updates without blocking
### 2. **Tool Registry Pattern**
```python
TOOLS = [Tool(name, description, func), ...]
```
Easy to add new tools: just append to the list
### 3. **Prompt Templates**
```python
PROMPT = """...""".format(question=q, tools=t)
```
Modular prompts for each mode
### 4. **Safe Execution**
- AST parsing for calculator (no `eval()`; sketch below)
- Whitelisted builtins for Python REPL
- Timeout limits on API calls
- Error handling with fallback messages
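The calculator's evaluator might look like this minimal AST walker (a sketch of the technique, not the Space's exact code):
```python
import ast
import operator

_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def safe_eval(expr: str) -> float:
    # Walk the parsed tree, allowing only numeric literals and whitelisted operators
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError("Disallowed expression")
    return _eval(ast.parse(expr, mode="eval"))
```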
## Extensibility
### Adding a New Tool
```python
def my_tool(query: str) -> str:
    # Implementation goes here
    return f"my_tool result for: {query}"

TOOLS.append(Tool(
name="my_tool",
description="When to use this tool...",
func=my_tool
))
```
### Adding a New Mode
```python
from typing import Generator

def hybrid_mode(question: str) -> Generator[str, None, None]:
    # Custom logic mixing elements of the other modes
    yield "Starting hybrid mode..."
    # ...

# Then register the mode in run_comparison() and the UI dropdown
```
### Customizing Prompts
Edit the `*_PROMPT` constants to change agent behavior:
- Add constraints
- Change format
- Provide examples
- Adjust tone
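For reference, a ReAct-style template might look like this (the wording is hypothetical, not the Space's actual constant):
```python
REACT_PROMPT = """Answer the question using the tools below.

Tools:
{tools}

Use exactly this format:
Thought: what to do next and why
Action: one tool name
Action Input: the input to pass to the tool
Observation: (filled in by the tool executor)
... repeat Thought/Action/Observation as needed ...
Answer: the final answer

Question: {question}"""
```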
## Performance Considerations
1. **API Latency**: Model calls take 2-5 seconds
2. **Tool Latency**: External APIs add 1-2 seconds per call
3. **Iteration Count**: 5 iterations max = ~30 seconds worst case
4. **Parallel Modes**: "All" mode runs sequentially (not parallel)
## Security Notes
1. **API Keys**: Never commit `HF_TOKEN` to repo
2. **Python REPL**: Sandboxed with limited builtins (sketch after this list)
3. **User Input**: Sanitized before tool execution
4. **Rate Limits**: Consider adding rate limiting for production
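The REPL sandbox in miniature (the whitelist contents are assumed):
```python
import contextlib
import io

SAFE_BUILTINS = {"len": len, "range": range, "print": print, "sum": sum,
                 "min": min, "max": max, "abs": abs, "str": str,
                 "int": int, "float": float}

def python_repl(code: str) -> str:
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            # Empty __builtins__ plus an explicit whitelist: no imports, no open()
            exec(code, {"__builtins__": SAFE_BUILTINS})
    except Exception as exc:
        return f"Error: {exc}"
    return buffer.getvalue() or "(no output)"
```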
## Testing Strategy
1. **Unit Tests**: Test individual tool functions
2. **Integration Tests**: Test mode handlers end-to-end
3. **Prompt Tests**: Verify LLM responses parse correctly
4. **UI Tests**: Test Gradio interface components
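A couple of hypothetical pytest cases to make this concrete (the module name and expected strings are assumptions):
```python
from app import calculate, parse_action

def test_calculate_basic():
    assert "4" in calculate("2 + 2")

def test_parse_action_extracts_tool_call():
    response = "Thought: need the weather\nAction: get_weather\nAction Input: Paris"
    assert parse_action(response) == ("get_weather", "Paris")
```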
## Future Enhancements
- [ ] Add memory/conversation history
- [ ] Implement parallel tool calling
- [ ] Add caching layer for repeated queries
- [ ] Support custom user tools
- [ ] Add performance metrics/timing
- [ ] Implement token counting/cost tracking
- [ ] Add export functionality for reasoning traces