# πŸ—οΈ Architecture Overview

## System Architecture

This Hugging Face Space implements a comparative agent system with three reasoning modes. Here's how everything works together:

```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    Gradio UI Layer                          β”‚
β”‚  - Question Input                                           β”‚
β”‚  - Mode Selection (Think/Act/ReAct/All)                    β”‚
β”‚  - Three Output Panels (side-by-side comparison)           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                   β”‚
                   β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                   Agent Controller                          β”‚
β”‚  run_comparison() - Routes to appropriate mode handler     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                   β”‚
        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
        β–Ό                     β–Ό              β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Think-Only  β”‚    β”‚   Act-Only   β”‚    β”‚    ReAct     β”‚
β”‚    Mode      β”‚    β”‚     Mode     β”‚    β”‚     Mode     β”‚
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜
       β”‚                   β”‚                    β”‚
       β–Ό                   β–Ό                    β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    LLM Interface                            β”‚
β”‚  call_llm() - Communicates with openai/gpt-oss-20b        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                   β”‚
                   β–Ό (Act-Only & ReAct modes only)
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    Tool Executor                            β”‚
β”‚  - parse_action()                                           β”‚
β”‚  - call_tool()                                              β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                   β”‚
       β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”
       β–Ό                       β–Ό           β–Ό           β–Ό      β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ DuckDuckGo β”‚  β”‚ Wikipedia  β”‚  β”‚Weatherβ”‚ β”‚Calcβ”‚  β”‚ Python  β”‚
β”‚   Search   β”‚  β”‚   Search   β”‚  β”‚  API  β”‚ β”‚    β”‚  β”‚  REPL   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```

## Component Details

### 1. **Tool Layer**

Each tool is wrapped in a `Tool` class with:
- **name**: Identifier for the LLM to reference
- **description**: Instructions for when/how to use the tool
- **func**: The actual implementation

**Tool Implementations:**

- `duckduckgo_search()`: Uses DuckDuckGo's JSON API
- `wikipedia_search()`: Uses the Wikipedia Python library
- `get_weather()`: Queries wttr.in API for weather data
- `calculate()`: Safe AST-based math expression evaluator
- `python_repl()`: Sandboxed Python execution with whitelisted builtins
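The AST-based calculator deserves a closer look, since it is the pattern that lets the agent do math without `eval()`. A minimal sketch of how such an evaluator might work (the function name `safe_calculate` and the exact operator whitelist are illustrative, not the Space's actual code):

```python
import ast
import operator

# Whitelisted operators; any node type outside this map is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}

def safe_calculate(expression: str) -> float:
    """Evaluate a math expression by walking its AST -- no eval()."""
    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression: {expression!r}")
    return _eval(ast.parse(expression, mode="eval").body)
```

Because only numeric constants and whitelisted operators are accepted, inputs like `__import__('os')` fail at the `isinstance` checks rather than executing.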

### 2. **Agent Modes**

#### Think-Only Mode (`think_only_mode`)
```
User Question β†’ System Prompt β†’ LLM β†’ Thoughts β†’ Answer
```
- Single LLM call with CoT prompt
- No tool access
- Shows reasoning steps
- Best for knowledge-based questions
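In code, Think-Only reduces to a single chat call wrapped in a generator (so the UI can stream it). A sketch, with a stubbed LLM standing in for the real `call_llm()` and an illustrative prompt:

```python
from typing import Generator

THINK_PROMPT = "Think step by step, then give your final answer after 'Answer:'."

def fake_llm(messages: list) -> str:
    """Stand-in for call_llm(); returns a canned CoT response."""
    return "Thought: Paris is the capital of France.\nAnswer: Paris"

def think_only_mode(question: str, llm=fake_llm) -> Generator[str, None, None]:
    """One chain-of-thought call: no tools, no loop."""
    yield "Thinking..."
    yield llm([
        {"role": "system", "content": THINK_PROMPT},
        {"role": "user", "content": question},
    ])
```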

#### Act-Only Mode (`act_only_mode`)
```
User Question β†’ System Prompt β†’ LLM β†’ Action
                                   ↓
                            Execute Tool β†’ Observation
                                   ↓
                                  LLM β†’ Action/Answer
                                   ↓
                                  ...
```
- Iterative loop: Action β†’ Observation
- No explicit "Thought" step
- Maximum 5 iterations
- Best for information gathering

#### ReAct Mode (`react_mode`)
```
User Question β†’ System Prompt β†’ LLM β†’ Thought β†’ Action
                                         ↓
                                  Execute Tool β†’ Observation
                                         ↓
                                       LLM β†’ Thought β†’ Action/Answer
                                         ↓
                                        ...
```
- Full Thought-Action-Observation cycle
- Most comprehensive reasoning
- Maximum 5 iterations
- Best for complex multi-step problems
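The cycle above can be sketched as a loop that alternates LLM calls with tool execution, appending each observation to a running transcript. This is a hedged reconstruction: the LLM and tool registry are injected as parameters here (the real `react_mode()` presumably uses `call_llm()` and the global `TOOLS` list), and the parsing regex is simplified:

```python
import re
from typing import Callable, Generator

MAX_ITERATIONS = 5  # matches the cap described above

def react_loop(question: str, llm: Callable,
               tools: dict) -> Generator[str, None, None]:
    """Thought -> Action -> Observation loop with early termination."""
    transcript = f"Question: {question}\n"
    for _ in range(MAX_ITERATIONS):
        response = llm(transcript)
        yield response
        if "Answer:" in response:                 # early termination
            return
        match = re.search(r"Action:\s*(\w+)\s*\nAction Input:\s*(.+)", response)
        if not match:
            yield "Could not parse an action; stopping."
            return
        name, arg = match.group(1), match.group(2).strip()
        observation = tools.get(name, lambda _: f"Unknown tool: {name}")(arg)
        transcript += f"{response}\nObservation: {observation}\n"
```

Driving it with a scripted LLM shows the two-turn flow from the weather example below: one turn emits an action, the tool result is folded into the transcript, and the second turn answers.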

### 3. **LLM Interface**

**`call_llm()` Function:**
- Uses Hugging Face Inference API
- Model: openai/gpt-oss-20b
- Supports chat format (messages list)
- Configurable temperature and max_tokens

**Authentication:**
- Requires `HF_TOKEN` environment variable
- Set in Space secrets (secure)
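Put together, `call_llm()` plausibly looks something like the sketch below, built on `huggingface_hub`'s `InferenceClient` and its `chat_completion` method. The default `temperature` and `max_tokens` values are assumptions, and the import is deferred so the helper `build_messages()` stays usable without the dependency:

```python
import os

MODEL_ID = "openai/gpt-oss-20b"

def build_messages(system_prompt: str, user_prompt: str) -> list:
    """Assemble the chat-format payload the Inference API expects."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

def call_llm(system_prompt: str, user_prompt: str,
             temperature: float = 0.7, max_tokens: int = 1024) -> str:
    # Deferred third-party import; requires HF_TOKEN in the environment.
    from huggingface_hub import InferenceClient
    client = InferenceClient(model=MODEL_ID, token=os.environ["HF_TOKEN"])
    completion = client.chat_completion(
        messages=build_messages(system_prompt, user_prompt),
        temperature=temperature,
        max_tokens=max_tokens,
    )
    return completion.choices[0].message.content
```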

### 4. **Parsing & Control Flow**

**`parse_action()` Function:**
- Extracts `Action:` and `Action Input:` from LLM response
- Uses regex to handle various formats
- Returns (action_name, action_input) tuple

**Iteration Control:**
- Max 5 iterations per mode to prevent infinite loops
- Early termination when "Answer:" detected
- Error handling for malformed responses
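A regex-based `parse_action()` along the lines described might look like this (the exact patterns in the Space may be more forgiving of formatting variations):

```python
import re
from typing import Optional, Tuple

def parse_action(response: str) -> Optional[Tuple[str, str]]:
    """Extract (action_name, action_input) from an LLM response, or None."""
    action = re.search(r"Action:\s*([\w-]+)", response)
    action_input = re.search(r"Action Input:\s*(.+)", response)
    if action and action_input:
        return action.group(1), action_input.group(1).strip()
    return None
```

Returning `None` on a malformed response lets the caller decide whether to retry, re-prompt, or bail out of the iteration loop.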

### 5. **UI Layer (Gradio)**

**Components:**
- **Input Section**: Question textbox + mode dropdown
- **Example Buttons**: Pre-filled question templates
- **Output Panels**: Three side-by-side Markdown displays
- **Streaming**: Generator functions for real-time updates

**User Flow:**
1. User enters question or clicks example
2. Selects mode (or "All" for comparison)
3. Clicks "Run"
4. Sees real-time updates in output panel(s)
5. Views final answer and complete reasoning trace

## Data Flow Example

### Example: "What's the weather in Paris?"

**Mode: ReAct**

1. User submits question
2. `react_mode()` called with question
3. Prompt formatted with question + tool descriptions
4. First LLM call:
   ```
   Thought: I need to check the current weather in Paris
   Action: get_weather
   Action Input: Paris
   ```
5. `parse_action()` extracts tool call
6. `call_tool("get_weather", "Paris")` executes
7. Observation: "Weather in Paris: Cloudy, 15Β°C..."
8. Second LLM call with observation
9. LLM responds:
   ```
   Thought: I have the weather information
   Answer: The current weather in Paris is...
   ```
10. Generator yields formatted output to UI
11. User sees complete trace in ReAct panel

## Key Design Patterns

### 1. **Generator Pattern for Streaming**
```python
def mode(question: str) -> Generator[str, None, None]:
    yield "Step 1..."
    # process
    yield "Step 2..."
    # etc
```
Enables real-time UI updates without blocking

### 2. **Tool Registry Pattern**
```python
TOOLS = [Tool(name, description, func), ...]
```
Easy to add new tools - just append to list

### 3. **Prompt Templates**
```python
PROMPT = """...""".format(question=q, tools=t)
```
Modular prompts for each mode

### 4. **Safe Execution**
- AST parsing for calculator (no `eval()`)
- Whitelisted builtins for Python REPL
- Timeout limits on API calls
- Error handling with fallback messages
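The whitelisted-builtins approach for the Python REPL can be sketched as follows; the exact whitelist in the Space is unknown, and a production sandbox would also need resource limits (this sketch restricts names only, not CPU or memory):

```python
import contextlib
import io

# Illustrative whitelist -- no open, no __import__, no exec/eval.
SAFE_BUILTINS = {
    "print": print, "len": len, "range": range, "sum": sum,
    "min": min, "max": max, "abs": abs, "sorted": sorted,
}

def python_repl(code: str) -> str:
    """Execute code with only whitelisted builtins; capture stdout."""
    buffer = io.StringIO()
    env = {"__builtins__": SAFE_BUILTINS}
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, env)
    except Exception as exc:
        return f"Error: {exc}"
    return buffer.getvalue()
```

Because `__builtins__` is replaced, a lookup of `open` raises `NameError`, and `import` fails for lack of `__import__` -- both surface as an `Error: ...` observation rather than a breach.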

## Extensibility

### Adding a New Tool

```python
def my_tool(query: str) -> str:  # avoid naming the parameter `input` (shadows the builtin)
    result = ...  # implementation
    return result

TOOLS.append(Tool(
    name="my_tool",
    description="When to use this tool...",
    func=my_tool
))
```

### Adding a New Mode

```python
def hybrid_mode(question: str) -> Generator[str, None, None]:
    # Custom logic mixing elements
    yield "Starting hybrid mode..."
    # ...
    
# Add to run_comparison() and UI dropdown
```

### Customizing Prompts

Edit the `*_PROMPT` constants to change agent behavior:
- Add constraints
- Change format
- Provide examples
- Adjust tone

## Performance Considerations

1. **API Latency**: Model calls take 2-5 seconds
2. **Tool Latency**: External APIs add 1-2 seconds per call
3. **Iteration Count**: 5 iterations max = ~30 seconds worst case
4. **Parallel Modes**: "All" mode runs sequentially (not parallel)

## Security Notes

1. **API Keys**: Never commit `HF_TOKEN` to repo
2. **Python REPL**: Sandboxed with limited builtins
3. **User Input**: Sanitized before tool execution
4. **Rate Limits**: Consider adding rate limiting for production

## Testing Strategy

1. **Unit Tests**: Test individual tool functions
2. **Integration Tests**: Test mode handlers end-to-end
3. **Prompt Tests**: Verify LLM responses parse correctly
4. **UI Tests**: Test Gradio interface components

## Future Enhancements

- [ ] Add memory/conversation history
- [ ] Implement parallel tool calling
- [ ] Add caching layer for repeated queries
- [ ] Support custom user tools
- [ ] Add performance metrics/timing
- [ ] Implement token counting/cost tracking
- [ ] Add export functionality for reasoning traces