Spaces:
Sleeping
Sleeping
A newer version of the Gradio SDK is available:
6.1.0
metadata
title: AI Agent UN - Multi-Agent Simulation Framework
emoji: 🏛️
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
AI Agent United Nations: Multi-Agent Simulation Framework
A structured system for simulating international diplomatic decision-making using 195 AI agents with constrained JSON outputs.
System Overview
This is an experimental framework demonstrating:
- Multi-agent coordination across 195 independent AI agents
- Structured output constraints with strict JSON schema validation
- Generic prompt templates producing country-specific behaviors
- Task execution model for running resolutions through all agents
High-Level Concept
graph TB
subgraph "Input Layer"
RES[UN Resolution Text]
end
subgraph "Agent Layer - 195 Independent Agents"
A1[Agent: USA<br/>System Prompt]
A2[Agent: China<br/>System Prompt]
A3[Agent: Russia<br/>System Prompt]
ADOT[...]
A195[Agent: Tuvalu<br/>System Prompt]
end
subgraph "LLM Processing"
LLM[Claude 3.5 Sonnet<br/>Structured JSON Output]
end
subgraph "Output Layer"
V1[Vote: yes<br/>Statement: ...]
V2[Vote: no<br/>Statement: ...]
V3[Vote: yes<br/>Statement: ...]
VDOT[...]
V195[Vote: yes<br/>Statement: ...]
end
subgraph "Aggregation"
AGG[Combined Results<br/>Vote Counts + All Statements]
end
RES --> A1
RES --> A2
RES --> A3
RES --> ADOT
RES --> A195
A1 --> LLM
A2 --> LLM
A3 --> LLM
ADOT --> LLM
A195 --> LLM
LLM --> V1
LLM --> V2
LLM --> V3
LLM --> VDOT
LLM --> V195
V1 --> AGG
V2 --> AGG
V3 --> AGG
VDOT --> AGG
V195 --> AGG
style RES fill:#6366f1
style LLM fill:#8b5cf6
style AGG fill:#22c55e
style A1 fill:#f59e0b
style A2 fill:#f59e0b
style A3 fill:#f59e0b
style A195 fill:#f59e0b
System Architecture
graph TB
subgraph Input
M[Motion Text<br/>tasks/motions/]
C[Country List<br/>195 UN Members]
end
subgraph "Agent Processing"
SP[System Prompt<br/>Generic Template]
UP[User Prompt<br/>+ Motion Text]
LLM[Claude 3.5 Sonnet<br/>Temperature: 0.7]
end
subgraph "Output Validation"
JSON[JSON Parser]
V[Schema Validator]
E[Error Handler]
end
subgraph Results
AGG[Aggregated Results]
META[Metadata]
FILE[JSON Output File]
end
M --> UP
C --> SP
SP --> LLM
UP --> LLM
LLM --> JSON
JSON --> V
V --> E
E --> AGG
AGG --> META
META --> FILE
style LLM fill:#6366f1
style JSON fill:#22c55e
style V fill:#f59e0b
style FILE fill:#8b5cf6
Agent Processing Flow
sequenceDiagram
participant CLI as CLI Runner
participant Agent as Country Agent
participant LLM as Claude 3.5
participant Val as Validator
participant Store as Storage
CLI->>Agent: Load system prompt
CLI->>Agent: Send motion text
Agent->>LLM: System + User Prompt
LLM->>Agent: Raw text response
Agent->>Val: Parse JSON
alt Valid JSON
Val->>Val: Check schema
alt Valid Schema
Val->>Store: Save vote + statement
else Invalid Schema
Val->>Store: Save as abstain + error
end
else Invalid JSON
Val->>Store: Save as abstain + error
end
Store->>CLI: Continue to next country
Core Components
1. Agent System Prompts
graph LR
subgraph "Generic Template"
T[Template Structure]
end
subgraph "Variables"
CN[Country Name]
P5[P5 Status]
end
subgraph "195 Agents"
US[United States]
CN2[China]
RU[Russia]
DOT[...]
TV[Tuvalu]
end
T --> CN
T --> P5
CN --> US
CN --> CN2
CN --> RU
CN --> DOT
CN --> TV
style T fill:#6366f1
style US fill:#22c55e
style CN2 fill:#22c55e
style RU fill:#22c55e
style TV fill:#22c55e
- 195 country-specific agents (one per UN member state)
- Generic template structure (identical for all countries)
- Only country name and P5 status differ between prompts
- AI infers policy positions from training data
2. Structured Output Schema
{
"vote": "yes" | "no" | "abstain",
"statement": "Brief explanation (2-4 sentences)"
}
3. Validation Pipeline
graph TD
A[LLM Response] --> B{Valid JSON?}
B -->|Yes| C{Has vote field?}
B -->|No| ERR1[Error: Parse Failure]
C -->|Yes| D{Has statement field?}
C -->|No| ERR2[Error: Missing Vote]
D -->|Yes| E{Vote is yes/no/abstain?}
D -->|No| ERR3[Error: Missing Statement]
E -->|Yes| SUCCESS[Save Response]
E -->|No| ERR4[Error: Invalid Vote]
ERR1 --> DEFAULT[Save as Abstain + Error Flag]
ERR2 --> DEFAULT
ERR3 --> DEFAULT
ERR4 --> DEFAULT
style SUCCESS fill:#22c55e
style DEFAULT fill:#f59e0b
style ERR1 fill:#ef4444
style ERR2 fill:#ef4444
style ERR3 fill:#ef4444
style ERR4 fill:#ef4444
4. Model Configuration
- Primary: Claude 3.5 Sonnet (claude-3-5-sonnet-20241022)
- Temperature: 0.7 (balance consistency + variation)
- Max tokens: 800 per response
- Provider: Anthropic API
What This Tests
- LLM Geopolitical Knowledge: How well models understand different countries' foreign policies
- Structured Outputs: Consistency in producing valid JSON under constraints
- Multi-Agent Systems: Coordinating hundreds of independent AI agents
- Prompt Engineering: Generic templates yielding specific behaviors
- Error Handling: Graceful degradation when agents produce invalid outputs
Technical Implementation
Execution Flow
graph TD
START[Start Simulation] --> LOAD_MOTION[Load Motion Text<br/>tasks/motions/motion_id.md]
LOAD_MOTION --> LOAD_COUNTRIES[Load Country List<br/>195 UN Members]
LOAD_COUNTRIES --> LOOP_START{For Each Country}
LOOP_START -->|Country 1-195| LOAD_PROMPT[Load System Prompt<br/>agents/representatives/country/]
LOAD_PROMPT --> BUILD_USER[Build User Prompt<br/>Motion + Instructions]
BUILD_USER --> API_CALL[API Call to Claude<br/>System + User Prompt]
API_CALL --> PARSE[Parse JSON Response]
PARSE --> VALIDATE[Validate Schema]
VALIDATE -->|Valid| STORE[Store Result]
VALIDATE -->|Invalid| ERROR[Store Error + Abstain]
STORE --> LOOP_START
ERROR --> LOOP_START
LOOP_START -->|All Done| AGGREGATE[Aggregate Results]
AGGREGATE --> CALC_STATS[Calculate Vote Summary]
CALC_STATS --> ADD_META[Add Metadata<br/>model, timestamp, etc]
ADD_META --> SAVE_TIME[Save Timestamped File<br/>motion_id_timestamp.json]
SAVE_TIME --> SAVE_LATEST[Save Latest File<br/>motion_id_latest.json]
SAVE_LATEST --> END[Complete]
style API_CALL fill:#6366f1
style VALIDATE fill:#f59e0b
style STORE fill:#22c55e
style ERROR fill:#ef4444
style END fill:#8b5cf6
Command Line Interface
# Run simulation
python scripts/run_motion.py 01_gaza_ceasefire_resolution
# With specific model
python scripts/run_motion.py 01_gaza_ceasefire_resolution --model claude-3-5-sonnet-20241022
# Test with sample
python scripts/run_motion.py 01_gaza_ceasefire_resolution --sample 5
Output Structure
graph LR
subgraph "JSON Output"
ROOT[Root Object]
META[Metadata]
VOTES[Votes Array]
end
subgraph "Metadata Fields"
ID[motion_id]
TS[timestamp]
MODEL[model]
TOTAL[total_votes]
SUMMARY[vote_summary]
end
subgraph "Vote Summary"
YES[yes: count]
NO[no: count]
ABS[abstain: count]
end
subgraph "Individual Votes"
V1[Vote 1: Country, vote, statement]
V2[Vote 2: Country, vote, statement]
V3[...]
V195[Vote 195: Country, vote, statement]
end
ROOT --> META
ROOT --> VOTES
META --> ID
META --> TS
META --> MODEL
META --> TOTAL
META --> SUMMARY
SUMMARY --> YES
SUMMARY --> NO
SUMMARY --> ABS
VOTES --> V1
VOTES --> V2
VOTES --> V3
VOTES --> V195
style ROOT fill:#8b5cf6
style META fill:#6366f1
style VOTES fill:#22c55e
Case Study: Gaza Ceasefire Resolution
The Space includes a case study demonstrating the system with a Gaza ceasefire resolution voted on by all 195 agents.
Results Overview
pie title Vote Distribution (195 Countries)
"Yes" : 190
"No" : 3
"Abstain" : 2
Key Statistics:
- Yes: 190 countries (97.4%)
- No: 3 countries (1.5%)
- Abstain: 2 countries (1.0%)
This serves as a concrete example of the framework in action, showing how generic prompts + model knowledge produce diverse, country-specific diplomatic responses.
Research Applications
- Testing LLM knowledge of international relations
- Evaluating structured output consistency
- Studying emergent behavior in multi-agent systems
- Educational demonstrations of diplomatic complexity
Limitations
This is a simulation for research and education:
- AI positions based on training data, not actual policies
- Does NOT predict real government decisions
- Should NOT be considered authoritative
- Real diplomacy involves classified information and human judgment
Open Source
All code, prompts, and data available on GitHub:
- Repository: https://github.com/danielrosehill/AI-Agent-UN
- System Prompts: https://github.com/danielrosehill/AI-Agent-UN/tree/main/agents/representatives
- Execution Script: https://github.com/danielrosehill/AI-Agent-UN/blob/main/scripts/run_motion.py
Built with Gradio | Powered by Anthropic Claude
