Spaces:

danielrosehill
/

Agent-UN

Sleeping

App Files Files Community

Agent-UN / README.md

danielrosehill

Add banner image to README

24d65a0 2 months ago

preview code

raw

history blame contribute delete

9.95 kB

A newer version of the Gradio SDK is available: 6.1.0

Upgrade

metadata

title: AI Agent UN - Multi-Agent Simulation Framework
emoji: 🏛️
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit

AI Agent United Nations: Multi-Agent Simulation Framework

A structured system for simulating international diplomatic decision-making using 195 AI agents with constrained JSON outputs.

System Overview

This is an experimental framework demonstrating:

Multi-agent coordination across 195 independent AI agents
Structured output constraints with strict JSON schema validation
Generic prompt templates producing country-specific behaviors
Task execution model for running resolutions through all agents

High-Level Concept

graph TB
    subgraph "Input Layer"
        RES[UN Resolution Text]
    end

    subgraph "Agent Layer - 195 Independent Agents"
        A1[Agent: USA<br/>System Prompt]
        A2[Agent: China<br/>System Prompt]
        A3[Agent: Russia<br/>System Prompt]
        ADOT[...]
        A195[Agent: Tuvalu<br/>System Prompt]
    end

    subgraph "LLM Processing"
        LLM[Claude 3.5 Sonnet<br/>Structured JSON Output]
    end

    subgraph "Output Layer"
        V1[Vote: yes<br/>Statement: ...]
        V2[Vote: no<br/>Statement: ...]
        V3[Vote: yes<br/>Statement: ...]
        VDOT[...]
        V195[Vote: yes<br/>Statement: ...]
    end

    subgraph "Aggregation"
        AGG[Combined Results<br/>Vote Counts + All Statements]
    end

    RES --> A1
    RES --> A2
    RES --> A3
    RES --> ADOT
    RES --> A195

    A1 --> LLM
    A2 --> LLM
    A3 --> LLM
    ADOT --> LLM
    A195 --> LLM

    LLM --> V1
    LLM --> V2
    LLM --> V3
    LLM --> VDOT
    LLM --> V195

    V1 --> AGG
    V2 --> AGG
    V3 --> AGG
    VDOT --> AGG
    V195 --> AGG

    style RES fill:#6366f1
    style LLM fill:#8b5cf6
    style AGG fill:#22c55e
    style A1 fill:#f59e0b
    style A2 fill:#f59e0b
    style A3 fill:#f59e0b
    style A195 fill:#f59e0b

System Architecture

graph TB
    subgraph Input
        M[Motion Text<br/>tasks/motions/]
        C[Country List<br/>195 UN Members]
    end

    subgraph "Agent Processing"
        SP[System Prompt<br/>Generic Template]
        UP[User Prompt<br/>+ Motion Text]
        LLM[Claude 3.5 Sonnet<br/>Temperature: 0.7]
    end

    subgraph "Output Validation"
        JSON[JSON Parser]
        V[Schema Validator]
        E[Error Handler]
    end

    subgraph Results
        AGG[Aggregated Results]
        META[Metadata]
        FILE[JSON Output File]
    end

    M --> UP
    C --> SP
    SP --> LLM
    UP --> LLM
    LLM --> JSON
    JSON --> V
    V --> E
    E --> AGG
    AGG --> META
    META --> FILE

    style LLM fill:#6366f1
    style JSON fill:#22c55e
    style V fill:#f59e0b
    style FILE fill:#8b5cf6

Agent Processing Flow

sequenceDiagram
    participant CLI as CLI Runner
    participant Agent as Country Agent
    participant LLM as Claude 3.5
    participant Val as Validator
    participant Store as Storage

    CLI->>Agent: Load system prompt
    CLI->>Agent: Send motion text
    Agent->>LLM: System + User Prompt
    LLM->>Agent: Raw text response
    Agent->>Val: Parse JSON
    alt Valid JSON
        Val->>Val: Check schema
        alt Valid Schema
            Val->>Store: Save vote + statement
        else Invalid Schema
            Val->>Store: Save as abstain + error
        end
    else Invalid JSON
        Val->>Store: Save as abstain + error
    end
    Store->>CLI: Continue to next country

Core Components

1. Agent System Prompts

graph LR
    subgraph "Generic Template"
        T[Template Structure]
    end

    subgraph "Variables"
        CN[Country Name]
        P5[P5 Status]
    end

    subgraph "195 Agents"
        US[United States]
        CN2[China]
        RU[Russia]
        DOT[...]
        TV[Tuvalu]
    end

    T --> CN
    T --> P5
    CN --> US
    CN --> CN2
    CN --> RU
    CN --> DOT
    CN --> TV

    style T fill:#6366f1
    style US fill:#22c55e
    style CN2 fill:#22c55e
    style RU fill:#22c55e
    style TV fill:#22c55e

195 country-specific agents (one per UN member state)
Generic template structure (identical for all countries)
Only country name and P5 status differ between prompts
AI infers policy positions from training data

2. Structured Output Schema

{
  "vote": "yes" | "no" | "abstain",
  "statement": "Brief explanation (2-4 sentences)"
}

3. Validation Pipeline

graph TD
    A[LLM Response] --> B{Valid JSON?}
    B -->|Yes| C{Has vote field?}
    B -->|No| ERR1[Error: Parse Failure]
    C -->|Yes| D{Has statement field?}
    C -->|No| ERR2[Error: Missing Vote]
    D -->|Yes| E{Vote is yes/no/abstain?}
    D -->|No| ERR3[Error: Missing Statement]
    E -->|Yes| SUCCESS[Save Response]
    E -->|No| ERR4[Error: Invalid Vote]

    ERR1 --> DEFAULT[Save as Abstain + Error Flag]
    ERR2 --> DEFAULT
    ERR3 --> DEFAULT
    ERR4 --> DEFAULT

    style SUCCESS fill:#22c55e
    style DEFAULT fill:#f59e0b
    style ERR1 fill:#ef4444
    style ERR2 fill:#ef4444
    style ERR3 fill:#ef4444
    style ERR4 fill:#ef4444

4. Model Configuration

Primary: Claude 3.5 Sonnet (claude-3-5-sonnet-20241022)
Temperature: 0.7 (balance consistency + variation)
Max tokens: 800 per response
Provider: Anthropic API

What This Tests

LLM Geopolitical Knowledge: How well models understand different countries' foreign policies
Structured Outputs: Consistency in producing valid JSON under constraints
Multi-Agent Systems: Coordinating hundreds of independent AI agents
Prompt Engineering: Generic templates yielding specific behaviors
Error Handling: Graceful degradation when agents produce invalid outputs

Technical Implementation

Execution Flow

graph TD
    START[Start Simulation] --> LOAD_MOTION[Load Motion Text<br/>tasks/motions/motion_id.md]
    LOAD_MOTION --> LOAD_COUNTRIES[Load Country List<br/>195 UN Members]
    LOAD_COUNTRIES --> LOOP_START{For Each Country}

    LOOP_START -->|Country 1-195| LOAD_PROMPT[Load System Prompt<br/>agents/representatives/country/]
    LOAD_PROMPT --> BUILD_USER[Build User Prompt<br/>Motion + Instructions]
    BUILD_USER --> API_CALL[API Call to Claude<br/>System + User Prompt]
    API_CALL --> PARSE[Parse JSON Response]
    PARSE --> VALIDATE[Validate Schema]
    VALIDATE -->|Valid| STORE[Store Result]
    VALIDATE -->|Invalid| ERROR[Store Error + Abstain]
    STORE --> LOOP_START
    ERROR --> LOOP_START

    LOOP_START -->|All Done| AGGREGATE[Aggregate Results]
    AGGREGATE --> CALC_STATS[Calculate Vote Summary]
    CALC_STATS --> ADD_META[Add Metadata<br/>model, timestamp, etc]
    ADD_META --> SAVE_TIME[Save Timestamped File<br/>motion_id_timestamp.json]
    SAVE_TIME --> SAVE_LATEST[Save Latest File<br/>motion_id_latest.json]
    SAVE_LATEST --> END[Complete]

    style API_CALL fill:#6366f1
    style VALIDATE fill:#f59e0b
    style STORE fill:#22c55e
    style ERROR fill:#ef4444
    style END fill:#8b5cf6

Command Line Interface

# Run simulation
python scripts/run_motion.py 01_gaza_ceasefire_resolution

# With specific model
python scripts/run_motion.py 01_gaza_ceasefire_resolution --model claude-3-5-sonnet-20241022

# Test with sample
python scripts/run_motion.py 01_gaza_ceasefire_resolution --sample 5

Output Structure

graph LR
    subgraph "JSON Output"
        ROOT[Root Object]
        META[Metadata]
        VOTES[Votes Array]
    end

    subgraph "Metadata Fields"
        ID[motion_id]
        TS[timestamp]
        MODEL[model]
        TOTAL[total_votes]
        SUMMARY[vote_summary]
    end

    subgraph "Vote Summary"
        YES[yes: count]
        NO[no: count]
        ABS[abstain: count]
    end

    subgraph "Individual Votes"
        V1[Vote 1: Country, vote, statement]
        V2[Vote 2: Country, vote, statement]
        V3[...]
        V195[Vote 195: Country, vote, statement]
    end

    ROOT --> META
    ROOT --> VOTES
    META --> ID
    META --> TS
    META --> MODEL
    META --> TOTAL
    META --> SUMMARY
    SUMMARY --> YES
    SUMMARY --> NO
    SUMMARY --> ABS
    VOTES --> V1
    VOTES --> V2
    VOTES --> V3
    VOTES --> V195

    style ROOT fill:#8b5cf6
    style META fill:#6366f1
    style VOTES fill:#22c55e

Case Study: Gaza Ceasefire Resolution

The Space includes a case study demonstrating the system with a Gaza ceasefire resolution voted on by all 195 agents.

Results Overview

pie title Vote Distribution (195 Countries)
    "Yes" : 190
    "No" : 3
    "Abstain" : 2

Key Statistics:

Yes: 190 countries (97.4%)
No: 3 countries (1.5%)
Abstain: 2 countries (1.0%)

This serves as a concrete example of the framework in action, showing how generic prompts + model knowledge produce diverse, country-specific diplomatic responses.

Research Applications

Testing LLM knowledge of international relations
Evaluating structured output consistency
Studying emergent behavior in multi-agent systems
Educational demonstrations of diplomatic complexity

Limitations

This is a simulation for research and education:

AI positions based on training data, not actual policies
Does NOT predict real government decisions
Should NOT be considered authoritative
Real diplomacy involves classified information and human judgment

Open Source

All code, prompts, and data available on GitHub:

Repository: https://github.com/danielrosehill/AI-Agent-UN
System Prompts: https://github.com/danielrosehill/AI-Agent-UN/tree/main/agents/representatives
Execution Script: https://github.com/danielrosehill/AI-Agent-UN/blob/main/scripts/run_motion.py