Agent-UN / README.md
danielrosehill's picture
Add banner image to README
24d65a0

A newer version of the Gradio SDK is available: 6.1.0

Upgrade
metadata
title: AI Agent UN - Multi-Agent Simulation Framework
emoji: 🏛️
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit

AI Agent UN Banner

AI Agent United Nations: Multi-Agent Simulation Framework

A structured system for simulating international diplomatic decision-making using 195 AI agents with constrained JSON outputs.

System Overview

This is an experimental framework demonstrating:

  • Multi-agent coordination across 195 independent AI agents
  • Structured output constraints with strict JSON schema validation
  • Generic prompt templates producing country-specific behaviors
  • Task execution model for running resolutions through all agents

High-Level Concept

graph TB
    subgraph "Input Layer"
        RES[UN Resolution Text]
    end

    subgraph "Agent Layer - 195 Independent Agents"
        A1[Agent: USA<br/>System Prompt]
        A2[Agent: China<br/>System Prompt]
        A3[Agent: Russia<br/>System Prompt]
        ADOT[...]
        A195[Agent: Tuvalu<br/>System Prompt]
    end

    subgraph "LLM Processing"
        LLM[Claude 3.5 Sonnet<br/>Structured JSON Output]
    end

    subgraph "Output Layer"
        V1[Vote: yes<br/>Statement: ...]
        V2[Vote: no<br/>Statement: ...]
        V3[Vote: yes<br/>Statement: ...]
        VDOT[...]
        V195[Vote: yes<br/>Statement: ...]
    end

    subgraph "Aggregation"
        AGG[Combined Results<br/>Vote Counts + All Statements]
    end

    RES --> A1
    RES --> A2
    RES --> A3
    RES --> ADOT
    RES --> A195

    A1 --> LLM
    A2 --> LLM
    A3 --> LLM
    ADOT --> LLM
    A195 --> LLM

    LLM --> V1
    LLM --> V2
    LLM --> V3
    LLM --> VDOT
    LLM --> V195

    V1 --> AGG
    V2 --> AGG
    V3 --> AGG
    VDOT --> AGG
    V195 --> AGG

    style RES fill:#6366f1
    style LLM fill:#8b5cf6
    style AGG fill:#22c55e
    style A1 fill:#f59e0b
    style A2 fill:#f59e0b
    style A3 fill:#f59e0b
    style A195 fill:#f59e0b

System Architecture

graph TB
    subgraph Input
        M[Motion Text<br/>tasks/motions/]
        C[Country List<br/>195 UN Members]
    end

    subgraph "Agent Processing"
        SP[System Prompt<br/>Generic Template]
        UP[User Prompt<br/>+ Motion Text]
        LLM[Claude 3.5 Sonnet<br/>Temperature: 0.7]
    end

    subgraph "Output Validation"
        JSON[JSON Parser]
        V[Schema Validator]
        E[Error Handler]
    end

    subgraph Results
        AGG[Aggregated Results]
        META[Metadata]
        FILE[JSON Output File]
    end

    M --> UP
    C --> SP
    SP --> LLM
    UP --> LLM
    LLM --> JSON
    JSON --> V
    V --> E
    E --> AGG
    AGG --> META
    META --> FILE

    style LLM fill:#6366f1
    style JSON fill:#22c55e
    style V fill:#f59e0b
    style FILE fill:#8b5cf6

Agent Processing Flow

sequenceDiagram
    participant CLI as CLI Runner
    participant Agent as Country Agent
    participant LLM as Claude 3.5
    participant Val as Validator
    participant Store as Storage

    CLI->>Agent: Load system prompt
    CLI->>Agent: Send motion text
    Agent->>LLM: System + User Prompt
    LLM->>Agent: Raw text response
    Agent->>Val: Parse JSON
    alt Valid JSON
        Val->>Val: Check schema
        alt Valid Schema
            Val->>Store: Save vote + statement
        else Invalid Schema
            Val->>Store: Save as abstain + error
        end
    else Invalid JSON
        Val->>Store: Save as abstain + error
    end
    Store->>CLI: Continue to next country

Core Components

1. Agent System Prompts

graph LR
    subgraph "Generic Template"
        T[Template Structure]
    end

    subgraph "Variables"
        CN[Country Name]
        P5[P5 Status]
    end

    subgraph "195 Agents"
        US[United States]
        CN2[China]
        RU[Russia]
        DOT[...]
        TV[Tuvalu]
    end

    T --> CN
    T --> P5
    CN --> US
    CN --> CN2
    CN --> RU
    CN --> DOT
    CN --> TV

    style T fill:#6366f1
    style US fill:#22c55e
    style CN2 fill:#22c55e
    style RU fill:#22c55e
    style TV fill:#22c55e
  • 195 country-specific agents (one per UN member state)
  • Generic template structure (identical for all countries)
  • Only country name and P5 status differ between prompts
  • AI infers policy positions from training data

2. Structured Output Schema

{
  "vote": "yes" | "no" | "abstain",
  "statement": "Brief explanation (2-4 sentences)"
}

3. Validation Pipeline

graph TD
    A[LLM Response] --> B{Valid JSON?}
    B -->|Yes| C{Has vote field?}
    B -->|No| ERR1[Error: Parse Failure]
    C -->|Yes| D{Has statement field?}
    C -->|No| ERR2[Error: Missing Vote]
    D -->|Yes| E{Vote is yes/no/abstain?}
    D -->|No| ERR3[Error: Missing Statement]
    E -->|Yes| SUCCESS[Save Response]
    E -->|No| ERR4[Error: Invalid Vote]

    ERR1 --> DEFAULT[Save as Abstain + Error Flag]
    ERR2 --> DEFAULT
    ERR3 --> DEFAULT
    ERR4 --> DEFAULT

    style SUCCESS fill:#22c55e
    style DEFAULT fill:#f59e0b
    style ERR1 fill:#ef4444
    style ERR2 fill:#ef4444
    style ERR3 fill:#ef4444
    style ERR4 fill:#ef4444

4. Model Configuration

  • Primary: Claude 3.5 Sonnet (claude-3-5-sonnet-20241022)
  • Temperature: 0.7 (balance consistency + variation)
  • Max tokens: 800 per response
  • Provider: Anthropic API

What This Tests

  • LLM Geopolitical Knowledge: How well models understand different countries' foreign policies
  • Structured Outputs: Consistency in producing valid JSON under constraints
  • Multi-Agent Systems: Coordinating hundreds of independent AI agents
  • Prompt Engineering: Generic templates yielding specific behaviors
  • Error Handling: Graceful degradation when agents produce invalid outputs

Technical Implementation

Execution Flow

graph TD
    START[Start Simulation] --> LOAD_MOTION[Load Motion Text<br/>tasks/motions/motion_id.md]
    LOAD_MOTION --> LOAD_COUNTRIES[Load Country List<br/>195 UN Members]
    LOAD_COUNTRIES --> LOOP_START{For Each Country}

    LOOP_START -->|Country 1-195| LOAD_PROMPT[Load System Prompt<br/>agents/representatives/country/]
    LOAD_PROMPT --> BUILD_USER[Build User Prompt<br/>Motion + Instructions]
    BUILD_USER --> API_CALL[API Call to Claude<br/>System + User Prompt]
    API_CALL --> PARSE[Parse JSON Response]
    PARSE --> VALIDATE[Validate Schema]
    VALIDATE -->|Valid| STORE[Store Result]
    VALIDATE -->|Invalid| ERROR[Store Error + Abstain]
    STORE --> LOOP_START
    ERROR --> LOOP_START

    LOOP_START -->|All Done| AGGREGATE[Aggregate Results]
    AGGREGATE --> CALC_STATS[Calculate Vote Summary]
    CALC_STATS --> ADD_META[Add Metadata<br/>model, timestamp, etc]
    ADD_META --> SAVE_TIME[Save Timestamped File<br/>motion_id_timestamp.json]
    SAVE_TIME --> SAVE_LATEST[Save Latest File<br/>motion_id_latest.json]
    SAVE_LATEST --> END[Complete]

    style API_CALL fill:#6366f1
    style VALIDATE fill:#f59e0b
    style STORE fill:#22c55e
    style ERROR fill:#ef4444
    style END fill:#8b5cf6

Command Line Interface

# Run simulation
python scripts/run_motion.py 01_gaza_ceasefire_resolution

# With specific model
python scripts/run_motion.py 01_gaza_ceasefire_resolution --model claude-3-5-sonnet-20241022

# Test with sample
python scripts/run_motion.py 01_gaza_ceasefire_resolution --sample 5

Output Structure

graph LR
    subgraph "JSON Output"
        ROOT[Root Object]
        META[Metadata]
        VOTES[Votes Array]
    end

    subgraph "Metadata Fields"
        ID[motion_id]
        TS[timestamp]
        MODEL[model]
        TOTAL[total_votes]
        SUMMARY[vote_summary]
    end

    subgraph "Vote Summary"
        YES[yes: count]
        NO[no: count]
        ABS[abstain: count]
    end

    subgraph "Individual Votes"
        V1[Vote 1: Country, vote, statement]
        V2[Vote 2: Country, vote, statement]
        V3[...]
        V195[Vote 195: Country, vote, statement]
    end

    ROOT --> META
    ROOT --> VOTES
    META --> ID
    META --> TS
    META --> MODEL
    META --> TOTAL
    META --> SUMMARY
    SUMMARY --> YES
    SUMMARY --> NO
    SUMMARY --> ABS
    VOTES --> V1
    VOTES --> V2
    VOTES --> V3
    VOTES --> V195

    style ROOT fill:#8b5cf6
    style META fill:#6366f1
    style VOTES fill:#22c55e

Case Study: Gaza Ceasefire Resolution

The Space includes a case study demonstrating the system with a Gaza ceasefire resolution voted on by all 195 agents.

Results Overview

pie title Vote Distribution (195 Countries)
    "Yes" : 190
    "No" : 3
    "Abstain" : 2

Key Statistics:

  • Yes: 190 countries (97.4%)
  • No: 3 countries (1.5%)
  • Abstain: 2 countries (1.0%)

This serves as a concrete example of the framework in action, showing how generic prompts + model knowledge produce diverse, country-specific diplomatic responses.

Research Applications

  • Testing LLM knowledge of international relations
  • Evaluating structured output consistency
  • Studying emergent behavior in multi-agent systems
  • Educational demonstrations of diplomatic complexity

Limitations

This is a simulation for research and education:

  • AI positions based on training data, not actual policies
  • Does NOT predict real government decisions
  • Should NOT be considered authoritative
  • Real diplomacy involves classified information and human judgment

Open Source

All code, prompts, and data available on GitHub:


Built with Gradio | Powered by Anthropic Claude