---
title: "Overgrowth – a living digital environment"
emoji: "🌿"
colorFrom: "green"
colorTo: "gray"
sdk: "gradio"
sdk_version: "4.44.0"
app_file: "app.py"
pinned: false
tags:
  - mcp-in-action-track-enterprise
---

Video: https://www.youtube.com/watch?v=xY2Hnz2pmDA

Social Media Post: https://www.linkedin.com/posts/grahampaasch_overgrowth-a-living-digital-environment-activity-7401000001658765312-Q_cB?utm_source=share&utm_medium=member_desktop&rcm=ACoAABO-OuYBtWdvfyVGnYlWMX4qmiNHrtFV06Q

Tag: mcp-in-action-track-enterprise

# Overgrowth – MCP Agent for Network Reliability

**Track:** MCP in Action – Enterprise

**Hackathon:** MCP's 1st Birthday – Hosted by Anthropic & Gradio

Overgrowth is a living, digitally persistent network engineer. It simulates an out-of-band (OOB) mesh with MCP tools, JSON state, and Gradio UI to plan, simulate, triage, and roll back network changes safely.

## Submission checklist (for judges + us)

- ✅ Track tag present: `mcp-in-action-track-enterprise`
- ✅ README links: demo video + social post (replace placeholders below before final submit)
- ✅ Demo video recorded (1–5 min) hitting the runbook steps below
- ✅ Team handles listed here: `@your-hf-handles`
- ✅ Space Secrets: set `OG_LLM_PROVIDER=anthropic` plus `ANTHROPIC_API_KEY` (or OpenAI); Blaxel optional. Add GNS3 env vars only if demoing brownfield.

## Sponsor usage & eligibility

- **Blaxel LLM** (preferred; falls back to Anthropic/OpenAI/etc. if set) for Track 2 + Blaxel award
- **GNS3 brownfield** import path for enterprise realism
- **Modal/Anthropic/OpenAI/Nebius/HF** supported via env keys (already wired in llm_client)

What Overgrowth simulates:

- Hardware-like OOB mesh via MCP-style tools (backups, drift, bootstrap, alerts)
- Persistent state across runs (`infra/*.json`)
- Change simulation against the Track 1 MCP server (`network-change-simulator`)
- Topology-aware risk and blast radius
- Rollback planning and troubleshooting flows

---

## High-Level Architecture

    [User]
       |
       v
    [Gradio 6 UI: Overgrowth]
       |
       v
    [Agent Pipeline]
      1. Parse free-text change request → hybrid steps (NL + JSON)
      2. For each step, call MCP `simulate_network_change` (Track 1 server)
      3. Blend MCP risk with topology impact (infra/topology.json)
      4. Generate config-diff summaries and rollback plan
      5. OOB Troubleshooting calls Overgrowth tools (backup, drift, alerts, bootstrap)
       |
       v
    [MCP Servers / Tools]
      - Track 1: `simulate_network_change`
      - Overgrowth OOB tools: backups, drift, bootstrap, alerts, root-cause triage

---

## MCP Tools Used

- **External MCP server (Track 1):** `network-change-simulator` → `simulate_network_change`
- **Overgrowth OOB MCP-style tools (simulated hardware mesh):**
  - `oob_get_last_backup`, `oob_perform_backup`
  - `oob_detect_drift`, `oob_reset_drift_flag`
  - `oob_seed_device`
  - `oob_get_alerts`, `oob_add_alert`

Example Track 1 invocation (conceptual)

```
{
  "tool_name": "simulate_network_change",
  "arguments": {"preset_id": "leaf_tor_vlan_stage", "mode": "analysis"}
}
```

---

## User Workflow

1. Describe a change in the main textbox.
2. Click **Run Analysis** to get:
   - Plan tab: Hybrid NL + JSON steps with MCP-friendly `change_type` + `preset_id`
   - Risk & Topology: MCP risk plus blast radius from topology.json
   - Diffs: Conceptual before/after
   - Rollback: Ordered rollback plan
   - Tool Calls: Transparent MCP call log
3. Open **OOB Troubleshooting** to triage a device:
   - Runs backup/drift/alerts/boot checks (persistent JSON state)
   - Suggests root causes, rollback safety, and next MCP simulations

---

## Configuration

### GNS3 Server Setup (Required for Build Network & Lab Management)

The "Build Network" and "Lab Management" features require a running GNS3 server. Configure the server URL using environment variables:

```bash
# Copy the example environment file
cp .env.example .env

# Edit .env and set your GNS3 server URL
GNS3_SERVER=http://your-gns3-server:3080
GNS3_PROJECT_NAME=overgrowth
```

**Note:** When running on HuggingFace Spaces, you'll need to configure these as Space secrets. For local development with a local GNS3 server, the defaults will work.

---

## How to Use

1. Open the Space.
2. Enter a change request, e.g. “Add VLAN 120 for tenant Blue on leaf-01 and uplink it to tor-03 (staging only).”
3. Inspect Plan, Risk, Diffs, Rollback, Tool Calls.
4. If there’s an issue, go to **OOB Troubleshooting**, pick a device, describe the symptom, and run triage.

---

## Notes for Judges

- Persistent state lives in `infra/*.json` and is updated during user actions (Option C).
- OOB tools emulate a hardware-backed OOB mesh while remaining pure MCP-style JSON/state.
- Track 1 MCP server is used for real change simulation; Overgrowth augments it with reliability/triage workflows.

---

## Future Extensions

- Richer root-cause engine combining MCP results with time-series alerts
- Multi-step autonomous remediation runs
- Import real config snippets for diffs
- CI/CD hooks to gate changes before rollout

---

## What is Overgrowth?

- A living MCP-powered OOB engineer that plans changes, simulates risk, detects drift, infers root cause, and self-heals.
- Persistence lives in `infra/*.json` so the environment evolves across runs (backups, drift, alerts, bootstrap, MCP history, synapse log).
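
To make the persistence idea above concrete, here is a minimal sketch of how backup and drift tools might share one JSON state file. It is not taken from the Overgrowth source: the file name `oob_state.json`, the state layout, and the function signatures are assumptions, and `perform_backup` / `detect_drift` are only hypothetical stand-ins for `oob_perform_backup` / `oob_detect_drift`.

```python
import json
import tempfile
import time
from pathlib import Path


def load_state(path: Path) -> dict:
    """Return the saved state, or an empty dict on the first run."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}


def save_state(path: Path, state: dict) -> None:
    """Write to a temp file, then rename it over the target file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(state, indent=2, sort_keys=True), encoding="utf-8")
    tmp.replace(path)


def perform_backup(path: Path, device: str, config: str) -> dict:
    """Hypothetical stand-in for `oob_perform_backup`: store config + timestamp."""
    state = load_state(path)
    record = {"config": config, "taken_at": time.time()}
    state.setdefault("backups", {})[device] = record
    save_state(path, state)
    return record


def detect_drift(path: Path, device: str, running_config: str) -> bool:
    """Hypothetical stand-in for `oob_detect_drift`: compare to the last backup."""
    backup = load_state(path).get("backups", {}).get(device)
    return backup is not None and backup["config"] != running_config


# Demo in a throwaway directory so no real infra/ files are touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    state_file = Path(tmp_dir) / "infra" / "oob_state.json"  # assumed file name
    perform_backup(state_file, "leaf-01", "vlan 110")
    clean = detect_drift(state_file, "leaf-01", "vlan 110")
    drifted = detect_drift(state_file, "leaf-01", "vlan 110\nvlan 120")
    print(clean, drifted)
```

Because every tool call reads and rewrites the same file, a later run sees whatever an earlier run left behind, which is the behavior the bullets above describe.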

## Why this matters for enterprise networks

- Gives SRE/netops teams a pre-flight simulator + autonomous triage loop without touching production hardware.
- Blends MCP tool calls (Track 1 server) with OOB state to deliver safety, transparency, and repeatability.
- Produces audit-ready synapse logs, stability scoring, and after-action narratives.

## How Overgrowth works

- **MCP Engine:** Calls Track 1 `simulate_network_change` with structured presets from parsed NL input.
- **OOB Engine:** Backup/drift/bootstrap/alerts tools operate on persistent JSON state to mimic a hardware OOB fabric.
- **Topology Model:** `infra/topology.json` informs blast radius and role-aware amplification.
- **Reasoning Layer:** Root-cause inference fuses MCP history, alerts, drift, backup age, and topology to rank causes.
- **Recovery Engine:** Executes recommended actions (MCP sims, backup, drift reset, bootstrap, alert clear) and logs everything.

## Autonomous Root-Cause Engine (Aggressive Mode)

- Multi-signal fusion (drift + alerts + backups + topology role + last MCP predictions).
- Cascading-failure inference (leaf→TOR→core amplification).
- Predictive simulation heuristic (reasons about presets without re-running tools).
- Recovery sequencing suggestions (rollback, re-bootstrap, re-run MCP full mode, auto-backup).

## Autonomous Recovery Engine (Self-Healing Overgrowth)

- Executes recommended actions via MCP-style calls and updates state.
- Tracks health with a stability metric (drift/alerts/backup-age/MCP-risk) and reports before/after.
- Logs every decision in `infra/synapse_log.json` and displays it in the Synapse Log tab.
- Provides after-action narratives and recovery execution summaries directly in the Troubleshooting tab.

## Architecture (ASCII)

```
User → Overgrowth UI
        → OOB Engine (backup / drift / bootstrap / alerts)
        → MCP Engine (Track 1 simulate_network_change)
        → Topology Model (blast radius, roles)
        → Synapse Log (state & audit trail)
```

## Demo Script (5 steps)

1. Input a change request (e.g., VLAN add) and click **Run Analysis**.
2. Show MCP risk + topology impact in Plan/Risk tabs.
3. Go to **OOB Troubleshooting**, enter “leaf-01 flap”, run troubleshooting to see root cause + stability + actions.
4. Click **Execute Recovery Plan**; watch recovery summary/log update and stability improve.
5. Open **Synapse Log** to show the audit trail of decisions and health deltas.

## Try it live

- Open the Space, follow the demo script above, then hit **Reset Overgrowth State** to rerun deterministically.

---

Demo video: https://youtu.be/YOUR_DEMO_VIDEO

Social post: https://www.linkedin.com/posts/YOUR_POST or https://x.com/YOUR_HANDLE/status/YOUR_TWEET_ID
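
For readers wondering what "stability improves" in demo steps 4 and 5 could mean numerically, the sketch below scores a device from the four signals the README names (drift, alerts, backup age, MCP risk) and builds a before/after entry of the kind a synapse log might hold. The weights, caps, field names, and entry layout are all illustrative assumptions, not Overgrowth's actual formula or log schema.

```python
from dataclasses import dataclass


@dataclass
class DeviceHealth:
    drift: bool              # config differs from the last backup
    open_alerts: int         # unresolved alerts on the device
    backup_age_hours: float  # time since the last backup
    mcp_risk: float          # 0.0 (safe) to 1.0 (risky), from the last simulation


def stability_score(h: DeviceHealth) -> float:
    """Illustrative 0-100 score; every weight here is an assumption."""
    penalty = 0.0
    penalty += 30.0 if h.drift else 0.0
    penalty += min(h.open_alerts, 3) * 10.0                # capped at 30
    penalty += min(h.backup_age_hours / 24.0, 1.0) * 15.0  # capped at 15
    penalty += max(0.0, min(h.mcp_risk, 1.0)) * 25.0       # capped at 25
    return round(100.0 - penalty, 1)


# Hypothetical state of leaf-01 before and after a recovery run:
# the backup is refreshed, drift is reset, and alerts are cleared.
before = DeviceHealth(drift=True, open_alerts=2, backup_age_hours=48, mcp_risk=0.6)
after = DeviceHealth(drift=False, open_alerts=0, backup_age_hours=0, mcp_risk=0.6)

entry = {
    "device": "leaf-01",
    "before": stability_score(before),
    "after": stability_score(after),
}
entry["delta"] = round(entry["after"] - entry["before"], 1)
print(entry)
```

With these assumed weights the sample device moves from 20.0 to 85.0. The MCP risk term is unchanged because recovery actions alone do not alter the simulated risk of the pending change.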