
Graham Paasch

docs: Complete deployment guide for Stage 6

c81a736 15 days ago


Stage 6: Autonomous Deployment Guide

Complete guide for deploying configurations to real network devices using Overgrowth's autonomous deployment engine.

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                   Deployment Orchestration                       │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │  1. Config Generation (Jinja2 Templates)                   │ │
│  │  2. Pre-Deployment Validation                              │ │
│  │  3. Device Connection (Netmiko/NAPALM)                     │ │
│  │  4. Configuration Deployment                               │ │
│  │  5. Post-Deployment Verification                           │ │
│  │  6. Automatic Rollback (on failure)                        │ │
│  └────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
           │                    │                    │
           ▼                    ▼                    ▼
    ┌──────────┐         ┌──────────┐        ┌──────────┐
    │ Cisco    │         │ Arista   │        │ Juniper  │
    │ IOS/NXOS │         │ EOS      │        │ JunOS    │
    └──────────┘         └──────────┘        └──────────┘

Components

1. DeviceDriver (`agent/device_driver.py`)

Manages connections and config deployment to network devices.

Supported Platforms:

Cisco IOS
Cisco NXOS (Nexus)
Cisco IOS-XE (ASR, ISR, etc.)
Arista EOS
Juniper JunOS

Features:

Connection pooling and management
Config backup before deployment
Automatic rollback on failure
Dry-run mode (validate without deploying)
Mock mode (testing without devices)

2. ConfigTemplateEngine (`agent/config_templates.py`)

Generates device configs from Jinja2 templates.

Built-in Templates:

cisco_ios_l2_switch - Layer 2 access switch
cisco_ios_l3_router - Layer 3 router with OSPF/BGP
arista_eos - Arista EOS switch/router
juniper_junos - Juniper router/switch

Features:

Variable substitution from NetworkModel
Custom template support
Template validation
Vendor-specific config generation

3. DeploymentEngine (`agent/deployment_engine.py`)

Orchestrates the entire deployment workflow.

Workflow:

Generate config from template
Connect to device
Run pre-deployment checks
Back up current config
Deploy new config
Run post-deployment checks
Roll back if checks fail
Record deployment history
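The check, backup, deploy, verify and rollback steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the DeploymentEngine code. `FakeDevice` and `deploy_with_rollback` are hypothetical names. Template rendering and connection handling are omitted. The real engine may order or report these steps differently.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDevice:
    """Hypothetical stand-in for a network device (not Overgrowth's DeviceDriver)."""
    running_config: str
    healthy_configs: set = field(default_factory=set)

    def apply(self, config: str) -> None:
        self.running_config = config

    def check(self) -> bool:
        # Stand-in for a pre/post check: passes only for configs marked healthy.
        return self.running_config in self.healthy_configs


def deploy_with_rollback(device: FakeDevice, new_config: str, history: list) -> str:
    """Pre-check, back up, deploy, post-check, and roll back on failure."""
    if not device.check():                  # pre-deployment check
        history.append(("failed", new_config))
        return "failed"
    backup = device.running_config          # back up current config
    device.apply(new_config)                # deploy new config
    if device.check():                      # post-deployment check
        status = "success"
    else:
        device.apply(backup)                # roll back from the backup
        status = "rolled_back"
    history.append((status, new_config))    # record deployment history
    return status
```

Consider a `FakeDevice` whose healthy set lacks "hostname broken". Deploying that config to it returns "rolled_back" and leaves `running_config` at its previous value.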

Usage Examples

Example 1: Deploy Single Device

from agent.deployment_engine import DeploymentEngine, DeploymentTask, DeviceType

# Initialize engine
deployer = DeploymentEngine(use_napalm=True)

# Create deployment task
task = DeploymentTask(
    device_id='core-sw-1',
    device_type=DeviceType.CISCO_IOS,
    hostname='192.168.1.10',
    username='admin',
    password='admin123',
    config="""
hostname core-sw-1
!
vlan 10
 name DATA
vlan 20
 name VOICE
!
interface GigabitEthernet0/1
 switchport mode access
 switchport access vlan 10
 no shutdown
!
""",
    dry_run=False,
    pre_checks=['command:show version'],
    post_checks=['interface:GigabitEthernet0/1']
)

# Deploy
result = deployer.deploy_single_device(task)

print(f"Status: {result.status.value}")
print(f"Duration: {result.duration_seconds:.1f}s")

if result.status.value == 'success':
    print("✓ Deployment successful!")
else:
    print(f"✗ Deployment failed: {result.error}")
    if result.rolled_back:
        print("✓ Configuration rolled back")

Example 2: Generate and Deploy from Template

from agent.deployment_engine import DeploymentEngine
from agent.pipeline_engine import NetworkModel, Device, NetworkIntent

# Create network model
model = NetworkModel(
    name="campus-network",
    version="1.0",
    intent=NetworkIntent(
        description="Campus network deployment",
        business_requirements=["High availability", "VLAN segmentation"],
        constraints=["Budget friendly"]
    ),
    devices=[
        Device(
            name="access-sw-1",
            role="access",
            model="Catalyst 2960",
            vendor="Cisco",
            mgmt_ip="192.168.1.20",
            location="Building A",
            interfaces=[
                {
                    "name": "GigabitEthernet0/1",
                    "description": "Uplink to core",
                    "mode": "trunk",
                    "enabled": True
                },
                {
                    "name": "GigabitEthernet0/2",
                    "description": "Workstation port",
                    "mode": "access",
                    "vlan": 10,
                    "enabled": True
                }
            ]
        )
    ],
    vlans=[
        {"id": 10, "name": "DATA"},
        {"id": 20, "name": "VOICE"},
        {"id": 99, "name": "MANAGEMENT"}
    ],
    subnets=[
        {"network": "10.0.10.0/24", "vlan": 10},
        {"network": "10.0.20.0/24", "vlan": 20}
    ],
    routing={},
    services=["DHCP", "NTP"]
)

# Initialize deployer
deployer = DeploymentEngine(use_napalm=True)

# Network context for templates
network_context = {
    'vlans': model.vlans,
    'routing': model.routing,
    'domain_name': 'campus.local',
    'ntp_servers': ['0.pool.ntp.org'],
    'dns_servers': ['8.8.8.8']
}

# Credentials
credentials = {
    'username': 'admin',
    'password': 'secure123'
}

# Deploy each device
for device in model.devices:
    result = deployer.generate_and_deploy(
        device=device,
        network_context=network_context,
        credentials=credentials,
        dry_run=False,
        pre_checks=['command:show version'],
        post_checks=['command:show running-config']
    )
    
    print(f"{device.name}: {result.status.value}")

Example 3: Dry-Run Mode (Test Without Deploying)

from agent.pipeline_engine import OvergrowthPipeline

pipeline = OvergrowthPipeline()

# Generate network model
intent = pipeline.stage1_consultation("Deploy 3-tier campus network")
model = pipeline.stage2_generate_sot(intent)

# Dry-run deployment (generates configs, validates, but doesn't deploy)
results = pipeline.stage6_autonomous_deploy(
    model=model,
    credentials={'username': 'admin', 'password': 'admin'},
    dry_run=True,  # No actual changes to devices
    parallel=False
)

print(f"Dry-run complete: {results['successful']}/{results['total_devices']} would succeed")

for r in results['results']:
    print(f"  {r['device_id']}: {r['status']}")
    if r['status'] == 'failed':
        print(f"    Error: {r['error']}")

Example 4: Parallel Deployment with Ray

from agent.pipeline_engine import OvergrowthPipeline

pipeline = OvergrowthPipeline()
intent = pipeline.stage1_consultation("Deploy 3-tier campus network")
model = pipeline.stage2_generate_sot(intent)

# Enable parallel mode
pipeline.enable_parallel_mode()

# Deploy to all devices in parallel
results = pipeline.stage6_autonomous_deploy(
    model=model,
    credentials={'username': 'admin', 'password': 'admin'},
    dry_run=False,
    parallel=True  # Use Ray for concurrent deployment
)

print(f"Deployed to {results['successful']}/{results['total_devices']} devices")
print(f"Success rate: {results['success_rate']:.1f}%")
print(f"Rolled back: {results['rolled_back']}")

Example 5: Custom Pre/Post Validation Checks

from agent.deployment_engine import DeploymentEngine, DeploymentTask, DeviceType

deployer = DeploymentEngine()

task = DeploymentTask(
    device_id='border-rtr-1',
    device_type=DeviceType.CISCO_XE,
    hostname='10.0.0.1',
    username='admin',
    password='admin',
    config=router_config,  # config string prepared earlier, e.g. with generate_cisco_ios_config()
    dry_run=False,
    # Pre-deployment checks
    pre_checks=[
        'command:show version',
        'command:show ip interface brief',
        'ping:8.8.8.8',  # Check internet connectivity
    ],
    # Post-deployment checks
    post_checks=[
        'interface:GigabitEthernet0/0',  # Verify interface up
        'ping:10.0.1.1',  # Verify internal connectivity
        'command:show ip bgp summary',  # Verify BGP
    ]
)

result = deployer.deploy_single_device(task)

# Check which validations passed/failed
print("Pre-checks:", result.pre_check_results)
print("Post-checks:", result.post_check_results)

Configuration Templates

Cisco IOS L2 Switch Template

Located in config_templates.py as CISCO_IOS_L2_SWITCH_TEMPLATE.

Variables:

device.name - Hostname
device.mgmt_ip - Management IP
vlans - List of VLAN dicts (id, name)
device.interfaces - List of interface dicts
default_gateway - Default gateway IP
ntp_servers - List of NTP server IPs
dns_servers - List of DNS server IPs
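These variables feed straight into device configs, so it can help to sanity-check them before rendering. The sketch below assumes the context is a plain dict that uses the variable names listed above. `validate_l2_context` is a hypothetical helper, not part of config_templates.py. NTP servers are deliberately not checked as IP addresses, because the examples in this guide pass hostnames such as 0.pool.ntp.org.

```python
import ipaddress


def validate_l2_context(ctx: dict) -> list:
    """Return human-readable problems found in an L2 template context."""
    errors = []
    seen = set()
    for vlan in ctx.get("vlans", []):
        vid = vlan.get("id")
        # Usable 802.1Q VLAN IDs are 1-4094 (0 and 4095 are reserved).
        if not isinstance(vid, int) or not 1 <= vid <= 4094:
            errors.append(f"invalid VLAN id: {vid!r}")
        elif vid in seen:
            errors.append(f"duplicate VLAN id: {vid}")
        else:
            seen.add(vid)
        if not vlan.get("name"):
            errors.append(f"VLAN {vid!r} has no name")

    # default_gateway and dns_servers must be literal IP addresses.
    # ntp_servers is skipped: the examples pass hostnames like 0.pool.ntp.org.
    addresses = list(ctx.get("dns_servers", []))
    if ctx.get("default_gateway"):
        addresses.append(ctx["default_gateway"])
    for addr in addresses:
        try:
            ipaddress.ip_address(addr)
        except ValueError:
            errors.append(f"invalid IP address: {addr!r}")
    return errors
```

An empty list means no problems were found. Returning every problem at once, instead of raising on the first, makes it easier to fix a whole model in one pass.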

Example:

from agent.config_templates import generate_cisco_ios_config

config = generate_cisco_ios_config(
    device=my_device,
    vlans=[
        {"id": 10, "name": "DATA"},
        {"id": 20, "name": "VOICE"}
    ],
    ntp_servers=['0.pool.ntp.org'],
    dns_servers=['8.8.8.8'],
    default_gateway='192.168.1.1'
)

Cisco IOS L3 Router Template

Includes routing protocols (OSPF, BGP, static routes).

Variables:

All L2 variables plus:
routing.protocol - 'ospf', 'bgp', or 'static'
routing.process_id - OSPF process ID
routing.area - OSPF area ID
routing.networks - List of networks to advertise
routing.asn - BGP AS number
routing.neighbors - List of BGP neighbor dicts

Example:

config = generate_cisco_ios_config(
    device=router_device,
    vlans=[],
    routing={
        'protocol': 'ospf',
        'process_id': 1,
        'networks': ['10.0.0.0 0.0.255.255'],
        'area': 0
    }
)
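The `networks` entries use Cisco's network-plus-wildcard-mask form: `10.0.0.0 0.0.255.255` is 10.0.0.0/16. Your NetworkModel may store subnets in CIDR notation, like the `10.0.10.0/24` entries in Example 2. In that case a small helper can convert them. `to_ospf_network` is a hypothetical name. It relies only on Python's standard `ipaddress` module, whose `hostmask` attribute is exactly the wildcard mask.

```python
import ipaddress


def to_ospf_network(cidr: str) -> str:
    """Convert '10.0.0.0/16' into the '10.0.0.0 0.0.255.255' form used above."""
    # strict=True rejects inputs with host bits set, e.g. '10.0.0.1/16'.
    net = ipaddress.IPv4Network(cidr, strict=True)
    return f"{net.network_address} {net.hostmask}"


# Build the routing dict from CIDR subnets (values taken from Example 2).
routing = {
    'protocol': 'ospf',
    'process_id': 1,
    'networks': [to_ospf_network(c) for c in ['10.0.10.0/24', '10.0.20.0/24']],
    'area': 0,
}
```

The helper is IPv4-only on purpose, since wildcard masks in OSPF `network` statements are an IPv4 concept.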

Custom Templates

Create custom Jinja2 templates:

from agent.config_templates import ConfigTemplateEngine

engine = ConfigTemplateEngine()

# Add custom template
custom_template = """
hostname {{ device.name }}
!
{% for vlan in vlans %}
vlan {{ vlan.id }}
 name {{ vlan.name }}
{% endfor %}
!
"""

engine.add_custom_template('my_custom_template', custom_template)

# Use it
config = engine.render_template('my_custom_template', {
    'device': {'name': 'my-switch'},
    'vlans': [{'id': 10, 'name': 'DATA'}]
})

Validation Checks

Check Types

Command checks:

'command:show version'  # Run command, pass if no error

Ping checks:

'ping:8.8.8.8'  # Ping target, pass if successful

Interface checks:

'interface:GigabitEthernet0/1'  # Check interface status, pass if up
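Each check is a `type:target` string. If you generate check lists programmatically, a parser can catch typos before a deployment starts. This is a sketch, not Overgrowth's own parser. `parse_check` and `Check` are hypothetical. Splitting on the first colon only is an assumption, chosen so that IPv6 ping targets survive intact.

```python
from typing import NamedTuple

VALID_KINDS = {"command", "ping", "interface"}


class Check(NamedTuple):
    kind: str
    target: str


def parse_check(spec: str) -> Check:
    """Split 'kind:target' on the FIRST colon only and validate the kind."""
    kind, sep, target = spec.partition(":")
    if not sep or kind not in VALID_KINDS or not target.strip():
        raise ValueError(f"malformed check: {spec!r}")
    return Check(kind, target.strip())
```

For example, `parse_check('ping:2001:db8::1')` keeps `2001:db8::1` as the target. A misspelled kind such as `comand:` raises `ValueError` instead of silently passing.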

Check Timing

Pre-checks: Run before config deployment
- Verify device accessible
- Check current state
- Validate prerequisites
Post-checks: Run after config deployment
- Verify config applied
- Test connectivity
- Validate services

Error Handling & Rollback

Automatic Rollback

If post-deployment checks fail, the engine automatically rolls back:

Detect check failure
Log error details
Deploy previous config (from backup)
Mark deployment as ROLLED_BACK

result = deployer.deploy_single_device(task)

if result.rolled_back:
    print("Deployment failed and was rolled back")
    print(f"Reason: {result.error}")
    print(f"Config restored to: {result.config_before[:100]}...")

Manual Rollback

from agent.device_driver import DeviceDriver, DeviceCredentials, DeviceType

driver = DeviceDriver()

# Connect
creds = DeviceCredentials(
    hostname='192.168.1.10',
    username='admin',
    password='admin',
    device_type=DeviceType.CISCO_IOS
)

conn = driver.connect(creds)

# Get current config
backup = driver.get_config('192.168.1.10', 'running')

# ... something goes wrong ...

# Rollback
driver.rollback_config('192.168.1.10', backup)

Multi-Vendor Support

Cisco IOS/IOS-XE

from agent.device_driver import DeviceType

# Cisco Catalyst, ISR, ASR
device_type = DeviceType.CISCO_IOS  # or CISCO_XE

Supported Features:

Config merge and replace
Running/startup config backup
Auto-save on Cisco IOS

Cisco NXOS (Nexus)

device_type = DeviceType.CISCO_NXOS

Features:

Checkpoint/rollback support
Config replace via NAPALM

Arista EOS

device_type = DeviceType.ARISTA_EOS

Features:

Config sessions
Atomic commits
Fast boot times

Juniper JunOS

device_type = DeviceType.JUNIPER_JUNOS

Features:

Candidate config
Commit confirmed
Rollback points

Testing Without Devices (Mock Mode)

All components support mock mode for development/testing:

from agent.device_driver import DeviceDriver, DeviceCredentials, DeviceType

# Initialize in mock mode (auto-detected if Netmiko/NAPALM not installed)
driver = DeviceDriver()

print(f"Mock mode: {driver.mock_mode}")  # True if no libraries

# Mock connections always succeed
credentials = DeviceCredentials(
    hostname='device-1',
    username='admin',
    password='admin',
    device_type=DeviceType.CISCO_IOS
)
conn = driver.connect(credentials)
print(f"Connected: {conn.status}")  # CONNECTED

# Mock deployments simulate success
config = "hostname device-1\n"
result = driver.deploy_config('device-1', config)
print(f"Deployed: {result.success}")  # True
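The auto-detection described above amounts to checking whether the connection libraries can be imported. The sketch below uses the standard `importlib.util.find_spec`, which looks a module up without importing it. `detect_mock_mode` is hypothetical. Treating mock mode as "neither library installed" is an assumption about how DeviceDriver decides.

```python
import importlib.util


def detect_mock_mode(libraries=("netmiko", "napalm")) -> bool:
    """True when none of the given libraries can be found (assumed rule)."""
    return all(importlib.util.find_spec(name) is None for name in libraries)
```

Checking without importing keeps startup fast. It also avoids side effects from optional dependencies that are installed but broken.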

Deployment History & Metrics

from agent.deployment_engine import DeploymentEngine

deployer = DeploymentEngine()

# ... deploy devices ...

# Get summary
summary = deployer.get_deployment_summary()

print(f"Total deployments: {summary['total_deployments']}")
print(f"Success rate: {summary['success_rate']:.1f}%")
print(f"Average duration: {summary['avg_duration']:.1f}s")

# Recent deployments
for dep in summary['latest_deployments']:
    print(f"{dep['device_id']}: {dep['status']} ({dep['duration']:.1f}s)")
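If you persist deployment records yourself, the same metrics are simple to recompute. The sketch assumes each record is a dict with `device_id`, `status` and `duration` keys, mirroring the fields printed above. `summarize` is hypothetical and counts only `success` toward the success rate. It keeps the last five records as "latest", which may differ from get_deployment_summary().

```python
from statistics import mean


def summarize(deployments: list) -> dict:
    """Compute summary fields like those printed above from raw records."""
    total = len(deployments)
    ok = sum(1 for d in deployments if d["status"] == "success")
    return {
        "total_deployments": total,
        "success_rate": 100.0 * ok / total if total else 0.0,
        "avg_duration": mean(d["duration"] for d in deployments) if total else 0.0,
        "latest_deployments": deployments[-5:],
    }
```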

Troubleshooting

Connection Failures

Symptom: Failed to connect: timeout

Solutions:

# Increase timeout
credentials = DeviceCredentials(
    hostname='192.168.1.10',
    username='admin',
    password='admin',
    device_type=DeviceType.CISCO_IOS,
    timeout=60  # Increase from default 30s
)

# Check network connectivity
driver.verify_connectivity('device-id')

# Enable session logging for debugging
credentials.session_log = '/tmp/device-session.log'
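Before digging into driver settings, confirm that the device's management port answers at all. verify_connectivity() may do something similar internally, but this standalone sketch uses only the standard `socket` module. `ssh_port_reachable` is a hypothetical helper, and port 22 assumes SSH management access.

```python
import socket


def ssh_port_reachable(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and DNS resolution failures.
        return False
```

A False result points at routing, ACLs or the device's SSH service rather than at Netmiko or NAPALM settings.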

Authentication Failures

Symptom: Failed to connect: authentication failed

Solutions:

# For devices requiring enable password
credentials = DeviceCredentials(
    hostname='192.168.1.10',
    username='admin',
    password='admin',
    secret='enable_password',  # Enable secret
    device_type=DeviceType.CISCO_IOS
)

Config Deployment Failures

Symptom: Deployment failed: command error

Solutions:

# Use dry-run to validate first
result = driver.deploy_config(
    device_id='device-1',
    config=config,
    dry_run=True  # Test without applying
)

print(f"Would work: {result.success}")

# Check diff before deploying
if result.output:
    print(f"Changes:\n{result.output}")
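The dry-run output may be empty, or not in diff form, on your platform. In that case you can produce a reviewable diff locally from a backed-up running config, for example one fetched with driver.get_config(), and the candidate config. The sketch uses the standard `difflib` module; `config_diff` is hypothetical. A plain text diff ignores platform semantics: it cannot tell that reordering some lines is harmless. Treat it as a review aid, not a guarantee of what the device will change.

```python
import difflib


def config_diff(running: str, candidate: str) -> str:
    """Unified diff between the current and the proposed configuration."""
    return "".join(difflib.unified_diff(
        running.splitlines(keepends=True),
        candidate.splitlines(keepends=True),
        fromfile="running-config",
        tofile="candidate-config",
    ))
```

An empty string means the candidate matches the running config line for line.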

Post-Check Failures

Symptom: Post-deployment check failed: ping:8.8.8.8

Solutions:

# Add delay before post-checks
import time
time.sleep(5)  # Wait for config to take effect

# Use more specific checks
post_checks=[
    'command:show ip interface brief',  # More specific than ping
    'interface:GigabitEthernet0/1'
]

# Disable rollback for troubleshooting
# (manually verify and fix)
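A fixed sleep either wastes time or turns out too short, because routing adjacencies can take longer than a few seconds to form. Polling a check until it passes or a deadline expires is usually more robust. The sketch uses a hypothetical `wait_until` helper. `check` is any zero-argument callable you supply, for example a wrapper around one of your own verification commands.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float = 60.0, interval: float = 5.0) -> bool:
    """Poll check() until it returns True or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        # Never sleep past the deadline.
        time.sleep(min(interval, max(0.0, deadline - time.monotonic())))
```

`time.monotonic()` is used instead of wall-clock time so that NTP adjustments during a deployment cannot stretch or shrink the wait.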

Production Best Practices

1. Always Use Dry-Run First

# Test deployment
dry_result = pipeline.stage6_autonomous_deploy(model, dry_run=True)

# Review results
if dry_result['success_rate'] == 100.0:
    # Now deploy for real
    real_result = pipeline.stage6_autonomous_deploy(model, dry_run=False)

2. Use Pre-Flight Validation

import sys

# Run Stage 0 validation before deployment
preflight = pipeline.stage0_preflight(model)

if not preflight['ready_to_deploy']:
    print("Pre-flight failed - aborting")
    print(f"Errors: {preflight['errors']}")
    sys.exit(1)

# Deploy only after validation passes
pipeline.stage6_autonomous_deploy(model)

3. Implement Change Windows

import sys
from datetime import datetime, time as dt_time

def in_change_window():
    """Check if current time is in approved change window"""
    now = datetime.now()
    # Only deploy between 2 AM and 4 AM (local time)
    return dt_time(2, 0) <= now.time() <= dt_time(4, 0)

if not in_change_window():
    print("Outside change window - aborting")
    sys.exit(1)

# Deploy during approved window
pipeline.stage6_autonomous_deploy(model)
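The comparison above only works when the window starts and ends on the same day. A window such as 22:00 to 02:00 would never match. The sketch below handles windows that wrap past midnight. `in_window` is a hypothetical helper. It compares naive local times, so the host clock's time zone decides what "2 AM" means.

```python
from datetime import time as dt_time


def in_window(now: dt_time, start: dt_time, end: dt_time) -> bool:
    """Inclusive window check that also handles windows spanning midnight."""
    if start <= end:
        return start <= now <= end
    return now >= start or now <= end  # e.g. 22:00 -> 02:00


# Usage: in_window(datetime.now().time(), dt_time(22, 0), dt_time(2, 0))
```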

4. Use Parallel Deployment Carefully

# Start with small batch
results = pipeline.parallel_deploy_fleet(
    model=model,
    staggered=True,
    stages=[0.01, 0.05, 0.1, 1.0]  # 1%, 5%, 10%, 100%
)

# Circuit breaker stops on high failure rate
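The staged percentages and the circuit breaker work roughly as sketched below. This does not reflect parallel_deploy_fleet()'s actual internals. `staggered_batches` and `rollout` are assumptions for illustration, as are the 20% failure threshold and the rule that each stage rounds up to at least one device. Deployment within a batch runs sequentially here rather than in parallel.

```python
import math


def staggered_batches(devices: list, stages: list) -> list:
    """Split devices into batches whose cumulative sizes follow `stages`."""
    batches, done = [], 0
    for fraction in stages:
        # round() absorbs tiny float error before rounding up;
        # every stage covers at least one device.
        target = min(len(devices), max(1, math.ceil(round(len(devices) * fraction, 9))))
        if target > done:
            batches.append(devices[done:target])
            done = target
    return batches


def rollout(devices: list, stages: list, deploy, max_failure_rate: float = 0.2):
    """Deploy batch by batch; stop early if a batch fails too often."""
    completed = []
    for batch in staggered_batches(devices, stages):
        failures = sum(1 for d in batch if not deploy(d))
        completed.extend(batch)
        if failures / len(batch) > max_failure_rate:
            return completed, False  # circuit breaker tripped
    return completed, True
```

With 200 devices and `stages=[0.01, 0.05, 0.1, 1.0]`, the batches hold 2, 8, 10 and 180 devices. A failure in the first batch therefore affects at most two devices before the breaker can trip.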

5. Maintain Deployment Audit Trail

import json
import os

from agent.deployment_engine import DeploymentEngine

deployer = DeploymentEngine()

# Deploy
result = deployer.deploy_single_device(task)

# Write a record for your audit system
with open(f'/var/log/deployments/{result.device_id}.json', 'w') as f:
    json.dump({
        'device_id': result.device_id,
        'status': result.status.value,
        'timestamp': result.timestamp.isoformat(),
        'config_before': result.config_before,
        'config_after': result.config_after,
        'deployed_by': os.environ.get('USER'),
        'duration': result.duration_seconds
    }, f, indent=2)
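Opening one file per device with mode 'w' replaces that device's previous record on every run, so only the latest deployment survives. An append-only JSON Lines file keeps the full history. `append_audit_record` and the path layout are hypothetical. In production you would likely forward these records to a central log system rather than local disk.

```python
import json
import os
from datetime import datetime, timezone


def append_audit_record(log_path: str, record: dict) -> None:
    """Append one deployment record as a single JSON line (never overwrite)."""
    entry = {
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "deployed_by": os.environ.get("USER", "unknown"),
        **record,
    }
    os.makedirs(os.path.dirname(log_path) or ".", exist_ok=True)
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry, sort_keys=True) + "\n")
```

One JSON object per line lets standard tools such as `grep` or a log shipper process the file incrementally without parsing it as a whole.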

Next Steps

✅ Multi-vendor device support
✅ Config templating
✅ Pre/post validation
✅ Automatic rollback
🚧 Full Ray parallel deployment integration
🚧 Advanced validation (pyATS test cases)
🚧 Change request workflow
🚧 Approval gates for production

You're now ready to deploy to real network devices! 🚀
