SuzieQ Integration - Multi-Vendor Drift Detection
Overview
SuzieQ is an open-source network observability framework that provides multi-vendor state collection, topology discovery, and historical analysis. Overgrowth integrates SuzieQ for continuous drift detection: it compares the actual network state against the intended state held in NetBox, the source of truth (SoT), and automatically generates remediation plans.
What is Configuration Drift?
Configuration drift occurs when the actual network state diverges from the intended state defined in your source of truth (SoT). Common causes:
- Manual changes made directly on devices
- Failed automation runs leaving partial configs
- Hardware failures requiring emergency workarounds
- Shadow IT adding unauthorized VLANs/subnets
- Config erosion over time
Why SuzieQ?
| Feature | SuzieQ | Traditional Monitoring |
|---|---|---|
| Multi-vendor | ✅ Arista, Cisco, Juniper, Cumulus, etc. | ❌ Vendor-specific |
| Agentless | ✅ SSH-based collection | ❌ Requires agents |
| Historical data | ✅ Parquet files for time-travel | ❌ Limited retention |
| Topology discovery | ✅ LLDP/CDP-based | ❌ Manual mapping |
| Open source | ✅ Apache 2.0 | ❌ Commercial |
Architecture
┌───────────────────────────────────────────────────────────┐
│                    Overgrowth Pipeline                    │
│  ┌───────────┐     ┌────────────┐     ┌────────────────┐  │
│  │  NetBox   │ ──→ │   SuzieQ   │ ──→ │Drift Detection │  │
│  │   (SoT)   │     │ Collector  │     │ & Remediation  │  │
│  └───────────┘     └────────────┘     └────────────────┘  │
└───────────────────────────────────────────────────────────┘
         ↓                 ↓                    ↓
  Intended State      Actual State        Drift Analysis
┌──────────────────────────────────────────────────────────┐
│ Stage 7: Observability - Collect actual network state    │
│ Stage 7b: Drift Detection - Compare actual vs intended   │
│ Stage 8: Validation - Auto-remediate approved changes    │
└──────────────────────────────────────────────────────────┘
What SuzieQ Detects
1. Configuration Mismatches
- Device hostname changes
- Management IP changes
- Unexpected device roles (leaf acting as spine)
2. VLAN Drift
- Missing VLANs: Intended in SoT but not on device
- Extra VLANs: Present on device but not in SoT
- VLAN name mismatches
3. IP Address Conflicts
- Duplicate IPs across devices
- IP mismatches vs NetBox IPAM
- Gateway conflicts
4. Interface State Drift
- Interfaces expected UP but actually DOWN
- Interfaces expected DOWN but actually UP
- Description mismatches
5. Routing Issues
- BGP neighbor states
- OSPF adjacency problems
- Route count anomalies
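The VLAN and IP checks listed above largely reduce to set comparisons. The following is a minimal sketch, assuming intended and actual state are already available as plain Python collections. The function names `diff_vlans` and `find_duplicate_ips` are illustrative only and are not part of the Overgrowth or SuzieQ API.

```python
from collections import defaultdict


def diff_vlans(intended: set, actual: set) -> dict:
    """Compare intended VLAN IDs (SoT) with the VLAN IDs found on a device."""
    return {
        "missing_vlans": sorted(intended - actual),  # in SoT, not on device
        "extra_vlans": sorted(actual - intended),    # on device, not in SoT
    }


def find_duplicate_ips(assignments: list) -> dict:
    """Return IPs that are assigned on more than one device.

    `assignments` is a list of (device, ip) pairs.
    """
    owners = defaultdict(set)
    for device, ip in assignments:
        owners[ip].add(device)
    return {ip: sorted(devs) for ip, devs in owners.items() if len(devs) > 1}


vlan_drift = diff_vlans(intended={10, 20, 99}, actual={1, 10, 99})
print(vlan_drift)

duplicates = find_duplicate_ips([
    ("leaf-01", "10.0.0.11"),
    ("leaf-02", "10.0.0.11"),
    ("spine-01", "10.0.0.1"),
])
print(duplicates)
```

Note that the default VLAN 1 is reported as "extra" in this sketch because it is not in the intended set. A real check would either list it in the SoT or exclude it explicitly.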
Usage Examples
Basic Drift Detection
from agent.pipeline_engine import OvergrowthPipeline
from agent.network_model import NetworkModel
# Create pipeline with SuzieQ enabled
pipeline = OvergrowthPipeline()
# Your network model (from NetBox or YAML)
model = NetworkModel(...)
# Stage 7: Collect actual state
obs_results = pipeline.stage7_observability(model)
print(f"Collected state from {obs_results['collection']['devices_polled']} devices")
# Stage 7b: Detect drift
drift_results = pipeline.stage7b_drift_detection(model)
if drift_results['drift_detected']:
    print(f"⚠️ Drift detected! Score: {drift_results['drift_score']:.2f}")
    print("Issues found:")
    print(f"  - Config mismatches: {drift_results['summary']['config_mismatches']}")
    print(f"  - Missing VLANs: {drift_results['summary']['missing_vlans']}")
    print(f"  - Interface issues: {drift_results['summary']['interfaces_down']}")
else:
    print("✅ No drift - network matches SoT")
Auto-Remediation
# Stage 8: Validate and auto-remediate
validation = pipeline.stage8_validation(model)
compliance = validation['compliance_report']
print(f"Compliance Status: {compliance['status']}")
print(f"Drift Score: {compliance['drift_score']:.2f}")
if 'remediation' in validation:
    print(f"Applied {validation['remediation']['applied']} automatic fixes")
    print(f"Skipped {validation['remediation']['skipped']} (require manual approval)")
Direct SuzieQ Client Usage
from agent.suzieq_client import SuzieQClient
# Initialize client
suzieq = SuzieQClient(use_suzieq=True)
# Collect state from devices
devices = [
    {'name': 'leaf-01', 'ip': '10.0.0.11', 'username': 'admin', 'password': 'admin'},
    {'name': 'spine-01', 'ip': '10.0.0.1', 'username': 'admin', 'password': 'admin'}
]
collection = suzieq.collect_network_state(devices)
print(f"Collected from {collection['devices_polled']} devices")
# Get topology
topology = suzieq.get_topology()
print(f"Discovered {len(topology['nodes'])} nodes")
print(f"Found {len(topology['edges'])} LLDP/CDP connections")
# Get VLAN summary
vlans = suzieq.get_vlan_summary()
for device, vlan_list in vlans.items():
    print(f"{device}: {vlan_list}")
# Detect drift
intended_state = {
    'devices': [...],
    'vlans': [...],
    'subnets': [...]
}
drift = suzieq.detect_drift(intended_state)
if drift.has_drift:
    print(f"Drift Score: {drift.drift_score:.2f}")
    print(f"Missing VLANs: {len(drift.missing_vlans)}")
    print(f"Extra VLANs: {len(drift.extra_vlans)}")
# Generate remediation plan
plan = suzieq.generate_remediation_plan(drift)
for action in plan:
    status = "AUTO-FIX" if action['auto_fix'] else "MANUAL"
    print(f"[{status}] {action['action']} on {action['device']}")
    print(f"  Commands: {action['commands']}")
# Apply auto-approved fixes
results = suzieq.apply_remediation(plan, auto_approve=True)
print(f"Applied: {results['applied']}, Skipped: {results['skipped']}")
Remediation Safety
Auto-Fix vs Manual Approval
Overgrowth's SuzieQ client classifies each remediation action by how safe it is to apply automatically:
| Action | Auto-Fix | Reason |
|---|---|---|
| Add missing VLAN | ✅ Yes | Safe - doesn't disrupt traffic |
| Remove extra VLAN | ❌ No | Dangerous - could break connectivity |
| Enable interface | ❌ No | Dangerous - interface may be down intentionally |
| Fix IP mismatch | ✅ Yes | Safe - corrects IPAM drift |
| Update descriptions | ✅ Yes | Safe - cosmetic change |
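One way to picture this classification is an allowlist lookup, where anything not explicitly listed falls back to manual review. This is a sketch only, not Overgrowth's actual implementation. The auto-fix action names are taken from the "Auto-Fix Low-Risk Changes" example in this document; the manual action names `remove_vlan` and `enable_interface` and the `classify` helper are assumptions made for illustration.

```python
# Allowlist of action types considered safe to apply without approval.
AUTO_FIX_ACTIONS = {"add_vlan", "fix_ip_mismatch", "update_description"}


def classify(action: dict) -> dict:
    """Return a copy of the action with an `auto_fix` flag set.

    Unknown action types are not in the allowlist, so they default
    to manual review.
    """
    return {**action, "auto_fix": action["action"] in AUTO_FIX_ACTIONS}


plan = [
    classify({"action": "add_vlan", "device": "leaf-01"}),
    classify({"action": "remove_vlan", "device": "leaf-01"}),       # assumed name
    classify({"action": "enable_interface", "device": "leaf-01"}),  # assumed name
]
for item in plan:
    label = "AUTO-FIX" if item["auto_fix"] else "MANUAL"
    print(f"[{label}] {item['action']} on {item['device']}")
```

Defaulting unknown actions to manual review keeps a newly added action type from being applied automatically by accident.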
Approval Workflow
# Get remediation plan
plan = suzieq.generate_remediation_plan(drift)
# Filter by auto-fix status
auto_fixes = [a for a in plan if a['auto_fix']]
manual_review = [a for a in plan if not a['auto_fix']]
print(f"Auto-fix ready: {len(auto_fixes)}")
print(f"Require approval: {len(manual_review)}")
# Apply only auto-approved
suzieq.apply_remediation(plan, auto_approve=True)
# For manual items, integrate with ticketing system
for action in manual_review:
    # Create Jira ticket, ServiceNow change request, etc.
    create_change_request(
        title=f"Fix {action['action']} on {action['device']}",
        commands=action['commands'],
        reason=action['reason']
    )
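`create_change_request` in the workflow above stands for whatever ticketing integration you use; it is not provided by Overgrowth. A minimal stand-in, assuming the ticket is a plain dictionary that a real integration would then send to the ticketing system's API (the field names are hypothetical), might look like this:

```python
import json


def create_change_request(title: str, commands: list, reason: str) -> dict:
    """Build a change-request payload (hypothetical helper).

    This sketch only assembles the payload; it does not contact
    any ticketing system.
    """
    return {
        "title": title,
        "description": reason,
        "proposed_commands": "\n".join(commands),
        "status": "pending_approval",
    }


ticket = create_change_request(
    title="Fix remove_vlan on leaf-01",
    commands=["no vlan 666"],
    reason="VLAN 666 is present on the device but not in the SoT",
)
print(json.dumps(ticket, indent=2))
```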
Installation
Option 1: Mock Mode (Default)
No installation is required. Overgrowth includes a mock SuzieQ backend for testing:
suzieq = SuzieQClient(use_suzieq=True)
# Automatically uses mock mode if suzieq not installed
Mock mode simulates:
- State collection from devices
- Topology discovery
- Drift detection with heuristic rules
- Remediation plan generation
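The automatic fallback could be implemented by checking whether the `suzieq` package is importable. The sketch below shows that general pattern using the standard library; it is not Overgrowth's actual code. The helper names are hypothetical, and treating `use_suzieq=False` as "always mock" is an assumption.

```python
import importlib.util
import logging

logger = logging.getLogger("overgrowth.suzieq")  # hypothetical logger name


def suzieq_available() -> bool:
    """Return True if the `suzieq` package can be found on the import path."""
    return importlib.util.find_spec("suzieq") is not None


def resolve_mock_mode(use_suzieq: bool) -> bool:
    """Decide whether the client should run in mock mode."""
    if not use_suzieq:
        return True  # assumption: disabling SuzieQ means mock mode
    if not suzieq_available():
        logger.warning("suzieq not installed - using mock mode")
        return True
    return False


print(f"Mock mode: {resolve_mock_mode(use_suzieq=True)}")
```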
Option 2: Real SuzieQ
Install SuzieQ for production use:
# Install SuzieQ
pip install suzieq
# Verify installation
suzieq-cli --help
# Create SuzieQ directory
mkdir -p ~/.suzieq/parquet
Configure SuzieQ inventory (~/.suzieq/inventory.yml):
sources:
  - name: overgrowth
    hosts:
      - url: ssh://<username>@<device-1-ip>
        devtype: eos
      - url: ssh://<username>@<device-2-ip>
        devtype: eos
Start SuzieQ poller:
suzieq-poller -I ~/.suzieq/inventory.yml -d ~/.suzieq/parquet
Configuration
SuzieQ Client Options
from pathlib import Path
# Custom data directory
suzieq = SuzieQClient(
    suzieq_dir=Path("/opt/suzieq/data"),
    use_suzieq=True
)
# Collect with custom namespace
suzieq.collect_network_state(
    devices=[...],
    namespace="production"  # vs "staging", "lab", etc.
)
# Query specific namespace
topology = suzieq.get_topology(namespace="production")
Drift Tolerance
Adjust the drift score threshold in stage8_validation():
# Default: 20% drift allowed
results['validation_passed'] = drift_score < 0.2
# Stricter: 10% drift
results['validation_passed'] = drift_score < 0.1
# Looser: 30% drift
results['validation_passed'] = drift_score < 0.3
Drift score calculation:
drift_score = total_drift_items / (devices_checked * expected_resources)
Examples:
- 0.0 = Perfect match
- 0.15 = Minor drift (2-3 VLANs missing)
- 0.5 = Moderate drift (half of config missing)
- 1.0 = Complete drift (nothing matches)
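As a worked example of the formula above, the sketch below computes the score for a small network and checks it against the default threshold. The zero-denominator handling and the clamp to 1.0 are assumptions added for robustness; the source formula does not specify either.

```python
def drift_score(total_drift_items: int, devices_checked: int,
                expected_resources: int) -> float:
    """Fraction of expected resources that have drifted, clamped to [0, 1]."""
    denominator = devices_checked * expected_resources
    if denominator == 0:
        return 0.0  # assumption: nothing to check counts as no drift
    # Extra (unexpected) items can push the ratio above 1, so clamp it.
    return min(total_drift_items / denominator, 1.0)


# 4 devices, 5 expected resources each, 3 drift items in total: 3 / 20
score = drift_score(total_drift_items=3, devices_checked=4, expected_resources=5)
print(f"{score:.2f}")  # 0.15
print(score < 0.2)     # True: passes the default 20% threshold
```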
Integration with Pipeline
Stage 7: Observability
Collects actual network state via SuzieQ:
- Device inventory
- Interface states
- VLAN configurations
- IP addressing
- Routing protocol status
- Topology via LLDP/CDP
obs_result = pipeline.stage7_observability(model)
# Returns: collection stats, topology, VLAN summary
Stage 7b: Drift Detection
Compares actual vs intended (NetBox SoT):
- Config mismatches
- Missing/extra VLANs
- IP conflicts
- Interface state drift
- Routing issues
drift_result = pipeline.stage7b_drift_detection(model)
# Returns: drift score, detailed findings, remediation plan
Stage 8: Validation & Remediation
Validates network compliance and auto-remediates:
- Generates compliance report
- Applies auto-approved fixes
- Queues manual approval items
- Re-checks drift after remediation
val_result = pipeline.stage8_validation(model)
# Returns: validation status, compliance report, remediation results
Drift Detection Examples
Example 1: Missing VLAN
Intended (NetBox):
vlans:
  - id: 10
    name: Users
  - id: 20
    name: Servers
  - id: 99
    name: Management
Actual (Device):
show vlan brief
VLAN  Name                             Status    Ports
----- -------------------------------- --------- ------
1     default                          active
10    Users                            active    Et1-10
99    Management                       active    Et48
Drift Detected:
{
  "missing_vlans": [{
    "device": "leaf-01",
    "vlan_id": 20,
    "vlan_name": "Servers",
    "severity": "ERROR"
  }]
}
Remediation:
! Auto-fix: Add missing VLAN
vlan 20
   name Servers
exit
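The step from a drift finding to the remediation commands can be sketched as a small template function. This assumes the finding has the shape shown in the JSON above; `vlan_add_commands` is a hypothetical helper, not the function Overgrowth uses internally.

```python
def vlan_add_commands(finding: dict) -> list:
    """Build config lines that create the VLAN named in a missing-VLAN finding."""
    return [
        f"vlan {finding['vlan_id']}",
        f"name {finding['vlan_name']}",
        "exit",
    ]


finding = {
    "device": "leaf-01",
    "vlan_id": 20,
    "vlan_name": "Servers",
    "severity": "ERROR",
}
commands = vlan_add_commands(finding)
print(commands)
```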
Example 2: Extra VLAN (Shadow IT)
Intended: VLANs 10, 20, 99
Actual: VLANs 10, 20, 99, 666 (unauthorized)
Drift Detected:
{
  "extra_vlans": [{
    "device": "leaf-01",
    "vlan_id": 666,
    "severity": "WARNING"
  }]
}
Remediation:
! Manual approval required - could disrupt traffic
no vlan 666
Example 3: Interface Down
Intended: All uplinks should be UP
Actual: Ethernet48 is DOWN
Drift Detected:
{
  "interface_down": [{
    "device": "leaf-01",
    "interface": "Ethernet48",
    "expected_state": "up",
    "actual_state": "down",
    "severity": "WARNING"
  }]
}
Remediation:
! Manual approval - verify interface should be up
interface Ethernet48
   no shutdown
exit
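The interface comparison behind this example can be sketched as a dictionary walk that emits one finding per mismatch. The input shape (interface name mapped to state) and the `interface_drift` helper are assumptions for illustration; the output fields mirror the JSON shown above.

```python
def interface_drift(device: str, expected: dict, actual: dict) -> list:
    """List interfaces whose actual state differs from the expected state."""
    findings = []
    for interface, expected_state in sorted(expected.items()):
        # An interface absent from the collected data is reported as "missing".
        actual_state = actual.get(interface, "missing")
        if actual_state != expected_state:
            findings.append({
                "device": device,
                "interface": interface,
                "expected_state": expected_state,
                "actual_state": actual_state,
                "severity": "WARNING",
            })
    return findings


findings = interface_drift(
    "leaf-01",
    expected={"Ethernet47": "up", "Ethernet48": "up"},
    actual={"Ethernet47": "up", "Ethernet48": "down"},
)
print(findings)
```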
Troubleshooting
Mock Mode vs Real Mode
Check if SuzieQ is installed:
suzieq = SuzieQClient(use_suzieq=True)
print(f"Mock mode: {suzieq.mock_mode}")
# Expected output:
# WARNING: suzieq not installed - using mock mode
# Mock mode: True
SuzieQ Not Collecting Data
- Check SSH connectivity:
ssh <username>@<device-ip>
- Verify inventory:
cat ~/.suzieq/inventory.yml
- Check poller logs:
tail -f ~/.suzieq/suzieq-poller.log
- Test with CLI:
suzieq-cli
device show
Drift Detection Returns Empty
Cause: SuzieQ hasn't collected data yet
Solution: Run initial collection
# Run a single collection pass, then exit
suzieq-poller -I ~/.suzieq/inventory.yml -d ~/.suzieq/parquet --run-once
Auto-Fix Not Working
Cause: auto_approve=False (default)
Solution:
# Enable auto-approval
results = suzieq.apply_remediation(plan, auto_approve=True)
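# Or apply manually via Netmiko (requires: pip install netmiko)
from netmiko import ConnectHandler
for action in plan:
    if action['auto_fix']:
        device = ConnectHandler(
            device_type='cisco_ios',  # set to match the platform, e.g. 'arista_eos'
            host=action['device'],
            username='admin',
            password='admin'
        )
        device.send_config_set(action['commands'])
        device.disconnect()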
Performance
Collection Frequency
SuzieQ poller intervals:
- Lab: Every 1 minute (rapid testing)
- Staging: Every 5 minutes (drift detection)
- Production: Every 15 minutes (capacity planning)
Data Retention
SuzieQ stores data in Parquet files:
# Check storage usage
du -sh ~/.suzieq/parquet
# Clean up old data files (>30 days); -type f leaves directories in place
find ~/.suzieq/parquet -type f -mtime +30 -delete
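Before deleting anything, it can help to list what the cleanup would remove. The sketch below is a dry-run equivalent using only the standard library; `stale_files` is a hypothetical helper, and the 30-day cutoff matches the command above.

```python
import time
from pathlib import Path


def stale_files(root: Path, max_age_days: int = 30) -> list:
    """Return regular files under `root` last modified more than `max_age_days` ago."""
    cutoff = time.time() - max_age_days * 86400
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.stat().st_mtime < cutoff
    )


if __name__ == "__main__":
    data_dir = Path.home() / ".suzieq" / "parquet"
    if data_dir.exists():
        for path in stale_files(data_dir):
            print(path)  # candidates only; nothing is deleted
```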
Drift Detection Performance
| Network Size | Devices | Drift Check Time |
|---|---|---|
| Small | 1-10 | < 1 second |
| Medium | 11-100 | 1-5 seconds |
| Large | 101-500 | 5-15 seconds |
| Enterprise | 500+ | 15-60 seconds |
Best Practices
1. Use Namespaces
Separate environments:
# Production namespace
suzieq.collect_network_state(devices, namespace="production")
# Staging namespace
suzieq.collect_network_state(devices, namespace="staging")
2. Schedule Regular Drift Checks
#!/bin/bash
# Cron job: check drift every hour
cd /opt/overgrowth
source venv/bin/activate
python -c "
from agent.pipeline_engine import OvergrowthPipeline
from agent.network_model import NetworkModel
pipeline = OvergrowthPipeline()
model = NetworkModel.from_yaml('network.yaml')
drift = pipeline.stage7b_drift_detection(model)
if drift['drift_detected']:
    print(f'ALERT: Drift score {drift[\"drift_score\"]:.2f}')
    # Send alert to Slack/PagerDuty
"
3. Auto-Fix Low-Risk Changes
# Safe changes: Add VLANs, update descriptions
auto_fix_actions = ['add_vlan', 'update_description', 'fix_ip_mismatch']
# Apply only safe actions
safe_plan = [a for a in plan if a['action'] in auto_fix_actions]
suzieq.apply_remediation(safe_plan, auto_approve=True)
# Manual review for everything else
manual_plan = [a for a in plan if a['action'] not in auto_fix_actions]
notify_team(manual_plan)
4. Track Drift Over Time
from datetime import datetime
# Log drift history
drift_log = {
    'timestamp': datetime.now().isoformat(),
    'drift_score': drift.drift_score,
    'devices_checked': drift.devices_checked,
    'issues': {
        'config_mismatches': len(drift.config_mismatches),
        'missing_vlans': len(drift.missing_vlans),
        'extra_vlans': len(drift.extra_vlans)
    }
}
# Store in database or CSV
append_to_history(drift_log)
# Alert if drift is increasing (previous_score is the score from the last logged check)
if drift.drift_score > previous_score * 1.5:
    alert("Drift increasing rapidly!")
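`append_to_history` and the previous score are left undefined in the snippet above. One possible CSV-backed version is sketched below. The file location, the column set, and the helper names `last_score` and `is_rising_fast` are all assumptions; nested fields such as `issues` are dropped rather than flattened.

```python
import csv
from pathlib import Path
from typing import Optional

HISTORY_FILE = Path("drift_history.csv")  # hypothetical location
FIELDS = ["timestamp", "drift_score", "devices_checked"]


def last_score(path: Path = HISTORY_FILE) -> Optional[float]:
    """Return the most recent drift score in the history file, if any."""
    if not path.exists():
        return None
    with path.open(newline="") as handle:
        rows = list(csv.DictReader(handle))
    return float(rows[-1]["drift_score"]) if rows else None


def append_to_history(entry: dict, path: Path = HISTORY_FILE) -> None:
    """Append one drift record, writing the header row on first use."""
    is_new = not path.exists()
    with path.open("a", newline="") as handle:
        # extrasaction="ignore" drops keys that are not in FIELDS (e.g. "issues").
        writer = csv.DictWriter(handle, fieldnames=FIELDS, extrasaction="ignore")
        if is_new:
            writer.writeheader()
        writer.writerow(entry)


def is_rising_fast(current: float, previous: Optional[float],
                   factor: float = 1.5) -> bool:
    """True when the score exceeds the previous score multiplied by `factor`."""
    return previous is not None and current > previous * factor


print(is_rising_fast(current=0.30, previous=0.15))
```

Read the previous score with `last_score()` before calling `append_to_history()`, otherwise the comparison is made against the record just written.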
Future Enhancements
Planned Features
- StackStorm Integration: Event-driven auto-remediation when drift detected
- RAG-based Learning: Learn from past drift incidents to prevent recurrence
- Change Correlation: Link drift events to recent changes (Git, tickets)
- Predictive Drift: ML model to predict drift before it happens
- Multi-Region Sync: Ensure consistency across global deployments
Community Contributions
See CONTRIBUTING.md for how to add:
- New drift detection rules
- Additional remediation actions
- Custom compliance policies
- Integration with other observability tools
References
- SuzieQ Documentation: https://suzieq.readthedocs.io/
- SuzieQ GitHub: https://github.com/netenglabs/suzieq
- Overgrowth Repo: https://huggingface.co/spaces/MCP-1st-Birthday/overgrowth
- NetBox Integration: See NETBOX_INTEGRATION.md
- Batfish Digital Twin: See BATFISH_INTEGRATION.md
Support
Questions? Issues? Contributions?
- Open an issue on HuggingFace Spaces
- Join our Discord: [link]
- Email: [email protected]