# Risk Model (Draft)

## 1. Overview

Lightning mode translates a preset change request into a deterministic risk score (0–100) and a risk level. The model
focuses on intent metadata only (no MAESTRO telemetry in Phase 1) so that judges can see how the MCP server reasons about
risk in milliseconds. Each preset captures pre-change health, magnitude, and post-change signals, and the scoring engine
turns those inputs into the same risk JSON exposed by the FastAPI MCP endpoint.

## 2. Inputs

For every `(change_type, preset_id)` pair we define the following fields:

| Field | Description |
|-------|-------------|
| `change_type` | `vlan`, `interface`, or `bgp_neighbor`. Determines base impact weight. |
| `preset_id` | Scenario identifier (e.g. `leaf_tor_vlan_stage`, `tor_uplink_shutdown`). |
| `pre_core_healthy` | `True`/`False` flag indicating control-plane health before the change. |
| `pre_interface_errors` | Whether interface errors already exist on affected devices. |
| `pre_existing_alarms` | Whether any alarms are active in the change scope. |
| `num_devices_touched` | How many devices the change modifies. Used for impact magnitude. |
| `post_lost_adjacencies` | Count of fabric adjacencies that disappear after the change. |
| `post_new_alarms` | Whether new alarms fire after the change. |
| `post_interface_errors` | Whether interface errors appear after the change. |
| `blast_radius_summary` | Human-readable description of the scope. |
| `context_note` | Short narrative used to build the explanation string. |

These values live in `server/app/mcp.py` inside the `PRESETS` mapping.
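
As a rough illustration, one preset entry might look like the sketch below. The `Preset` dataclass, the tuple key shape, and the two placeholder strings are assumptions made for this example; the actual layout of `PRESETS` in `server/app/mcp.py` may differ. The flag values follow the `leaf_tor_vlan_stage` worked example in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preset:
    """Hypothetical container for the input fields listed in the table above."""

    change_type: str            # "vlan", "interface", or "bgp_neighbor"
    preset_id: str
    pre_core_healthy: bool
    pre_interface_errors: bool
    pre_existing_alarms: bool
    num_devices_touched: int
    post_lost_adjacencies: int
    post_new_alarms: bool
    post_interface_errors: bool
    blast_radius_summary: str
    context_note: str


# Keyed by (change_type, preset_id); this key shape is an assumption.
PRESETS = {
    ("vlan", "leaf_tor_vlan_stage"): Preset(
        change_type="vlan",
        preset_id="leaf_tor_vlan_stage",
        pre_core_healthy=True,
        pre_interface_errors=False,
        pre_existing_alarms=False,
        num_devices_touched=2,
        post_lost_adjacencies=0,
        post_new_alarms=False,
        post_interface_errors=False,
        blast_radius_summary="<placeholder scope description>",
        context_note="<placeholder narrative>",
    ),
}
```

Freezing the dataclass keeps presets immutable, which fits the deterministic design of this phase.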

## 3. Scoring algorithm

1. **Baseline pre-change (0–30)**

   ```text
   baseline = 0
   +15 if pre_core_healthy is False
   +10 if pre_interface_errors is True
   +10 if pre_existing_alarms is True
   clamp 0–30
   ```

2. **Change impact (10–55)**

   ```text
   impact_type_base = 10 (VLAN) | 25 (interface) | 35 (BGP neighbor)
   impact_magnitude = min(20, 2 * num_devices_touched)
   change_impact = impact_type_base + impact_magnitude
   ```

3. **Post-change penalties (0–40)**

   ```text
   post_penalty = 0
   +20 if post_lost_adjacencies > 0
   +10 if post_new_alarms is True
   +10 if post_interface_errors is True
   clamp 0–40
   ```

4. **Final score + level**

   ```text
   risk_score_raw = baseline + change_impact + post_penalty
   risk_score = clamp(risk_score_raw, 0, 100)
   ```

Levels:

* 0–30 → `low`
* 31–70 → `medium`
* 71–100 → `high`

The FastAPI server uses the same logic in `simulate_network_change`.
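
The four steps above can be sketched in Python as follows. This is a minimal illustration, not the code in `simulate_network_change`: the helper names `clamp` and `score_preset`, the plain-dict input, and the keys of the returned dictionary are all assumptions made for this example.

```python
# Base impact weights per change type, taken from step 2.
IMPACT_TYPE_BASE = {"vlan": 10, "interface": 25, "bgp_neighbor": 35}


def clamp(value: int, low: int, high: int) -> int:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def score_preset(preset: dict) -> dict:
    """Hypothetical helper: compute the score components and level for one preset."""
    # Step 1: pre-change baseline. The raw sum can reach 35, so the clamp matters.
    baseline = 0
    if not preset["pre_core_healthy"]:
        baseline += 15
    if preset["pre_interface_errors"]:
        baseline += 10
    if preset["pre_existing_alarms"]:
        baseline += 10
    baseline = clamp(baseline, 0, 30)

    # Step 2: type base plus a magnitude term capped at 20.
    magnitude = min(20, 2 * preset["num_devices_touched"])
    change_impact = IMPACT_TYPE_BASE[preset["change_type"]] + magnitude

    # Step 3: post-change penalties.
    post_penalty = 0
    if preset["post_lost_adjacencies"] > 0:
        post_penalty += 20
    if preset["post_new_alarms"]:
        post_penalty += 10
    if preset["post_interface_errors"]:
        post_penalty += 10
    post_penalty = clamp(post_penalty, 0, 40)

    # Step 4: the raw sum can reach 125, so clamp to 0..100 before mapping to a level.
    risk_score = clamp(baseline + change_impact + post_penalty, 0, 100)
    if risk_score <= 30:
        level = "low"
    elif risk_score <= 70:
        level = "medium"
    else:
        level = "high"

    # The key names below are illustrative, not the endpoint's actual JSON schema.
    return {
        "baseline": baseline,
        "change_impact": change_impact,
        "post_penalty": post_penalty,
        "risk_score": risk_score,
        "risk_level": level,
    }


# Inputs matching the tor_uplink_shutdown scenario described in this document.
uplink_shutdown = {
    "change_type": "interface",
    "pre_core_healthy": True,
    "pre_interface_errors": False,
    "pre_existing_alarms": False,
    "num_devices_touched": 1,
    "post_lost_adjacencies": 1,
    "post_new_alarms": True,
    "post_interface_errors": False,
}
result = score_preset(uplink_shutdown)
print(result["risk_score"], result["risk_level"])
```

For these inputs the components are 0, 27, and 30, giving a score of 57 (`medium`).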

## 4. Worked examples

### VLAN: `leaf_tor_vlan_stage`

* Inputs: healthy core, no alarms, 2 devices touched, no post-change penalties.
* Scores: baseline 0, impact 14, post 0 → risk 14 (`low`).
* Interpretation: localized change with clean pre/post checks; safe to stage.

### Interface: `tor_uplink_shutdown`

* Inputs: healthy pre-state, 1 device, but 1 adjacency lost + new alarms after shutdown.
* Scores: baseline 0, impact 27, post 30 → risk 57 (`medium`).
* Interpretation: redundancy keeps risk from going `high`, but alarms + lost adjacency matter.

### BGP: `leaf_bgp_fabric_neighbor_add`

* Inputs: healthy pre-state, 1 device, no penalties.
* Scores: baseline 0, impact 37, post 0 → risk 37 (`medium`).
* Interpretation: even clean BGP adds carry control-plane sensitivity, so Lightning keeps risk mid-range.
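
The arithmetic in the three examples can be rechecked with a small table-driven script. The `breakdown` and `level_for` helpers are hypothetical names used only here. Because every example starts from a healthy pre-state and none reports post-change interface errors, the sketch fixes the baseline at 0 and omits that penalty.

```python
# (preset_id, impact_type_base, num_devices_touched, post_lost_adjacencies, post_new_alarms)
EXAMPLES = [
    ("leaf_tor_vlan_stage", 10, 2, 0, False),
    ("tor_uplink_shutdown", 25, 1, 1, True),
    ("leaf_bgp_fabric_neighbor_add", 35, 1, 0, False),
]


def level_for(score: int) -> str:
    """Map a 0..100 score to the level bands from section 3."""
    if score <= 30:
        return "low"
    if score <= 70:
        return "medium"
    return "high"


def breakdown(base: int, devices: int, lost_adjacencies: int, new_alarms: bool):
    """Return (impact, post, risk, level), assuming a baseline of 0."""
    impact = base + min(20, 2 * devices)
    post = (20 if lost_adjacencies > 0 else 0) + (10 if new_alarms else 0)
    risk = min(100, impact + post)
    return impact, post, risk, level_for(risk)


for preset_id, base, devices, lost, alarms in EXAMPLES:
    impact, post, risk, level = breakdown(base, devices, lost, alarms)
    print(f"{preset_id}: impact={impact} post={post} risk={risk} ({level})")
```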

## 5. Limitations / future work

* Presets emulate checks; future phases will populate them from MAESTRO telemetry.
* Only three change types are modeled. WAN/core workflows will add more bases and penalties.
* Full mode is still a placeholder; Lightning simply annotates that `mode=full` is not implemented yet.
* No randomness; this phase is deterministic by design so MCP judges can validate outputs offline.