Status: Resolved | Severity: Major | ID: INC-2025-004

SDN Controller Outage Due to Database Connection Limit

Reported: August 1, 2025 at 08:00 UTC
Last Updated: December 9, 2025 at 09:18 UTC

Incident Details

Summary: SDN controller outage caused by the MariaDB max_connections limit, which prevented network operations from executing
Duration: 1h 0m
Affected Services: SDN Controller, DHCP Services, Network Automation
Impact: SDN, DHCP, Database, MariaDB, Network Operations

Summary

On August 1, 2025, AS214503 experienced a Software-Defined Networking (SDN) controller outage caused by a MariaDB max_connections setting that was inadvertently reset during a database upgrade. The incident began at 08:00 UTC and was fully resolved at 09:00 UTC, lasting 1 hour.

The connection limit reverted from the previously configured 16000 to the default 1000, preventing the SDN controller from executing critical network operations. This resulted in delayed DHCP lease renewals and degraded network automation capabilities. No packet forwarding issues were observed, as the data plane remained operational throughout the incident.

Impact

  • Primary Cause: MariaDB max_connections limit exceeded due to configuration reset
  • Affected Systems: SDN controller, network automation, DHCP services
  • Configuration Issue: max_connections reset from 16000 to default 1000 during database upgrade
  • DHCP Impact: Extended lease renewal times, some renewals delayed
  • Network Operations: Automated configuration changes blocked
  • Data Plane: No packet forwarding disruption, traffic continued normally
  • Customer Impact: Minimal impact, existing connections maintained, new DHCP leases delayed

Timeline (All times UTC)

08:00 - SDN controller alerts triggered for failed database connections
08:05 - Network operations team investigates controller unresponsiveness
08:10 - DHCP lease renewal delays reported by monitoring systems
08:15 - Database connection pool exhaustion identified in MariaDB logs
08:20 - MariaDB max_connections setting found reset to default value of 1000 (previously 16000)
08:25 - Database upgrade logs reviewed, configuration reset identified as cause
08:30 - max_connections restored to 16000, SDN controller service restarted
08:40 - DHCP lease processing returns to normal operation
08:50 - Configuration management updated to prevent future resets during upgrades
09:00 - Full SDN controller functionality restored, incident resolved

Root Cause

Primary Cause: A MariaDB upgrade reset the max_connections setting from the previously configured 16000 to the default value of 1000. The SDN controller maintains persistent database connections for network state management, configuration changes, and DHCP lease tracking, requiring approximately 200-300 concurrent database connections during normal operation.

Contributing Factors:

  • Database upgrade procedure did not preserve custom configuration settings
  • Lack of post-upgrade configuration validation checks
  • Missing configuration management for database settings
  • No automated monitoring to detect configuration drift after upgrades

Technical Details: During the MariaDB upgrade, the database configuration file was replaced with default settings, overwriting the production-tuned max_connections value. This caused connection pool exhaustion when the SDN controller attempted to maintain its normal operational connection count, preventing new network operations from being executed.

Resolution

Immediate Actions:

  • Identified MariaDB upgrade as the cause of configuration reset at 08:25 UTC
  • Restored MariaDB max_connections from 1000 to 16000 at 08:30 UTC
  • Restarted SDN controller service to clear connection pool
  • Monitored DHCP lease processing recovery
  • Verified packet forwarding remained unaffected

Permanent Fixes:

  • Implemented configuration management for MariaDB settings in version control
  • Added post-upgrade validation procedures to verify custom configurations
  • Created automated configuration drift detection with alerting
  • Updated database upgrade procedures to preserve custom settings
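
The drift detection listed above could look roughly like the following minimal sketch. It is not the tooling actually deployed. The `EXPECTED_SETTINGS` baseline and the `find_drift` helper are hypothetical names, and the sketch assumes the live values have already been collected into a dictionary (for example from the rows of `SHOW GLOBAL VARIABLES`).

```python
# Hypothetical baseline; in practice this would be loaded from the
# version-controlled MariaDB configuration.
EXPECTED_SETTINGS = {"max_connections": "16000"}


def find_drift(expected, actual):
    """Return {name: (expected, actual)} for every setting that differs.

    `actual` is assumed to be a name -> value mapping of live server
    settings. A setting missing from `actual` is reported with None.
    """
    drift = {}
    for name, want in expected.items():
        have = actual.get(name)
        # Compare as strings so 16000 and "16000" count as equal.
        if have is None or str(have) != str(want):
            drift[name] = (str(want), None if have is None else str(have))
    return drift


if __name__ == "__main__":
    # Values as they stood during the incident.
    after_upgrade = {"max_connections": "1000"}
    for name, (want, have) in find_drift(EXPECTED_SETTINGS, after_upgrade).items():
        print(f"DRIFT {name}: expected {want}, found {have}")
```

Running a comparison like this immediately after an upgrade, and again on a schedule, would cover both the post-upgrade validation and the ongoing drift alerting described in the list.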

Configuration Changes:

# Restored production configuration in /etc/mysql/mariadb.conf.d/50-server.cnf
[mysqld]
max_connections = 16000
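
To confirm that the restored value is actually present in the option file after an upgrade, a check along these lines could be used. This is a sketch under assumptions: `read_max_connections` is a hypothetical helper, it assumes the setting lives in a `[mysqld]` section, and it does not handle `!include` directives or the `max-connections` spelling.

```python
import configparser


def read_max_connections(cnf_text, section="mysqld"):
    """Return max_connections from an option-file fragment, or None if unset."""
    parser = configparser.ConfigParser(
        allow_no_value=True,             # option files may contain bare flags
        strict=False,                    # tolerate repeated sections or options
        interpolation=None,              # treat '%' literally
        inline_comment_prefixes=("#",),  # allow trailing '# ...' comments
    )
    parser.read_string(cnf_text)
    if not parser.has_section(section):
        return None
    value = parser.get(section, "max_connections", fallback=None)
    return int(value) if value is not None else None


if __name__ == "__main__":
    sample = "[mysqld]\nmax_connections = 16000\n"
    print(read_max_connections(sample))
```

A file-level check complements a query against the running server: the file shows what will apply on the next restart, while the server reports what is in effect now.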

Prevention Measures

  • Configuration Management: MariaDB settings now tracked in version control with automated validation
  • Database Monitoring: Implemented comprehensive MariaDB connection tracking with alerting at 70% utilization
  • Post-Upgrade Validation: Added automated configuration verification to upgrade procedures
  • Connection Pool Optimization: Reviewed and optimized SDN controller connection pooling parameters
  • Capacity Planning: Established database connection capacity planning process
  • Documentation Update: Updated deployment procedures to include database tuning requirements
  • Testing Enhancement: Added load testing for database connection limits in staging environment
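
As an illustration of the 70% utilization alert mentioned above, the threshold logic can be sketched as follows. The function names are hypothetical, and the sketch assumes utilization is measured as the `Threads_connected` status value divided by `max_connections`; the monitoring actually deployed may define it differently.

```python
ALERT_THRESHOLD = 0.70  # 70% utilization, as stated in the prevention measures


def connection_utilization(threads_connected, max_connections):
    """Fraction of the connection limit currently in use."""
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    return threads_connected / max_connections


def should_alert(threads_connected, max_connections, threshold=ALERT_THRESHOLD):
    """True when utilization has reached or passed the alert threshold."""
    return connection_utilization(threads_connected, max_connections) >= threshold


if __name__ == "__main__":
    # Hypothetical sample: 800 connections against a limit of 1000.
    print(should_alert(800, 1000))
    # Hypothetical sample: 300 connections against the restored limit of 16000.
    print(should_alert(300, 16000))
```

Because the check is relative to the configured limit, an unexpected drop in max_connections raises utilization and triggers the alert even if the number of connections has not changed.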

Lessons Learned

  1. Default database configurations are rarely suitable for production workloads
  2. Database upgrade procedures must preserve custom configuration settings
  3. Post-upgrade validation is critical to detect configuration drift
  4. Automated monitoring for configuration changes shortens detection time and limits outage duration
  5. SDN controller database requirements must be properly sized during initial deployment
  6. Data plane and control plane separation provided resilience during the outage

Technical Notes

Data Plane Resilience: The separation between control plane (SDN controller) and data plane (packet forwarding) ensured that existing network traffic continued without interruption. Only new network configurations, DHCP lease renewals, and new lease assignments were affected during the incident.

DHCP Lease Behavior: Existing DHCP leases continued to function normally throughout the incident. Only lease renewals and new assignments experienced delays due to the SDN controller’s inability to update lease databases.
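
One reason delayed renewals need not interrupt existing clients is the renewal timing defined in RFC 2131: by default a client first tries to renew at T1 (50% of the lease) and moves to rebinding at T2 (87.5% of the lease), so a lease stays valid for some time after the first failed renewal. The sketch below works through that arithmetic. The 24-hour lease is a hypothetical value, since this report does not state the configured lease duration, and a DHCP server may override the default T1 and T2.

```python
def renewal_timers(lease_seconds):
    """Return (T1, T2) in seconds using the RFC 2131 default fractions."""
    t1 = 0.5 * lease_seconds    # client starts renewing with its server
    t2 = 0.875 * lease_seconds  # client falls back to rebinding
    return t1, t2


def slack_after_t1(lease_seconds):
    """Seconds between the first renewal attempt and lease expiry."""
    t1, _ = renewal_timers(lease_seconds)
    return lease_seconds - t1


if __name__ == "__main__":
    lease = 24 * 60 * 60  # hypothetical 24-hour lease
    t1, t2 = renewal_timers(lease)
    print(f"T1={t1:.0f}s T2={t2:.0f}s slack={slack_after_t1(lease):.0f}s")
```

Under these assumptions, a client whose renewal first failed during the one-hour incident would still hold a valid lease for several hours.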


Incident resolved. SDN controller operating normally with optimized database configuration and enhanced monitoring.

#sdn #mariadb #database #dhcp #network-operations #configuration