Post Incident Report: Retail Express API failures and Back Office performance degradation (2–3 Feb 2026)
Summary
Between 2–3 February 2026, some Retail Express customers experienced intermittent API failures and slow or unstable performance in the Back Office/Admin Panel. The impact was largely confined to Segment 3.
The underlying cause was an Azure platform incident that prevented our production environment from scaling up as demand increased. With scaling blocked, resources in Segment 3 became saturated, resulting in slower response times and intermittent request failures.
Customer impact
Impacted customers may have experienced:
API failures or timeouts, including on SOAP and WMS endpoints used by integrations and warehousing connections
Examples reported: OrderCreate (SOAP), GetOutboundITO (WMS)
Additional endpoints reported by customers: GetOrders, GetProducts, GetITOs
Integration disruption, including some customers being unable to receive new orders or sync data with warehousing systems
Back Office/Admin Panel performance issues, including slow page loads, timeouts, or intermittent errors
Timeline (high level)
We received reports indicating that impact may have started as early as 10:00pm NZST on 2 Feb, with additional reports from around 6:30am AEDT on 3 Feb.
Microsoft reported the related Azure incident began around 19:46 UTC on 2 Feb 2026, with mitigation rolled out progressively by region.
Root cause
Microsoft experienced an Azure platform issue affecting VM service management and scaling operations. In practical terms:
Our autoscale rules were triggering, but Azure was not able to provision additional capacity.
Manual attempts to increase instance count also failed.
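For illustration only, a minimal sketch of what such a manual scale-out attempt can look like, assuming the workload runs on an Azure virtual machine scale set managed with the azure-mgmt-compute Python SDK (a recent version with the begin_update API). The subscription, resource group, and scale set names are placeholders, not our actual configuration, and this is not the exact tooling we used; during the incident the equivalent operation failed at the Azure platform level, not in our own automation.

    # Hypothetical manual scale-out attempt against an Azure VM scale set.
    # Names and IDs below are placeholders; requires azure-identity and a
    # recent azure-mgmt-compute.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.compute import ComputeManagementClient
    from azure.mgmt.compute.models import VirtualMachineScaleSetUpdate, Sku

    SUBSCRIPTION_ID = "<subscription-id>"      # placeholder
    RESOURCE_GROUP = "segment3-rg"             # placeholder
    SCALE_SET_NAME = "segment3-api-vmss"       # placeholder

    client = ComputeManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

    # Read the current capacity, then request one additional instance.
    vmss = client.virtual_machine_scale_sets.get(RESOURCE_GROUP, SCALE_SET_NAME)
    target = vmss.sku.capacity + 1

    poller = client.virtual_machine_scale_sets.begin_update(
        RESOURCE_GROUP,
        SCALE_SET_NAME,
        VirtualMachineScaleSetUpdate(sku=Sku(name=vmss.sku.name, capacity=target)),
    )
    # During the incident, operations like this did not complete because Azure
    # could not provision the additional VM instance.
    result = poller.result()
    print(f"Requested capacity {target}; provisioning state: {result.provisioning_state}")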
With no additional capacity available, existing Segment 3 resources became overloaded, leading to:
increased latency
intermittent errors
API timeouts and failures
Back Office/Admin Panel slowness
Microsoft advised the cause was tied to a configuration change that disrupted access to Microsoft-managed storage accounts required during VM provisioning and extension delivery. This prevented successful VM creation and scale operations until permissions were restored.
Resolution
Microsoft applied platform mitigations region by region to restore scaling and VM provisioning.
Once stability improved, we validated API behaviour across the affected areas, including checks of key endpoints.
Some customers continued to observe intermittent errors and slowness while the environment stabilised, and we continued monitoring until performance returned to normal.
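For illustration, the kind of endpoint check involved in that validation can be sketched as below, in Python with the requests library. The URLs are placeholders rather than real Retail Express endpoints, and this is not our internal monitoring tooling.

    # Illustrative health check: poll a list of endpoints and report HTTP
    # status and latency. URLs are placeholders, not real endpoints.
    import time
    import requests

    ENDPOINTS = {
        "OrderCreate (SOAP)": "https://api.example.com/soap/OrderCreate",      # placeholder
        "GetOutboundITO (WMS)": "https://api.example.com/wms/GetOutboundITO",  # placeholder
        "GetOrders": "https://api.example.com/v2/GetOrders",                   # placeholder
    }

    def check(name: str, url: str, timeout: float = 10.0) -> None:
        started = time.monotonic()
        try:
            response = requests.get(url, timeout=timeout)
            elapsed = time.monotonic() - started
            print(f"{name}: HTTP {response.status_code} in {elapsed:.2f}s")
        except requests.RequestException as exc:
            elapsed = time.monotonic() - started
            print(f"{name}: failed after {elapsed:.2f}s ({exc})")

    for name, url in ENDPOINTS.items():
        check(name, url)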
What we are doing next
While the root cause was an upstream Azure platform incident, we are taking steps to reduce customer impact and to communicate more clearly if a similar event occurs.
Current status
Services should now be operating normally. If you are still experiencing issues, please contact Support with:
the endpoint(s) impacted
approximate time of most recent failure
any error messages or request IDs
a copy of the failing call (request and, where available, response); one way to capture this is sketched below
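If it helps, the sketch below shows one way to capture those details for a failing call, using Python and the requests library. The endpoint URL, headers, and payload are placeholders; substitute the actual request your integration makes.

    # Capture the details Support needs for a failing API call: UTC timestamp,
    # endpoint, status or error, response headers (which may contain request
    # IDs), and the response body. URL, headers, and body are placeholders.
    from datetime import datetime, timezone
    import requests

    URL = "https://api.example.com/soap/OrderCreate"   # placeholder endpoint
    BODY = "<soap:Envelope>...</soap:Envelope>"        # placeholder payload
    HEADERS = {"Content-Type": "text/xml"}

    timestamp = datetime.now(timezone.utc).isoformat()
    print(f"Time (UTC): {timestamp}")
    print(f"Endpoint:   {URL}")
    try:
        response = requests.post(URL, data=BODY, headers=HEADERS, timeout=30)
        print(f"Status:     {response.status_code}")
        print(f"Headers:    {dict(response.headers)}")
        print(f"Body:       {response.text[:2000]}")
    except requests.RequestException as exc:
        print(f"Error:      {exc}")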