Downtime

Errors within one region

Mar 06 at 01:45am UTC
Affected services
API
Prediction serving

Resolved
Mar 06 at 04:06am UTC

Workloads across all regions are now running normally. We apologise for the disruption, and will be working to improve our ability to shift load between providers in situations like this one.

Updated
Mar 06 at 03:05am UTC

The service remains in a degraded state, but work is starting to flow again. We will continue monitoring and will update when the service is fully recovered.

Updated
Mar 06 at 02:33am UTC

We're continuing to work with our provider, as one of our regions is currently unable to handle traffic. Workloads running on A40 and A100 (80GB) hardware are particularly affected.

Updated
Mar 06 at 01:58am UTC

The incident involves network services within one of our providers. We'll provide further updates as the situation evolves.

We apologize for the inconvenience and thank you for your patience during this time.

Created
Mar 06 at 01:45am UTC

One of our regions is seeing elevated error rates for inference and training. We are working with our provider to determine the root cause and remediate the issue.

This impacts A40, A100-80G, and a subset of A100-40G hardware types, and can impact some language models (token-based pricing).