Disruption of Prediction Serving
Resolved
Mar 08 at 08:26am UTC
The remaining backlog is being worked through, and all services have now been restored to full functionality.
Again, thank you for your patience.
Affected services
API
Prediction serving
Updated
Mar 08 at 07:50am UTC
At this time, most services and prediction serving have returned to normal. We are seeing elevated and incorrect contention for the L40S and H100 GPU types, which is making scaling of new instances slower than expected.
Affected services
API
Prediction serving
Updated
Mar 08 at 07:21am UTC
L40S, H100, and CPU hardware types continue to see degraded prediction performance.
We are seeing subsequent failures as we work through our backlogs. We will provide updates as information becomes available.
Affected services
API
Prediction serving
Updated
Mar 08 at 06:54am UTC
As of this time, all services have been restored and the faulty hardware has been removed from production.
Thank you for your patience.
Affected services
API
Prediction serving
Updated
Mar 08 at 06:37am UTC
Critical services have been successfully migrated. As our final services come back online, we are monitoring them and expect service to remain degraded.
Affected services
API
Prediction serving
Created
Mar 08 at 06:25am UTC
We are experiencing a disruption in prediction serving for the H200, L40S, and CPU hardware types due to a hardware failure.
We are in the process of bringing critical services back online.
Affected services
API
Prediction serving