Degraded

Predictions degraded for L40S, H100, and CPU hardware types

Feb 07 at 05:40pm UTC
Affected services
Prediction serving

Resolved
Feb 07 at 06:55pm UTC

We are now caught up and running below capacity. Thanks for your patience!

Updated
Feb 07 at 06:30pm UTC

We are currently running at capacity. Most queues have caught up, but delays are still possible, so we will keep this incident open in a "degraded" state.

Updated
Feb 07 at 06:07pm UTC

We have cleaned up all of the models that were crashing or locked up, and we are now scaled out to max capacity while working through queue backlogs.

Created
Feb 07 at 05:40pm UTC

Most of the delays we are seeing right now are caused by models failing to set up, likely due to a combination of configuration changes that are not working as intended. We have reverted those changes and are now cleaning up models that are crash looping or locked up. We are starting to see capacity recover.