Degraded

Setup failures on L40S and H100 hardware

Feb 06 at 08:58am UTC
Affected services
Prediction serving

Resolved
Feb 06 at 10:45am UTC

This incident is now resolved.

Updated
Feb 06 at 10:27am UTC

Most systems are now operating normally again. We are continuing to monitor the situation.

Updated
Feb 06 at 10:06am UTC

As some of you may have noticed, things got worse before they got better. When the upstream storage provider restored service, the backlog of models waiting to complete setup caused a large bandwidth surge. We're currently managing the effects of that surge, which has slowed predictions and prediction webhook delivery.

Updated
Feb 06 at 09:09am UTC

We've identified the underlying problem -- a storage outage at an upstream provider -- and are investigating ways to mitigate its impact.

Created
Feb 06 at 08:58am UTC

We're investigating an issue that's preventing some models running on L40S hardware from successfully completing setup. We'll update when we have more information.