Degraded

Slow start on some predictions and trainings (A40 and some A100)

Sep 21 at 04:14pm UTC
Affected services
API

Resolved
Sep 21 at 09:06pm UTC

We have worked through the pending predictions and trainings and now see normal start times.

Updated
Sep 21 at 07:33pm UTC

Start times have improved. We are still seeing delayed starts for predictions and trainings on A40 and some A100 GPUs, and we are continuing to work through the pending predictions and trainings.

Outside of the affected GPU targets, the API and website remain responsive and available.

Updated
Sep 21 at 06:29pm UTC

We continue to see slow starts for A40 and some A100 workloads, and we are still working through pending predictions and trainings for these hardware types.

Updated
Sep 21 at 05:13pm UTC

We have identified the root cause and are working to clear the backlog of pending predictions and trainings.

Created
Sep 21 at 04:14pm UTC

We are aware of an issue with some GPU targets (A40) taking longer than expected to start predictions and trainings. We are investigating the issue.

API and Web are otherwise functioning as normal.