Slow start on some predictions and trainings (A40 and some A100)
Resolved
Sep 21 at 09:06pm UTC
We have worked through the pending predictions and trainings and now see normal start times.
Affected services
API
Updated
Sep 21 at 07:33pm UTC
Start times have improved. We are still seeing delayed starts for predictions and trainings on A40 and some A100 GPUs, and we are continuing to work through the pending predictions and trainings.
The API and website otherwise remain responsive and available outside of the affected GPU targets.
Affected services
API
Updated
Sep 21 at 06:29pm UTC
We continue to see slow starts for A40 and some A100 workloads. We are continuing to work through pending predictions and trainings for these hardware types.
Affected services
API
Updated
Sep 21 at 05:13pm UTC
We have identified the root cause and are working to clear the backlog of pending predictions and trainings.
Affected services
API
Created
Sep 21 at 04:14pm UTC
We are aware of an issue with some GPU targets (A40) taking longer than expected to start predictions and trainings. We are investigating the issue.
The API and website are otherwise functioning as normal.
Affected services
API