Downtime

Workloads in one of our clusters are backed up

May 29 at 03:17pm UTC
Affected services
Prediction serving

Resolved
May 29 at 04:06pm UTC

All queues have been cleared, and predictions and trainings are running normally again.

Updated
May 29 at 03:25pm UTC

Predictions and trainings are running again. Substantial queues remain, so it will take a while for the autoscaler to work through the backlog.

We'll keep monitoring until the cluster has fully recovered.

Updated
May 29 at 03:22pm UTC

The majority of predictions and trainings are failing to start in one of our clusters. All A40 workloads and most A100 workloads are affected.

The upstream provider is investigating the issue.

Created
May 29 at 03:17pm UTC

We're investigating an issue with predictions and trainings in one of our clusters, caused by an incident at one of our providers. Workloads running on A40s and A100s are affected.