Downtime

H100 model serving down

Oct 29 at 03:28pm UTC
Affected services
API
Prediction serving

Resolved
Oct 29 at 05:46pm UTC

We have moved traffic for the impacted models (those targeting H100 hardware) back to the H100-class GPUs. Predictions and trainings targeting H100-class GPUs have returned to normal.

Updated
Oct 29 at 04:10pm UTC

Predictions on flux-dev are now also running in a different cluster.

Updated
Oct 29 at 03:45pm UTC

Predictions on flux-schnell and flux fine tunes are successfully running in another cluster. Predictions on flux-dev are still not working.

Updated
Oct 29 at 03:37pm UTC

We've moved flux models and fine tunes to run in a different cluster until we can get this cluster back online.

Created
Oct 29 at 03:28pm UTC

One of our clusters is currently down. We know the immediate cause and are working on a fix. This is the cluster that runs our H100s, so all H100 models are currently down, including flux and flux fine tunes.