Previous incidents

January 2025
Jan 15, 2025
1 incident

Billing and metric delays

Degraded

Resolved Jan 15 at 04:29pm UTC

The background jobs are running again, and as of about 1609 UTC (20 minutes before this update) we have caught up to the present.

Jan 09, 2025
1 incident

Dashboard inaccessible due to redirect

Degraded

Resolved Jan 09 at 04:37pm UTC

The redirect has been reverted and the dashboard should be accessible again.

December 2024
Dec 14, 2024
1 incident

L40s temporary stockout

Resolved Dec 14 at 12:41am UTC

At 22:15 UTC on Dec 13, an issue forced us to shift some GPU workloads. This caused stockouts that increased wait times for spinning up new model instances on L40s.

The work was completed and GPUs have been available as normal since 00:15 UTC on Dec 14.

Dec 12, 2024
1 incident

Data deletion delayed

Degraded

Resolved Dec 14 at 01:28pm UTC

We've caught up with prediction deletion, and our system is once again deleting predictions on time.

Dec 11, 2024
1 incident

T4 predictions unavailable

Resolved Dec 11 at 08:27pm UTC

T4 predictions were unavailable between approximately 1800 and 2027 UTC. We found an issue with the NVIDIA driver installation on our T4 hardware targets.

This only affected predictions running against the T4 hardware.

We have deployed a fix and are backfilling the outstanding predictions.

November 2024
Nov 26, 2024
1 incident

API errors and request delays

Degraded

Resolved Nov 26 at 07:16pm UTC

We have seen healthy behavior since our upstream provider applied further fixes in the last hour. We will share further details about how this happened once they are available.

Nov 19, 2024
1 incident

Flux Dev Inference Delays

Degraded

Resolved Nov 19 at 01:11am UTC

The backlog of predictions has been worked through, and prediction times are back to normal. Thank you for your patience.

Nov 17, 2024
1 incident

Flux Dev Prediction Delays

Degraded

Resolved Nov 17 at 02:35am UTC

As of 0223 UTC on Nov 17, the backlog has been processed and Flux Dev is handling requests as expected.

Nov 14, 2024
1 incident

Predictions failing for H100 hardware

Degraded

Resolved Nov 14 at 03:19am UTC

We have identified a hardware failure and have isolated the affected node(s). We are seeing a return to normal service for H100-targeted predictions and trainings.

Nov 11, 2024
1 incident

Flux Pro, Recraft, and Ideogram failed predictions

Resolved Nov 11 at 05:10pm UTC

We identified an internal component that caused errors with Flux Pro, Recraft, and Ideogram models. The errors occurred between approximately 1540 UTC and 1709 UTC on November 11, 2024.

As of the time this status update is published, the internal component has been rolled back and we are seeing normal prediction handling for impacted models.

Nov 08, 2024
1 incident

Prediction delays for black-forest-labs/flux-1.1-pro and meta/meta-llama-3-70...

Resolved Nov 08 at 07:05pm UTC

Between 16:30 and 19:00 UTC, predictions sent to flux-1.1-pro and meta-llama-3-70b-instruct were delayed by up to 1 hour. The delay was caused by the rollout of an internal component that broke a small number of models; we then rolled that component back. Because these two models handle a high volume of predictions, the backlog grew fairly quickly, but we have now caught up with ...

Nov 06, 2024
2 incidents

H100 Hardware Queueing

Degraded

Resolved Nov 06 at 08:34pm UTC

After evaluation, the impacted queues (predictions submitted before the migration to the alternate region) are being truncated.

You will not be billed for predictions that are dropped this way. However, these predictions may appear as "in process" or "queued" until the platform automation identifies them as dropped. It is safe to cancel and/or resubmit predictions affected in this way.

This truncation impacts a few thousand total predictions across all models target...

File streaming not working

Degraded

Resolved Nov 06 at 11:27am UTC

Our fix has rolled out and file output streaming is now working again.