Previous incidents
Predictions degraded for L40S, H100, and CPU hardware types
Resolved Feb 07 at 06:55pm UTC
We are now caught up and running below capacity. Thanks for your patience!
API instability for L40S, H100, and CPU workloads
Resolved Feb 06 at 03:20pm UTC
This issue appears to have been caused by another bandwidth spike, partly a consequence of our incident earlier today. It has now been resolved, and we are working to prevent incidents of this kind from recurring.
Setup failures on L40S and H100 hardware
Resolved Feb 06 at 10:45am UTC
This incident is now resolved.
Prediction creation unavailable for L40S and H100 hardware
Resolved Feb 03 at 06:30am UTC
The cache used by the API for predictions was misconfigured for about 22 minutes, from 20:34 UTC until a rollback completed at 20:56 UTC. Models using the L40S and H100 hardware types were affected. While the misconfiguration was in place, prediction creation was severely limited, and many API requests returned HTTP status 503.
Instability and delays for H100 and L40S
Resolved Jan 30 at 09:56am UTC
The networking issue with our provider was resolved at 09:40 UTC, and all requests have been running normally since then.
Billing and metric delays
Resolved Jan 15 at 04:29pm UTC
The background jobs are running again, and as of about 16:09 UTC (20 minutes ago) we have caught up to the present.
Dashboard inaccessible due to redirect
Resolved Jan 09 at 04:37pm UTC
The redirect has been reverted and the dashboard should be accessible again.
L40S temporary stockout
Resolved Dec 14 at 12:41am UTC
At 22:15 UTC on Dec 13, an issue forced us to shift some GPU workloads, which caused stockouts and increased wait times for spinning up new model instances on L40S.
The work has completed, and GPUs have been available as normal since 00:15 UTC on Dec 14.
Data deletion delayed
Resolved Dec 14 at 01:28pm UTC
We've caught up with prediction deletion, and our system is once again deleting predictions on time.
T4 predictions unavailable
Resolved Dec 11 at 08:27pm UTC
T4 predictions were unavailable from approximately 18:00 to 20:27 UTC. We found an issue with the NVIDIA driver installation on our T4 hardware targets.
This only affected predictions running against the T4 hardware.
We have deployed a fix and are backfilling the outstanding predictions.