Previous incidents
Llama3-70b-chat Delays
Resolved Jul 25 at 11:44pm UTC
This has been resolved and predictions should be handled normally.
Predictions on trained versions not starting
Resolved Jul 17 at 04:36pm UTC
We've fixed the issue and predictions on trained versions are running again.
Intermittent issues affecting some hardware types
Resolved Jul 16 at 08:16pm UTC
Things have been running normally for about the past 15 minutes.
API degradation
Resolved Jul 09 at 12:15pm UTC
Service has been restored. Thanks for your patience!
Llama 3 70b instruct model not processing predictions
Resolved Jul 03 at 11:11am UTC
The model is processing predictions properly again, and the queue is empty.
Some models unavailable
Resolved Jun 21 at 03:40pm UTC
Service has been restored as of a few minutes ago.
Errors publishing model versions
Resolved Jun 20 at 10:41pm UTC
Model version publishing is now working as expected.
Errors with inference
Resolved Jun 04 at 12:36am UTC
The issue with inference was limited to select LLM models. The problematic code has been rolled back, and all inference should now be operating normally.
Problems booting models in one region
Resolved May 30 at 07:03pm UTC
All outstanding issues have been resolved. Model boots and setups should be functioning normally again.
Workloads in one of our clusters are backed up
Resolved May 29 at 04:06pm UTC
All queues have been dealt with, and predictions and trainings are running smoothly once again.
Degraded autoscaling performance
Resolved May 22 at 02:05pm UTC
Backlogs have been cleared and all models are now running smoothly.
5XX and slow responses
Resolved May 17 at 12:09am UTC
The source of the problem appears to have been that our API was unable to connect to one of its underlying data stores, most likely due to a networking interruption. This has recovered as of 00:02 UTC and traffic is being served normally once again. We will continue to monitor.
Webhooks not sending for Dreambooth trainings
Resolved May 09 at 07:11pm UTC
Webhooks for Dreambooth trainings are working again.