Previous incidents
Issues scheduling to certain hardware
Resolved Aug 19 at 11:41pm UTC
Thank you for your patience. All hung workloads targeting T4 hardware should no longer be stuck in the starting phase.
Replicate Web Down
Resolved Aug 18 at 03:22pm UTC
Engineers have rolled back a change to the website, and it should now be responding as expected.
Website and API Outage
Resolved Aug 15 at 07:28pm UTC
Reverting the identified change and purging known bad cache values has brought the API service's error rate back to normal. The API and website should now be responding as expected.
Delays starting some models
Resolved Aug 11 at 03:13pm UTC
We believe that as of a few minutes ago the last customer impact from this issue has been resolved and all queues have cleared. To help you correlate this incident with any issues you may have seen: as far as we can tell the earliest customer impact from this incident started at about 11:00 UTC today.
Models not booting
Resolved Aug 09 at 10:14pm UTC
The fix has been rolled out for all models.
520 error responses from API
Resolved Aug 03 at 01:48pm UTC
We've identified the source of the errors -- a global load balancing service appears to have been misbehaving -- and made changes to how we serve api.replicate.com to mitigate the problem. As of a few minutes ago, we are no longer serving 520 error responses to customers.
Replicate website unavailable
Resolved Aug 02 at 04:48pm UTC
We're back! We pushed a bad change and have rolled it back. Sorry for the inconvenience.
Prediction requests failing
Resolved Jul 31 at 12:15pm UTC
All prediction requests are now responding normally. We're still investigating the underlying cause.
API errors/timeouts
Resolved Jul 28 at 06:10pm UTC
The API is fully recovered. Unfortunately we are still at least partially in the dark about what triggered these problems. We're continuing to investigate.
API errors/timeouts
Resolved Jul 28 at 05:32am UTC
Services have recovered. We'll be following up with our provider to understand how the scope of the planned maintenance expanded to affect customer workloads.
API errors
Resolved Jul 26 at 10:51am UTC
We've identified a service that was starved of compute resources and addressed that problem. Service has been restored.
Prediction creation errors
Resolved Jul 19 at 06:35pm UTC
We've restored service to the queueing system and predictions are flowing again.
Delayed prediction start times
Resolved Jun 27 at 04:35pm UTC
Predictions are flowing as expected once again. We'll continue to monitor the situation.
Web and API failures
Resolved Jun 23 at 03:45pm UTC
The rollback fixed things and we're back to normal.
Garbled/corrupted responses
Resolved Jun 21 at 04:10pm UTC
We've identified the cause of this issue and rolled back the change. Affected predictions will have completed successfully, and you can re-request their status through the API.
Model autoscaling degraded
Resolved Jun 05 at 11:23pm UTC
Autoscaling issues have been resolved for all models and everything should be operating normally.