Previous incidents
Media models on L40S and H100 hardware types have degraded service
Resolved Apr 09 at 06:49pm UTC
At this time, models on L40S and H100 hardware and the streaming APIs have returned to normal operation.
Thank you for your patience.
1 previous update
Elevated Error Rates on API
Resolved Mar 22 at 04:31am UTC
We noticed elevated error rates (500-class responses) on our API. While investigating the errors, we found that one of the API endpoints behind the primary load balancer was having trouble making requests to one of our serving regions.
Our engineers have temporarily removed this API endpoint from production traffic while we investigate.
The error rate has returned to normal.
Delays for L40S hardware
Resolved Mar 13 at 05:12pm UTC
We are back under capacity for L40S hardware. Thanks for waiting!
1 previous update
Delays for models on L40S hardware type
Resolved Mar 11 at 11:31pm UTC
We are back below capacity limits for the L40S hardware type. Thanks for your patience!
1 previous update
Delays for predictions on L40S hardware
Resolved Mar 10 at 10:29pm UTC
Most models running on L40S hardware should no longer be experiencing delays. We are still seeing a handful of models unable to set up because of download rate limiting from a few external providers, but we will continue working on that as a separate problem. Thanks for waiting!
2 previous updates
Disruption of Prediction Serving
Resolved Mar 08 at 08:26am UTC
The remaining backlog is being worked through, and all services have now been restored to full functionality.
Again, thank you for your patience.
5 previous updates
Prediction Serving Disruption
Resolved Feb 24 at 03:31pm UTC
Replicate was alerted to a brief issue with prediction creation, update, and completion. The affected window lasted about 5 minutes, starting at 2025-02-24 15:22:30 UTC.
A database update caused a brief disruption that delayed persisting data. At this time the Replicate platform has resumed normal operations.
Webhook delivery impacted on CPU, L40S and H100 hardware
Resolved Feb 24 at 11:43am UTC
Things have been stable for 15 minutes now. We believe this to be resolved.
3 previous updates
Webhook delivery degraded for A100 hardware
Resolved Feb 21 at 05:23pm UTC
Webhooks are now being delivered in a timely fashion. Thanks for your patience!
1 previous update
High capacity utilization
Resolved Feb 19 at 08:07am UTC
We are back at full capacity. Thanks for your patience!
3 previous updates
Some models failing to set up on A100 hardware
Resolved Feb 19 at 01:49am UTC
The rollback of the suspected misconfiguration is complete and all queues have recovered. Thanks for your patience!
1 previous update
Predictions degraded for L40S, H100, and CPU hardware types
Resolved Feb 07 at 06:55pm UTC
We are now caught up and running below capacity. Thanks for your patience!
3 previous updates
API instability for L40S, H100 and CPU workloads
Resolved Feb 06 at 03:20pm UTC
This issue appears to have been caused by another bandwidth spike, which was itself partly a result of our incident earlier today. The issue has now been resolved. We are working to prevent incidents of this kind from recurring.
1 previous update
Setup failures on L40S and H100 hardware
Resolved Feb 06 at 10:45am UTC
This incident is now resolved.
4 previous updates
Prediction creation unavailable for L40S and H100 hardware
Resolved Feb 03 at 06:30am UTC
The cache used by the API for predictions was misconfigured for roughly 20 minutes, from 20:34 UTC until a rollback completed at 20:56 UTC. Models using the L40S and H100 hardware types were affected. While the cache was misconfigured, prediction creation was severely limited, and many API requests received responses with status 503.