Previous incidents
Problems running some A40 models
Resolved Nov 10 at 10:09pm UTC
We have confirmed and corrected any model versions erroneously disabled during this issue.
Use of A40s for predictions and trainings is now working as expected.
4 previous updates
Replicate website unavailable
Resolved Nov 08 at 03:53pm UTC
It looks to us like one of our providers had a brief outage and things are now coming back. We're continuing to monitor the situation.
(Technical details: it looks like an upstream provider had a brief DNSSEC zone signing outage.)
1 previous update
Slow model startup in some cases
Resolved Nov 06 at 12:53am UTC
The slow model startup has been resolved. We will continue to work internally and with our provider to remediate the root cause.
1 previous update
Slower predictions and webhook delivery
Resolved Nov 05 at 08:03am UTC
The prediction and webhook delivery issues are now resolved. There might still be a delay in webhook delivery for older predictions.
2 previous updates
Investigating prediction creation issues
Resolved Nov 02 at 06:53pm UTC
The issue has been resolved and predictions are now functioning normally.
2 previous updates
Replicate Web Internal Service Error
Resolved Nov 01 at 09:43pm UTC
The rollback of the problematic change has completed, and the Replicate website is now functioning normally again.
2 previous updates
SDXL Finetune errors
Resolved Nov 01 at 06:37pm UTC
We have rolled out a fix and confirmed finetunes are working as expected.
2 previous updates
Predictions and trainings degraded
Resolved Oct 19 at 02:30pm UTC
Predictions and trainings are back to normal.
1 previous update
Webhook delivery interrupted
Resolved Oct 08 at 08:13pm UTC
We identified a problem affecting a small portion of customers -- slow responses to webhooks caused a backlog in processing outbound webhooks -- and have deployed a change to increase available webhook processing capacity. Webhook delivery is back to normal as of a few minutes ago.
1 previous update
Pushing of new versions is broken
Resolved Oct 06 at 11:20am UTC
We've fixed the issue and you should be able to push new versions again.
1 previous update
Slow responses from replicate.com
Resolved Oct 05 at 07:29am UTC
Database load is back to normal, and performance should be back to usual levels.
Web and predictions degraded
Resolved Sep 29 at 04:12pm UTC
We have now fully resolved the issues, and the API and the replicate.com website are fully operational.
3 previous updates
Predictions and training degraded for one cloud provider
Resolved Sep 28 at 02:52pm UTC
The API and website are now working as expected for predictions and trainings.
1 previous update
Degraded API / API Errors
Resolved Sep 27 at 07:31pm UTC
We identified a problematic ingress pod and forced it to be rescheduled. The API and website are now working as expected for predictions and trainings.
1 previous update
Website downtime / API degraded
Resolved Sep 27 at 05:56pm UTC
We have rolled back the problematic change. Website functionality has been restored, and the API error rate has returned to normal.
1 previous update
Slow start on some predictions and trainings (A40 and some A100)
Resolved Sep 21 at 09:06pm UTC
We have worked through the pending predictions and trainings and now see normal start times.
4 previous updates
System unavailable
Resolved Sep 21 at 08:37pm UTC
We have recovered our caching service and see predictions and trainings succeeding.
1 previous update
Temporary capacity issues with 8xA40 hardware type
Resolved Sep 21 at 12:39am UTC
We resolved the capacity issues.
2 previous updates
Primary database outage
Resolved Sep 19 at 09:17pm UTC
Both the API and the website are now back to normal. Predictions and trainings are functioning as expected.
We are continuing to monitor things.
2 previous updates
API degraded
Resolved Sep 18 at 04:50pm UTC
The API is now behaving normally.
1 previous update
Web and predictions degraded
Resolved Sep 08 at 12:37pm UTC
Everything is resolved and back to normal.
During the downtime, predictions completed normally in the API but were not persisted.
1 previous update
Degraded Prediction and Training Start Times
Resolved Sep 07 at 08:06pm UTC
The issue with the upstream provider has been resolved. Predictions and Trainings are expected to be starting within normal timeframes.
1 previous update
Degraded Prediction Handling
Resolved Sep 06 at 04:15pm UTC
Prediction processing and predictions are working as expected now.
1 previous update
Issues starting predictions
Resolved Sep 01 at 08:45pm UTC
Everything should be working normally at this time.
4 previous updates