Previous incidents

August 2023
Aug 19, 2023
1 incident

Issues scheduling to certain hardware

Degraded

Resolved Aug 19 at 11:41pm UTC

Thank you for your patience. All hung workloads targeting T4 hardware should no longer be stuck in the starting phase.

Aug 18, 2023
1 incident

Replicate Web Down

Downtime

Resolved Aug 18 at 03:22pm UTC

Engineers have rolled back a change to the website, and it should now be responding as expected.

Aug 15, 2023
1 incident

Website and API Outage

Downtime

Resolved Aug 15 at 07:28pm UTC

Reverting the identified change and purging known bad cache values has resolved the error rate within the API service. API and Web should be responding as expected at this time.

Aug 11, 2023
1 incident

Delays starting some models

Degraded

Resolved Aug 11 at 03:13pm UTC

We believe that as of a few minutes ago the last customer impact from this issue has been resolved and all queues have cleared. To help you correlate this incident with any issues you may have seen: as far as we can tell the earliest customer impact from this incident started at about 11:00 UTC today.

Aug 09, 2023
1 incident

Models not booting

Degraded

Resolved Aug 09 at 10:14pm UTC

The fix has been rolled out for all models.

Aug 03, 2023
1 incident

520 error responses from API

Degraded

Resolved Aug 03 at 01:48pm UTC

We've identified the source of the errors -- a global load balancing service appears to have been misbehaving -- and made changes to how we serve api.replicate.com to mitigate the problem. As of a few minutes ago, we are no longer serving 520 error responses to customers.

Aug 02, 2023
1 incident

Replicate website unavailable

Downtime

Resolved Aug 02 at 04:48pm UTC

We're back! We pushed a bad change and have rolled it back. Sorry for the inconvenience.

July 2023
Jul 31, 2023
1 incident

Prediction requests failing

Downtime

Resolved Jul 31 at 12:15pm UTC

All prediction requests are now responding normally. We're still investigating the underlying cause.

Jul 28, 2023
2 incidents

API errors/timeouts

Degraded

Resolved Jul 28 at 06:10pm UTC

The API is fully recovered. Unfortunately we are still at least partially in the dark about what triggered these problems. We're continuing to investigate.

API errors/timeouts

Degraded

Resolved Jul 28 at 05:32am UTC

Services have recovered. We'll be following up with our provider to understand how the scope of the planned maintenance expanded to affect customer workloads.

Jul 26, 2023
1 incident

API errors

Degraded

Resolved Jul 26 at 10:51am UTC

We've identified a service that was starved of compute resources and addressed that problem. Service has been restored.

Jul 19, 2023
1 incident

Prediction creation errors

Degraded

Resolved Jul 19 at 06:35pm UTC

We've restored service to the queueing system and predictions are flowing again.

June 2023
Jun 27, 2023
1 incident

Delayed prediction start times

Degraded

Resolved Jun 27 at 04:35pm UTC

Predictions are flowing as expected once again. We'll continue to monitor the situation.

Jun 23, 2023
1 incident

Web and API failures

Degraded

Resolved Jun 23 at 03:45pm UTC

The rollback fixed things and we're back to normal.

Jun 21, 2023
1 incident

Garbled/corrupted responses

Degraded

Resolved Jun 21 at 04:10pm UTC

We've identified what's causing this issue and have rolled back the change. Affected predictions will have completed successfully and you can re-request their status through the API.
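For anyone who needs to re-request the status of an affected prediction, a minimal Python sketch follows. It assumes the prediction lookup endpoint is GET https://api.replicate.com/v1/predictions/{id} and that the token is sent as a Bearer header; the function names, the timeout value, and the placeholder ID are illustrative only, so check the current API documentation before relying on any of this.

```python
import json
import urllib.request

# Assumed base URL for the HTTP API; verify against the current API docs.
API_BASE = "https://api.replicate.com/v1"


def build_status_request(prediction_id: str, api_token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for one prediction's status."""
    url = f"{API_BASE}/predictions/{prediction_id}"
    # Assumption: the token is accepted as a Bearer token.
    headers = {"Authorization": f"Bearer {api_token}"}
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_prediction(prediction_id: str, api_token: str, timeout: float = 10.0) -> dict:
    """Send the request and return the decoded JSON body as a dict."""
    request = build_status_request(prediction_id, api_token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling fetch_prediction("your-prediction-id", token) with a real ID and token would return the prediction record, from which the status and output can be read again.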

Jun 05, 2023
1 incident

Model autoscaling degraded

Degraded

Resolved Jun 05 at 11:23pm UTC

Autoscaling issues have been resolved for all models and everything should be operating normally.
