API errors and request delays
Resolved
Nov 26 at 07:16pm UTC
Systems have been behaving normally since our upstream provider applied further fixes within the last hour. We will share more details about how this happened once they are available.
Affected services
API
Updated
Nov 26 at 05:16pm UTC
The partner we're working with on this issue has told us they are struggling to handle extremely high bandwidth to some of their systems. This is causing the disruption affecting Replicate and our customers.
This issue only affects A100 GPUs. If you're affected and can move your models or deployments to other hardware (such as our newly added L40S GPUs), doing so will mitigate the impact you're seeing.
Affected services
API
Updated
Nov 26 at 04:30pm UTC
The fix we applied earlier appears to have regressed. We've escalated this issue and will provide an update as soon as we have one.
Affected services
API
Updated
Nov 26 at 03:57pm UTC
As of a few minutes ago, we believe the underlying issues have been resolved. We don't yet fully understand the nature of the problem, but we will follow up with our partners to make sure we (and they) do.
Affected services
API
Updated
Nov 26 at 02:32pm UTC
We're continuing to investigate this issue and are aware of the inconvenience it may be causing. We ask for your patience while we work with our infrastructure providers to identify the source of the disruption.
Affected services
API
Created
Nov 26 at 12:28pm UTC
We're aware of an issue affecting A100 hardware types that is causing request delays and error responses from our API. We are investigating and will provide an update when we have more information.
Affected services
API