Errors downloading weights on model startup
Resolved
Feb 10 at 11:16pm UTC
We have not seen any failures since 22:50 UTC (10:50pm), so we are marking this incident resolved.
Our investigation found that internal DNS lookup failures put a storage cache subsystem into a broken state. Next week we will look into how to make our systems more robust to failures like this one.
Thank you for your patience.
Affected services
Prediction serving
Updated
Feb 10 at 10:44pm UTC
Weight download failures appear to have dropped significantly. We are continuing to monitor the situation.
Affected services
Prediction serving
Updated
Feb 10 at 10:08pm UTC
We have identified the cause of this issue and are rolling out a fix.
Affected services
Prediction serving
Created
Feb 10 at 9:52pm UTC
We are seeing an elevated rate of weights failing to download on model startup.
Affected services
Prediction serving