Issues booting models on A40 hardware
Resolved
Oct 09 at 04:34pm UTC
At this time, all but a handful of instances have recovered, and prediction serving should be normal for the A40 hardware type.
We expect the remaining instances (a low single-digit number) to be running within the next few minutes.
Affected services
Prediction serving
Updated
Oct 09 at 04:18pm UTC
We have made a configuration change to work around the identified networking issue. A40 boots are improving, and we are working through the backlog of predictions and instance boots.
Affected services
Prediction serving
Updated
Oct 09 at 03:56pm UTC
We're still working with our networking provider to identify the root cause. We will continue to post updates as we learn more.
Affected services
Prediction serving
Updated
Oct 09 at 03:10pm UTC
We've tracked this down to a networking issue and have escalated it to our networking provider. We will post updates as we make progress.
Affected services
Prediction serving
Created
Oct 09 at 02:45pm UTC
We are seeing issues with pulling model images on our A40 cluster. We are investigating.
Affected services
Prediction serving