Degraded

Issues booting models on A40 hardware

Oct 09 at 02:45pm UTC
Affected services
Prediction serving

Resolved
Oct 09 at 04:34pm UTC

At this time, all but a handful of instances have recovered, and prediction serving should be normal for the A40 hardware type.

We expect the remaining instances (a low single-digit number) to be running within the next few minutes.

Updated
Oct 09 at 04:18pm UTC

We have made a configuration change to circumvent the identified networking issue. We are seeing improvements in A40 boots and are working through the backlog of predictions and instance boots.

Updated
Oct 09 at 03:56pm UTC

We're still working with our networking provider to identify the root cause. We will continue to update as we learn more.

Updated
Oct 09 at 03:10pm UTC

We've tracked this down to a networking issue and we've escalated to our networking provider. We will update as we progress.

Created
Oct 09 at 02:45pm UTC

We are seeing issues with pulling model images from our A40 cluster. We are investigating.