@ryan8 Can you specify when this happened? Recently?

For now, I am manually saving a checkpoint every 15 minutes or so, and that seems to be mitigating the issue. I’ll report back if I learn otherwise.