If you are playing in Yeager, you might have noticed sporadic downtimes over the past few weeks. Often they are very short. So short, in fact, that our monitoring doesn’t even raise an alert. In other cases, though, they last several minutes. This has been going on for a while now, and while a couple of other game worlds occasionally show similar issues, it’s Yeager that worries me most. Unfortunately, the service-level logging we have in place at the moment doesn’t show anything useful, and our ops partner wasn’t able to find an obvious cause in the infrastructure-level metrics either. In the past, similar issues were often caused by spikes of bot traffic, usually from AI crawlers or similar bad actors. But we’ve pretty much locked all of that down, and it doesn’t seem to be the cause this time.
Long story short: In week 47 I spent a lot more time than I’d like troubleshooting this issue. More precisely, implementing more ways to “look into” the live services to spot anything that might be off. We do have pretty good monitoring already, but getting a better look at the internals of the game servers themselves might reveal something. Emphasis on “might” here… at the moment, this is a stab in the dark. At the end of the day, it might be a hidden hardware defect or something entirely different. We’ll see!
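To give a rough idea of what I mean by “looking into” the servers, here’s a minimal sketch of in-process instrumentation for a game-server loop. Everything in it is hypothetical (the metric names, the port, the loop structure, and the use of the `prometheus_client` library are my own placeholders, not our actual setup); it only illustrates the general idea of exposing per-tick internals to the monitoring stack.

```python
# Hypothetical sketch: expose per-tick timing and player counts from inside
# the game server so the monitoring stack can scrape them.
import time
from prometheus_client import Histogram, Gauge, start_http_server

TICK_SECONDS = Histogram("gameserver_tick_seconds", "Wall-clock time per server tick")
PLAYERS_ONLINE = Gauge("gameserver_players_online", "Currently connected players")

def count_connected_players() -> int:
    # Placeholder; the real value would come from the session manager.
    return 0

def run_server_loop() -> None:
    start_http_server(9100)  # metrics endpoint for the scraper (port is arbitrary)
    while True:
        start = time.monotonic()
        # ... process one simulation tick: inputs, world update, persistence ...
        time.sleep(0.05)  # stand-in for the actual tick work
        PLAYERS_ONLINE.set(count_connected_players())
        TICK_SECONDS.observe(time.monotonic() - start)

if __name__ == "__main__":
    run_server_loop()
```

The point of recording a histogram rather than an average is that short stalls, the kind too brief to trigger an alert, still show up in the tail of the tick-time distribution.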
More or less out of frustration, I squeezed in another small maintenance patch on Wednesday.
The coming week will be primarily about backend work, and the relaunch of Domination will happen in week 48. In that regard, I’ve had an idea that I’d love to get your opinion on. More on that in this thread.