Finally, after five weeks (8, 9, 10, 11, 12) of framework upgrades and bug hunts, I found the time to work on actual feature development. Or so I thought…
My backend framework upgrade from week 8 didn’t let me off the hook quite that easily: a pesky out-of-memory error kept striking sporadically (and in the middle of the night), rendering our account management backend inaccessible and thereby leaving all of our game services semi-broken. This left me no choice but to troubleshoot and fix the issue. And if the latter didn’t work, to find countermeasures to prevent another downtime.
There’s a brief update about what I did in the linked incident report, but here are the steps in a bit more detail:
- Adjusted the configuration of the service to write so-called heap dumps whenever an out-of-memory error occurs.
- Waited for one of these to be written.
- Analysed the heap dump and found that the root cause most likely lies somewhere in 3rd-party code.
- Spent several hours trying to figure out which code on my end, if any, might be triggering the error in the 3rd-party code.
- Rewrote one suspicious implementation of a deprecated interface (which might have fixed the issue…or not…no further crash so far) but gave up on finding the root cause after that.
- Added a health check to the service’s container to mark it as “unhealthy” once it stops serving requests.
- Added an “auto-healer” container that restarts the service if it goes unhealthy.
- Got our ops-partner to add the service to their monitoring, so there’s a higher chance of a human with admin access noticing that the service is down.
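For the heap-dump step, assuming the service runs on a JVM (the post doesn’t name the runtime), the standard HotSpot flags look roughly like this — the service name and path are placeholders, not my actual setup:

```shell
# Hypothetical launch command; the two -XX flags are standard HotSpot options.
# -XX:+HeapDumpOnOutOfMemoryError: write a heap dump whenever an OutOfMemoryError is thrown
# -XX:HeapDumpPath: directory (or file path) the .hprof dump is written to
java -XX:+HeapDumpOnOutOfMemoryError \
     -XX:HeapDumpPath=/var/dumps \
     -jar account-service.jar
```

The resulting `.hprof` file is what you then load into a heap analyser to hunt for the leak.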
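The health-check and “auto-healer” combination can be sketched as a Docker Compose fragment. This is an illustration, not my actual config: the service name, port, and `/health` endpoint are placeholders, and I’m using the off-the-shelf `willfarrell/autoheal` image as one common way to get the restart-on-unhealthy behaviour:

```yaml
# Hypothetical docker-compose sketch; names, port, and endpoint are placeholders.
services:
  account-service:
    image: account-service:latest
    labels:
      - autoheal=true                # opt this container in to auto-restarts
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s                  # probe every 30 seconds
      timeout: 5s
      retries: 3                     # mark "unhealthy" after 3 consecutive failures

  autoheal:
    image: willfarrell/autoheal      # watches for unhealthy containers and restarts them
    environment:
      - AUTOHEAL_CONTAINER_LABEL=autoheal
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock  # needed so it can restart containers
```

The point of the split: the health check only *marks* the container as unhealthy; a separate container with access to the Docker socket has to do the actual restarting.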
As said, the service has been behaving well since I made these changes. But the problem only occurs sporadically, so it’ll take a few days until I’m somewhat convinced that the problem is gone and/or that my countermeasures work sufficiently well.
Oh, and that UI design work…I didn’t spend as much time on it as I was planning to. But I am very hopeful that I’ll actually make some progress on that front in week 14…