Nicosia/Tempelhof: Current status

We seem to have two different issues here:

Nicosia: The game world has built up varying amounts of backlog recently. Our analysis has shown that the problem probably stems from some statistics reports that put a very heavy load on the database when requested. If several users request those in parallel, the database, and with it the game world, slows to a crawl. We will probably deactivate the statistics in question tomorrow (they are non-critical) and think of a better solution. More information on the measures taken will be posted on the dev-log when they’ve been implemented.
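To illustrate one possible "better solution" for reports that are expensive and requested in parallel: a minimal sketch of a time-limited cache, so that several simultaneous requests trigger only one database query. This is not how AirlineSim is implemented; `CachedReport`, `slow_report` and the returned data are hypothetical names and values chosen for the example.

```python
import threading
import time


class CachedReport:
    """Compute an expensive report at most once per `ttl` seconds."""

    def __init__(self, compute, ttl=300.0, clock=time.monotonic):
        self._compute = compute          # hypothetical expensive query
        self._ttl = ttl
        self._clock = clock
        self._lock = threading.Lock()
        self._value = None
        self._expires = None             # None means "never computed"

    def get(self):
        # The lock makes parallel callers wait for a single computation
        # instead of each starting their own heavy query.
        with self._lock:
            now = self._clock()
            if self._expires is None or now >= self._expires:
                self._value = self._compute()
                self._expires = now + self._ttl
            return self._value


calls = []


def slow_report():
    """Stand-in for a heavy statistics query (assumption, not real code)."""
    calls.append(1)
    time.sleep(0.05)
    return {"rows": 3}


report = CachedReport(slow_report, ttl=60.0)
threads = [threading.Thread(target=report.get) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(calls))  # eight parallel requests, one actual computation
```

The trade-off is that users may see statistics that are up to `ttl` seconds old, which is usually acceptable for non-critical reports.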

Tempelhof: As mentioned before, the problem is very mysterious in this case. Stapleton and Tempelhof are running on the same server and, as such, share their technical resources. Nevertheless, Stapleton performs far better than THF and more or less within the usual boundaries (note that it is one of our most recent and, as such, most active game worlds). Given these facts, we can be more or less sure that it is not a hardware issue we are facing. At the same time, though, THF does not show any signs of errors or unexpected behavior. It "just runs slower" than the other game worlds, which is puzzling. I have applied a tiny patch today in an effort to shorten the idle times between single update jobs (these were originally intended to allow other threads/user requests to be handled in a speedy fashion) so the backlog gets worked off a bit faster. Also, pugnacity has taken every measure he could think of to squeeze some more performance out of the underlying database. Other than that, we have to watch whether the game world is able to get rid of some backlog tonight, and if not, I will probably use the "100% load trick" to take the majority of the demand distribution workload off the server tomorrow (filling all flights to 100% load to decrease the number of possible connections).

I hope this clears things up a bit. We will keep you updated on our progress.

Errr, I’m sure Nicosia needs this same fix! You know, just in case…

Hi Martin,

Stapleton may have more airlines than Tempelhof, but Tempelhof transports far more passengers. Tempelhof has more small airports that are served by several airlines, and also has more possible routes to fly from any airport to any other airport. I would not be surprised if calculating the demand takes more time on Tempelhof.

By the way, Dera Ismail Khan (DSK) is such a small airport that no airline has an office there. No offices, no flights, but it took the server 17 seconds to calculate the demand. Is that normal?

Jan

Yes, that is most likely having an effect on overall performance. But still, in case of any serious hardware problem I’d expect Stapleton to be affected at least partly.

Probably the ground network. Judging by the stats, DSK has a total of 91 possible connections. Other than that, the 17 seconds it took to calculate this airport are exactly one of the things that puzzle us.

Oh, and btw: Nicosia seems to be fine again, THF has picked up around 3 hours over the night, so things aren’t looking great, but relatively speaking, they look better :)

In my opinion it’s a global performance problem which can only be solved by faster hardware. Why not show the additional statistics only for 1 or 2 credits a day and spend the income on one server per world instead of sharing?

As I said, it is not a hardware problem.

I don’t know what tools you have to measure, but are Stapleton and Tempelhof getting an equal share of resources? How much disk and CPU activity is each causing?

Judging from Martin’s response - there are no direct CPU hanging issues - but it could potentially be an IOPS issue relating to the database? Or is the database processing just fine on the drives?

With database issues, it’s not just a matter of "stick some more RAM in it" or "give it more horsepower" - if the drives aren’t up to it, there’s no hope. But once again, looking at Martin’s response, it seems the drives aren’t even the issue - whether they’re using standard 7.2K RPM SATA drives or 15K RPM SAS or SSD drives - it doesn’t make much difference if there’s an actual issue with the database structure - which it seems is the issue here. But I’m open to being corrected, of course.

Agree! I don’t believe in a CPU problem, either. I’m a DB pro and it looks like a lack of performance in the DB. There are a lot of possibilities to deal with it. Perhaps too many statistics calculated on the fly instead of using temporary tables?
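To make the "temporary tables instead of on-the-fly statistics" idea concrete, here is a small sketch using Python's built-in `sqlite3` module. The schema (`bookings`, `airport_stats`) and the sample rows are invented for the example; the real game database is not known from this thread and almost certainly looks different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical raw data: one row per batch of transported passengers.
conn.execute("CREATE TABLE bookings (airport TEXT, airline TEXT, pax INTEGER)")
conn.executemany(
    "INSERT INTO bookings VALUES (?, ?, ?)",
    [("BOM", "A", 120), ("BOM", "B", 80), ("BOM", "A", 200), ("DUR", "B", 50)],
)


def refresh_summary(conn):
    """Rebuild the pre-aggregated table, e.g. from a periodic job."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DROP TABLE IF EXISTS airport_stats")
        conn.execute(
            "CREATE TABLE airport_stats AS "
            "SELECT airport, airline, SUM(pax) AS pax "
            "FROM bookings GROUP BY airport, airline"
        )


def market_share(conn, airport):
    """Read shares from the small summary table, not the raw data."""
    rows = conn.execute(
        "SELECT airline, pax FROM airport_stats WHERE airport = ? ORDER BY airline",
        (airport,),
    ).fetchall()
    total = sum(pax for _, pax in rows)
    if total == 0:
        return {}
    return {airline: pax / total for airline, pax in rows}


refresh_summary(conn)
print(market_share(conn, "BOM"))
```

The heavy `GROUP BY` runs once per refresh, while each page view only reads a handful of pre-computed rows.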

Hi guys,

if you allow me to be blunt… as a customer, I don’t really care what causes the problem. Tempelhof has been dragging for almost a week, and I am not going to clap my hands because the delay is now reduced from 12 hours to 7 hours. I still can’t use the ORS, and I still get cancellations if I touch a flight schedule. So please don’t let it drag another three or four days until the server is back on time. Just fix it.

I rarely see more than 20 players on the “online” list of Tempelhof, so I don’t think too many players are the problem. It is calculating the millions of possible routes passengers can take to travel from A to B that stretches the limits of the servers. Even on “normal” days, the server goes into the red every time it calculates a big (not even major) airport.
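A rough illustration of why the number of possible routes explodes: the sketch below counts routes between two airports with a limited number of flights, assuming no airport is visited twice. How the game actually enumerates connections is not described in this thread, so `count_routes` and `full_mesh` are hypothetical helpers for the arithmetic only.

```python
def count_routes(network, origin, dest, max_legs):
    """Count routes from origin to dest using at most max_legs flights,
    never visiting the same airport twice."""

    def walk(airport, visited, legs_left):
        if airport == dest:
            return 1  # arrived, this is one complete route
        if legs_left == 0:
            return 0  # out of flights without reaching dest
        return sum(
            walk(nxt, visited | {nxt}, legs_left - 1)
            for nxt in network[airport]
            if nxt not in visited
        )

    return walk(origin, {origin}, max_legs)


def full_mesh(n):
    """Hypothetical network where every airport is linked to every other."""
    airports = [f"A{i}" for i in range(n)]
    return {a: [b for b in airports if b != a] for a in airports}


for n in (5, 10, 20):
    print(n, count_routes(full_mesh(n), "A0", "A1", 3))
# prints:
# 5 10
# 10 65
# 20 325
```

In this fully connected toy network the count for a single airport pair with up to three flights is 1 + (n - 2)^2, and that has to be multiplied by every pair of airports, which is how one ends up with millions of routes.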

So… if it helps to reduce the ground network to a radius of 100 km, then reduce it. If we want passengers from small airports, we shall serve those small airports. And if it helps to put a maximum on the number of interlining agreements we can make, put a maximum on that number. We shall think harder before we sign an interlining agreement.

Jan

This is an inherent problem with a game that runs in real time. It’ll need to catch up in real time too, which means calculations take place faster than they normally would (and they do, THF is now at 4 hours delay. It catches up nicely over-night). The online users list is not a full list, as users have the ability to not display their online status. Besides, even a single user, by performing the right actions, could slow down the update process.
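The catch-up arithmetic can be sketched as follows. If the server processes `speed` hours of game time per real hour, the backlog shrinks by `speed - 1` hours per real hour. The numbers in the example are made up and are not measurements from Tempelhof.

```python
def hours_to_catch_up(backlog_hours, speed):
    """Real hours needed to clear a backlog.

    speed = game hours processed per real hour (assumed constant).
    A speed of 1.0 or less never catches up.
    """
    if speed <= 1.0:
        return float("inf")
    return backlog_hours / (speed - 1.0)


def observed_speed(backlog_before, backlog_after, elapsed_hours):
    """Estimate the processing speed from two backlog readings."""
    return 1.0 + (backlog_before - backlog_after) / elapsed_hours


# Hypothetical example: 12 hours behind, processing at 1.5x real time.
print(hours_to_catch_up(12, 1.5))  # 24.0
# Hypothetical example: backlog fell from 7 to 4 hours within 8 real hours.
print(observed_speed(7, 4, 8))     # 1.375
```

This also shows why the server recovers mostly overnight: with fewer user requests competing for resources, the effective speed rises further above 1.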

If the game ran at accelerated time (for instance, a day in-game would be 6 hours real time), then it would be an option to simply ‘clean the slate’ and skip the update process a full 6 hours. You’d miss a day of revenue, but also a day of expenses, so the net result would be more or less the same. Running in real time, that option doesn’t exist. You can’t skip the update process to actual time, you’re going to mess up a lot of the schedules.

Besides, we hope a fix has already been applied by optimizing the database some more and reducing the idle time between tasks (see Martin’s post for more details). Hopefully that’s sufficient to keep the server up to date once it has caught up.

Just to drop the note here: Tempelhof has caught up (we’re 3 minutes away from current time) now. Let’s hope it stays that way. :)

From the experience I have had with tracking down DB performance problems, the hardware was never the problem, but rather other issues like record locking or inappropriate index searches.

Sorry for still complaining, but even though the server has caught up, the performance isn’t satisfying. Several airports take more than 30 seconds for their update. The high score may go to Durban, which took something around 240 seconds to update today.

Playing Airlinesim isn’t much fun these days (…weeks). :(

Edit: Tempelhof

This is a problem created by players themselves. The more passengers want to fly and the more connections available from an airport, the longer the calculation will take. Not much that can be done about that, and it’s something that all servers face, and have been facing for much longer than the problems that popped up recently… It’d be nice if it could go faster, but right now, this is what we have… It’s already set up in such a way that these calculations are spread out more or less evenly, so the (fairly short) delays that do occur are spread throughout the day.

Might be, but it’s never been such a problem (on Kaitak/Tempelhof/Stapleton) to just click on the "statistics" of an airport to analyse the competitors as it is now (from my point of view). If I try this now… well… IF it works, it takes up to 2 minutes. But in almost every case, I won’t see the statistics page anyway. Right now I’m trying to get the market shares for Mumbai - but after more than 5 minutes I give up.

If there isn’t any change in the next few weeks I’m off. It’s impossible to enjoy the game as it was before (even on crowded servers with as many players/airlines/connections as is now the case on Tempelhof [446 players… many big airlines gone after the launch of Stapleton, so the explanation doesn’t convince me]).

Nice answer… A problem with performance is a "problem created by players"? Sorry, but that’s a problem for simulogics and AS and they have to solve it.

I guess what he was trying to say was that very active and old servers tend to be more prone to performance issues than other game worlds. It’s not the fault of the players, just their mere presence ;)

Anyway…what all of us here need to realize is that AirlineSim might be a “professional project” (in the sense of “money is being earned”) but it arose from a hobby project of hobby developers. At the core of the game there is some pretty hefty number crunching at work that’s difficult to master without a computer science background. It’s what makes AirlineSim unique though, so if it wasn’t for that, we could just remove all the stuff that causes problems and be happy. But obviously that would leave AS just a few levels above FarmVille…and nobody - neither us nor the players - would want that.

We are constantly thinking about ways to improve performance, and we try to implement as many of the feasible solutions we come up with as we can. But as I said, our possibilities are limited, both by our skillset and by the nature of the game itself.

I do understand that completely. And I like Airlinesim very much, that’s why I’m here and that’s why I’m still complaining instead of just leaving. But: I can’t believe that it’s just a player-related problem on Tempelhof. Kaitak is way older. And: After the launch of Stapleton many users (even with large airlines) left Tempelhof. So there are more indicators against his thesis than in favor of it. 450 players can’t cause this problem if 1200 are allowed. What will happen if a server is "full" and older than a year?

Atm everything seems fine again, but I won’t bet that it stays that way. If you’re telling me you’re keeping an eye on that problem, I’m almost satisfied.