Dev Log Week 2026-02: Clustering

Winter break is over and, as mentioned last week, I got started by smoothing some of the rougher edges of the location-based demand (LBD) feature, which (as you might remember) was implemented in little more than a week right before Christmas :sweat_smile:

As a first visible measure, I adjusted the ground network tab of airport info pages to show the connected locations rather than other airports (which would always result in an empty list in game worlds with LBD enabled). Similarly, I changed the airport maps to show the connected locations as a sort of “hot spots” so one can get a picture of an airport’s catchment area.

Speaking of catchment area: Once I had the above changes finished and clicked around the game to see them in action, I immediately spotted all sorts of issues with the location data. Some examples:

  • The airport on the German island of Heligoland had no connected locations, a problem shared by many near-shore islands.
  • Airports like Newark had hundreds of nearby locations that should have been clustered to a handful of places.
  • The populations within the location catchment areas looked off for many places.

So I dove into what turned out to be quite the rabbit hole and revisited the “location clustering algorithm” that I wrote for LBD. In short, this algorithm takes hundreds of thousands of real-world locations and “clusters” them such that only a few thousand remain. It does this in a way that keeps the “important” locations and distributes the populations of the “removed” locations among the nearby remaining ones. It keeps going until it can’t reduce the locations any further (because no eligible neighbors remain) or a minimum number of locations per country has been reached.
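To make the description above a bit more concrete, here is a minimal sketch of such a reduction loop in Python. It is not the game's actual implementation: the `Location` fields, the use of population as a stand-in for “importance”, the population-proportional redistribution, and the `max_km` / `min_locations` parameters are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass, replace


@dataclass
class Location:
    name: str
    country: str
    lat: float
    lon: float
    population: float


def haversine_km(a: Location, b: Location) -> float:
    """Great-circle distance between two locations in kilometres."""
    r = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def cluster_country(locations, max_km=50.0, min_locations=2):
    """Reduce the locations of ONE country (hypothetical parameters).

    Repeatedly removes the least populous location that still has a
    neighbor within max_km and hands its population to those neighbors,
    proportionally to their own population. Stops when the minimum count
    is reached or when no location has an eligible neighbor left.
    """
    remaining = [replace(loc) for loc in locations]  # copy, leave input untouched
    while len(remaining) > min_locations:
        merged = False
        for candidate in sorted(remaining, key=lambda l: l.population):
            neighbors = [
                o for o in remaining
                if o is not candidate and haversine_km(candidate, o) <= max_km
            ]
            if not neighbors:
                continue  # isolated location, cannot be removed
            total = sum(n.population for n in neighbors)
            for n in neighbors:
                share = n.population / total if total > 0 else 1 / len(neighbors)
                n.population += candidate.population * share
            remaining = [o for o in remaining if o is not candidate]
            merged = True
            break
        if not merged:
            break  # no eligible neighbors remain anywhere
    return remaining
```

Note that a location without any neighbor inside `max_km` is never removed in this sketch, and the total population is conserved across the whole run. The nested loops are quadratic and only meant to show the idea; for hundreds of thousands of locations some spatial index would be needed.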

Given the limited information this algorithm works on, the process is riddled with edge cases. I think I caught several of them, although I had to admit defeat when it came to properly representing islands. When you look at Heligoland in Wright after the release of the upcoming maintenance patch you’ll find that its ground network still makes no sense. But I’ll revisit this by itself once I get around to the respective roadmap item…

Shouldn’t you start at the clustering step and give each cluster one or more additional parameters? These could be:

  • country (like at North and South Korea)
  • “travel free area” (like Schengen)
  • “islandgroup” / “isolationgroup” (like Heligoland, the Canary Islands,…)

Next steps would be:

  • those in the same “area” get merged by distance (people travel to another airport closer to them, ignoring European borders), EXCEPT when they have a different “islandgroup”
  • leaving: merged “areas”, still distinct “islands”
  • next step (more complicated)
    • list of exclusive border travels (NK <> SK both ways)
    • list of “difficult” border travels, maybe asymmetrical! (look for need of visa or others!?)
      (DE <> UK)
    • list of “easy” border travels (no visa / application needed)
  • merge having those in mind? (maybe you need to reinstall “ground network” for border travels to represent the amount of time needed for the border affairs? but also for island → mainland connections?)
  • maybe if you cluster locations, also have a look at “travel times” (“google maps”) when merging? If you have obstacles (mountains, rivers, lakes) with few connections, they should tend to get longer? Do not merge by location but by travel time? (this would solve problems like FR UK)
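The merge rules suggested above could be sketched as a simple eligibility check. Everything here is hypothetical: the `Place` fields, the `CLOSED_BORDERS` set, and the country codes are illustration only, and the asymmetric “difficult” borders and travel times are deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Place:
    name: str
    country: str                 # country code, e.g. "DE"
    travel_area: Optional[str]   # e.g. "schengen", or None
    island_group: Optional[str]  # None means "mainland"


# Borders that cannot be crossed in either direction (assumed example).
CLOSED_BORDERS = {frozenset({"KP", "KR"})}


def can_merge(a: Place, b: Place) -> bool:
    """Return True if two places may end up in the same cluster."""
    if a.island_group != b.island_group:
        return False  # islands stay distinct from the mainland and each other
    if frozenset({a.country, b.country}) in CLOSED_BORDERS:
        return False  # exclusive border, no ground travel at all
    if a.country == b.country:
        return True
    # Different countries: only merge inside a shared travel-free area.
    return a.travel_area is not None and a.travel_area == b.travel_area
```

A distance-based clustering step would then only consider pairs for which `can_merge` returns `True`.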

Another approach:

  • make distinct “areas” of the world (start at municipality-level), join them, BUT have all water areas also as areas that can not be joined!

(feel free to contact me, I work with geodata occasionally, maybe you can get a group of us for some “brainstorming” via teams?)

The algorithm currently works on country-level. As said, Schengen, islands etc. are topics for future work with their own roadmap item (linked in OP).

The root issue isn’t the logic to apply (that’s fairly straightforward), it’s the incomplete information one needs to work with. At the moment, I do not use any actual geodata (boundaries etc.) but merely “locations” that, unfortunately, come with very little hierarchical information. So I know where they belong in terms of administration, but not geographically (beyond their coordinates). As such, “islands” do not exist in the data (neither the external location data, nor the internal airport data). Hence, a separate topic for which I will almost certainly work with land boundary data of some sort.

As a hint:

OSM seems to have “administrative boundaries”. You could intersect those with areas of water (or other barriers) to get a complete map of the world divided into areas which are either some administrative boundary OR impassable.

(maybe I’ll have a look and work on it just for fun…)

I do have the administrative boundaries. At least sort of, because locations refer to their administrative “parents”. But those often do not align with physical boundaries. Case in point: Many of the German islands in the North Sea belong to counties on land…just from the administrative structure you wouldn’t know it’s an island.

My rough idea (without having done that thing before) is to get landmass boundaries from somewhere, then intersect those with our location/airport point coordinates and this way figure out which places belong to which landmass. This would likely be enough for our purposes…but loads of edge cases exist, for sure.
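As a rough illustration of that idea, here is a minimal point-in-polygon sketch in plain Python. The landmass polygons are made-up toy rectangles, the function names are hypothetical, and the test treats longitude/latitude as flat x/y coordinates (so it ignores the antimeridian, polygon holes, and points lying exactly on a boundary). With real boundary data one would most likely reach for a proper geometry library instead.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: is (lon, lat) inside the polygon?

    polygon is a list of (lon, lat) vertices; the last vertex is
    implicitly connected back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_landmass(points, landmasses):
    """Map each point name to the first landmass containing it (or None).

    points:     dict of name -> (lon, lat)
    landmasses: dict of landmass name -> list of (lon, lat) vertices
    """
    result = {}
    for name, (lon, lat) in points.items():
        result[name] = next(
            (lm for lm, poly in landmasses.items() if point_in_polygon(lon, lat, poly)),
            None,
        )
    return result


# Toy data (not real coordinates): a "mainland" square and a small "island".
landmasses = {
    "mainland": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "island": [(12, 4), (13, 4), (13, 5), (12, 5)],
}
points = {
    "mainland_airport": (5, 5),
    "island_airport": (12.5, 4.5),
    "offshore_point": (11, 4.5),
}
assignment = assign_landmass(points, landmasses)
```

Two places would then count as being on the same island (or mainland) exactly when they are assigned the same landmass, which is the piece of information the administrative hierarchy alone does not provide.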

What I forgot to say: Please feel free to have a stab at this. I am not too experienced with this and have so far shied away from using “actual” geo data. The locations I use atm are based on https://www.geonames.org/ data, which is rather limited (at least in the freely available version).