The Eve cluster
Reading the forums, I have seen a lot of speculation about the cluster setup. It has ranged from "THE" Eve server, meaning one monstrous machine, to a geographically dispersed spiderweb of proxies around a central cluster. Both extremes are far off, although the latter was considered at one point and later abandoned due to the inconvenience and minimal gains involved.
In reality, Eve is running on a cluster in Thamesside, London. At the heart of it there is a SQL server cluster: two machines running in an Active-Standby configuration. If one breaks down, the other automatically senses it and takes over.
In front of the SQL server machines there are 42 application servers, of which we are using 28 at the moment. These are what we call SOL servers (1). These servers calculate trajectories and run station services, the market and almost everything else. In front of the SOL servers there are 13 proxies (since 13 isn't such a lucky number we only run 12 of these atm :).
The proxies' purpose is to keep track of all the services each client is using. Think of it as one fat pipe from your client to the proxy, which then breaks up into many smaller pipes to several SOL servers depending on where you are and what you are doing in the game. Each SOL server then contacts the database for the data it needs in order to process your client's requests and to calculate the outcomes of interactions between you and other people in the game. Each proxy has both an individual user limit and a cluster limit, both configurable, which explains messages such as "proxy full" and "cluster full", which some of you may be familiar with from beta.
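To make the two limits concrete, here is a minimal sketch of how a proxy might decide between accepting a connection, answering "proxy full" and answering "cluster full". The class names, the shared counter and the order of the checks are all assumptions made for illustration; the post does not describe how the real proxies implement this.

```python
class ProxyFull(Exception):
    """Raised when this proxy has reached its own user limit."""


class ClusterFull(Exception):
    """Raised when the whole cluster has reached its user limit."""


class ClusterState:
    # Hypothetical shared counter; how the real proxies learn the
    # cluster-wide user count is not described in the post.
    def __init__(self, cluster_limit):
        self.cluster_limit = cluster_limit
        self.total_users = 0


class Proxy:
    def __init__(self, name, user_limit, cluster):
        self.name = name
        self.user_limit = user_limit   # configurable per-proxy limit
        self.cluster = cluster         # carries the configurable cluster limit
        self.clients = set()

    def connect(self, client_id):
        # Checking the cluster limit first is an assumption.
        if self.cluster.total_users >= self.cluster.cluster_limit:
            raise ClusterFull("cluster full")
        if len(self.clients) >= self.user_limit:
            raise ProxyFull("proxy full")
        self.clients.add(client_id)
        self.cluster.total_users += 1

    def disconnect(self, client_id):
        if client_id in self.clients:
            self.clients.remove(client_id)
            self.cluster.total_users -= 1
```

With a cluster limit of 3 and a per-proxy limit of 2, a third client on the same proxy would be refused with "proxy full", while a fourth client anywhere would be refused with "cluster full".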
In front of the proxy servers we have two load balancers in a failover configuration, which take care of rerouting connections from a virtual server IP to one of the real proxies behind it. They mask the real server IPs and keep track of which client is connected to which server.
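As a rough illustration of that bookkeeping, the sketch below maps clients arriving at one virtual IP onto real proxy addresses and remembers the mapping. Picking proxies round-robin is purely an assumption (the post does not say how a proxy is chosen), and the class name and IP addresses are made up for the example.

```python
from itertools import cycle


class LoadBalancer:
    def __init__(self, virtual_ip, proxy_ips):
        self.virtual_ip = virtual_ip          # the only address clients see
        self._next_proxy = cycle(proxy_ips)   # round-robin choice (assumed)
        self.sessions = {}                    # client address -> real proxy IP

    def route(self, client_addr):
        # A known client keeps going to the same proxy; a new client is
        # assigned the next proxy in the rotation.
        if client_addr not in self.sessions:
            self.sessions[client_addr] = next(self._next_proxy)
        return self.sessions[client_addr]

    def close(self, client_addr):
        # Forget the client once its connection is gone.
        self.sessions.pop(client_addr, None)
```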
As you can imagine, keeping all of these servers and the different services that run on them in sync is quite complex, especially when users change locations or services and the proxies need to close the users' connections in one place and open connections to other SOL servers. This happens, for example, every time you exit or dock at a station or jump at a jumpgate. It is at this critical transition phase that users have been prone to getting stuck.
One SOL server may also be busier than another because more users are using services from it, and therefore you may experience lag in certain locations or when using certain services. Someday (hopefully soon) load balancing across SOL servers will be fluid and load will shift between servers as the service load fluctuates.
Today, however, load balancing is based on preset factors. When a new service is started (a user entering a solar system which no one else is using, for example), the total load for each SOL server is calculated from the number of services it runs and their corresponding load factors, and the SOL server with the lowest total acquires the new service. Unfortunately it also keeps the service until nobody is using it any more. This has the unwanted side effect that the server can ultimately collapse if too many users are using the services assigned to it. This very seldom happens, although we have occasional hotspots of activity which may cause lag for some users while others are fine, since they are not using services from that particular server.
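The scheme described above can be sketched roughly as follows. The service types, the load factor values and all class and method names are invented for the example; only the rule itself (sum the preset factors per server, give a new service to the server with the lowest total, keep it there until the last user leaves) comes from the description.

```python
from dataclasses import dataclass, field

# Hypothetical preset load factors; the real values are not given in the post.
LOAD_FACTORS = {"solar_system": 5.0, "station": 2.0, "market": 3.0}


@dataclass
class SolServer:
    name: str
    services: dict = field(default_factory=dict)  # service id -> service type

    def total_load(self):
        # Load depends only on which services run here, not on user count.
        return sum(LOAD_FACTORS[kind] for kind in self.services.values())


class Cluster:
    def __init__(self, servers):
        self.servers = list(servers)
        self.owner = {}   # service id -> SolServer running it
        self.users = {}   # service id -> number of users

    def use(self, service_id, kind):
        """A user starts using a service, starting it first if necessary."""
        if service_id not in self.owner:
            # New service: the server with the lowest total load acquires it.
            target = min(self.servers, key=lambda s: s.total_load())
            target.services[service_id] = kind
            self.owner[service_id] = target
            self.users[service_id] = 0
        # Existing service: it stays where it is, however busy it gets.
        self.users[service_id] += 1
        return self.owner[service_id]

    def leave(self, service_id):
        """A user stops using a service; it is released when nobody is left."""
        self.users[service_id] -= 1
        if self.users[service_id] == 0:
            server = self.owner.pop(service_id)
            del server.services[service_id]
            del self.users[service_id]
```

Note that `use` never moves a running service, and the load total ignores how many users a service has. That is how a hotspot forms in this sketch: a solar system that suddenly fills up stays on the server that first acquired it.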
Well, that was Eve cluster 101. Enjoy!
**1)** Short for solar system. This nickname dates back to early beta, when a solar system was a single entity that had to run all solar system services and components on a single server, whereas now it has been broken into multiple units which are dispersed over many servers for load balancing purposes.