Introducing Quasar
Tuesday, 14 September 2021, marks a significant turning point: the end of an era and access to powerful new opportunities for EVE Online. Lost in the bright lights of the new NPE (New Player Experience) and tucked under the pixels of Skill Plans lies a fundamental change in how EVE Online moves into the future.
The last time the networking layer was fundamentally changed was in 2011, with the introduction of CCP’s IOCP implementation, “CarbonIO”, which eventually became the foundation of the infamous time dilation. Thinking about the problem space CarbonIO lived in began as some scribbles under the heading “Project Sanguine”. Every optimization in EVE comes down to careful negotiation with Python’s Global Interpreter Lock (GIL). Put simply, only one thread can execute Python code at a time. EVE’s adoption of Stackless Python, its implementation of IOCP through StacklessIO and then CarbonIO, and its cooperative design around time dilation all serve to maintain the favorite illusion: New Eden breathes. What if the GIL didn’t have to be courted for every idea that arose? How can EVE take advantage of the hardware industry’s shift toward ever-higher core counts instead of faster individual cores?
There have been many experiments in this regard that are tangential to Project Sanguine, the most public being EVE: Aether Wars. The goal there wasn’t to fundamentally change the communication model of EVE Online, but to change the simulation model. In contrast, Project Sanguine targeted the boring bits that make up EVE’s dense feature set. Simulating nearly 9,000 players in the same space could be faster if New Eden didn’t have to worry about everything else on its to-do list. So Project Sanguine landed on two goals: dodge the GIL and clear the table for moar lasers.
The first form of Project Sanguine emerged with ESI and the first iteration of EVE Portal in late 2016. Through these projects, a new paradigm was established within the server architecture of EVE Online: a message bus. Through this new escape hatch, the bottlenecks associated with the GIL were rediscovered, but with a clearer picture of their expensive manifestations: message routing, serialization, and transmission. If one ship fires one laser in the middle of 1,000 ships, that’s 1,000 messages that need to be sent immediately all over the globe. The simulation must address that message to 1,000 destinations as copies (message routing), convert the data to a wire format (serialization), and then send it over the wire (transmission). In most cases, CarbonIO has been handling each of those steps, but within the custody of the GIL. CarbonIO has served EVE Online well for quite some time, but much has changed on the turbulent seas of the internets since 2011.
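To make the three stages concrete, here is a minimal Go sketch of one laser shot fanned out to 1,000 recipients. It is illustrative only: the `LaserFired` and `Envelope` types and the `route`, `serialize`, and `transmit` functions are hypothetical, `encoding/json` stands in for the real wire format, and "transmission" appends to an in-memory outbox instead of writing to a socket.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// LaserFired is a hypothetical simulation event; EVE's real message types are not shown in the post.
type LaserFired struct {
	ShooterID int64 `json:"shooter_id"`
	TargetID  int64 `json:"target_id"`
}

// Envelope pairs one recipient with its own copy of the event.
type Envelope struct {
	RecipientID int64
	Event       LaserFired
}

// route performs message routing: one event becomes one addressed copy per recipient.
func route(ev LaserFired, recipients []int64) []Envelope {
	out := make([]Envelope, 0, len(recipients))
	for _, id := range recipients {
		out = append(out, Envelope{RecipientID: id, Event: ev})
	}
	return out
}

// serialize converts the event to bytes; encoding/json is only a stand-in wire format here.
func serialize(env Envelope) ([]byte, error) {
	return json.Marshal(env.Event)
}

// transmit stands in for a network write by appending to a per-recipient outbox.
func transmit(outboxes map[int64][][]byte, env Envelope, payload []byte) {
	outboxes[env.RecipientID] = append(outboxes[env.RecipientID], payload)
}

func main() {
	recipients := make([]int64, 1000)
	for i := range recipients {
		recipients[i] = int64(i + 1)
	}
	ev := LaserFired{ShooterID: 42, TargetID: 7}
	outboxes := make(map[int64][][]byte)
	for _, env := range route(ev, recipients) {
		payload, err := serialize(env)
		if err != nil {
			panic(err)
		}
		transmit(outboxes, env, payload)
	}
	fmt.Println(len(outboxes))             // 1000
	fmt.Println(string(outboxes[1][0]))    // {"shooter_id":42,"target_id":7}
}
```

Every stage runs once per recipient, which is why one shot in a crowded grid multiplies into so much work when all of it has to happen on a single interpreter thread.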
After seeing the patterns evolve in this new ecosystem, it became clear that a more standardized protocol was needed if this paradigm were ever to be fully exploited. With the integration of gRPC, it became possible to combine the message routing capabilities of the message bus with the lightning-fast serialization of protocol buffers (gRPC’s message format). Data still has to be scheduled for transmission while holding the GIL, but it is now buffered at a higher level on a separate thread. This means message routing, serialization, and transmission all happen outside the GIL; the only step still under it is the memory copy in between. It cannot get much faster than that.
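The hand-off described above can be sketched as a producer that only copies bytes into a buffered queue, with a separate goroutine draining that queue. Go has no GIL, so this is an analogy rather than EVE's actual Python/C++ code: `Enqueue` plays the role of the memory copy made under the GIL, and `run` plays the separate transmission thread. The `Sender` type and its methods are hypothetical, and "sending" just records the bytes in memory.

```go
package main

import (
	"fmt"
	"sync"
)

// Sender owns transmission on its own goroutine (hypothetical type for illustration).
type Sender struct {
	queue chan []byte
	wg    sync.WaitGroup
	mu    sync.Mutex
	sent  [][]byte // stands in for bytes written to the network
}

// NewSender starts the draining goroutine with a queue of the given capacity.
func NewSender(buffer int) *Sender {
	s := &Sender{queue: make(chan []byte, buffer)}
	s.wg.Add(1)
	go s.run()
	return s
}

// Enqueue is the only work done on the producer's side: one copy into the queue,
// so the caller may reuse its buffer immediately.
func (s *Sender) Enqueue(payload []byte) {
	cp := make([]byte, len(payload))
	copy(cp, payload)
	s.queue <- cp
}

// run drains the queue; a real sender would write each message to the network here.
func (s *Sender) run() {
	defer s.wg.Done()
	for msg := range s.queue {
		s.mu.Lock()
		s.sent = append(s.sent, msg)
		s.mu.Unlock()
	}
}

// Close stops accepting messages and waits until the queue is drained.
func (s *Sender) Close() {
	close(s.queue)
	s.wg.Wait()
}

// Sent reports how many messages have been drained.
func (s *Sender) Sent() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.sent)
}

func main() {
	s := NewSender(64)
	buf := []byte("skill plan updated")
	for i := 0; i < 3; i++ {
		s.Enqueue(buf)
	}
	buf[0] = 'X' // safe: the sender holds its own copies
	s.Close()
	fmt.Println(s.Sent()) // 3
}
```

The design choice this illustrates is that the producer never waits on serialization or the network; it pays for one copy and moves on, while the slow work happens elsewhere.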
moar lasers
A firehose was now attached to New Eden, but where does it all go? When work on ESI began, so did the adoption of more cloud-native technologies such as Kubernetes. As the need for simple concurrency primitives to digest all this information became apparent, more development moved to Go. As these technologies accumulated into an ecosystem of their own, work began on features that take advantage of the new ability to work with New Eden using modern standards. You’ve seen many of them.
The first was the Activity Tracker. It attaches itself to the firehose and monitors New Eden’s respiration to keep track of all your exploits. There’s also a variation of it, Opportunities, which attempts to predict the trajectory of a Capsuleer and highlight more interesting parts of New Eden. The message bus has also been used to power the Abyssal Proving Grounds leaderboards. A massive amount of work has gone into giving the development teams an ecosystem in which to harness the power of a messaging architecture for these features. However, each of these features also exposed the same gap in capabilities: the desktop client had no direct line into this ecosystem.
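Several features attaching to one firehose can be sketched with Go's channel and goroutine primitives: a bus copies every event to each subscriber, and each feature consumes its own stream independently. This is a minimal sketch under stated assumptions: the `Event` fields, the `Bus` type, and the `trackActivity` and `countKind` consumers are hypothetical stand-ins for an activity-tracker-style and a leaderboard-style feature, not CCP's services, and `Subscribe` must be called before publishing starts.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical firehose message; field names are illustrative only.
type Event struct {
	CharacterID int64
	Kind        string
}

// Bus broadcasts every published event to each subscriber's channel.
// Subscribe is not safe to call concurrently with Publish.
type Bus struct {
	subs []chan Event
}

func (b *Bus) Subscribe(buffer int) <-chan Event {
	ch := make(chan Event, buffer)
	b.subs = append(b.subs, ch)
	return ch
}

func (b *Bus) Publish(ev Event) {
	for _, ch := range b.subs {
		ch <- ev
	}
}

// Close signals every subscriber that the stream has ended.
func (b *Bus) Close() {
	for _, ch := range b.subs {
		close(ch)
	}
}

// trackActivity tallies events per character and kind, as a tracker-style feature might.
func trackActivity(in <-chan Event) map[int64]map[string]int {
	tally := make(map[int64]map[string]int)
	for ev := range in {
		if tally[ev.CharacterID] == nil {
			tally[ev.CharacterID] = make(map[string]int)
		}
		tally[ev.CharacterID][ev.Kind]++
	}
	return tally
}

// countKind counts a single kind of event, as a leaderboard-style feature might.
func countKind(in <-chan Event, kind string) int {
	n := 0
	for ev := range in {
		if ev.Kind == kind {
			n++
		}
	}
	return n
}

func main() {
	bus := &Bus{}
	activity := bus.Subscribe(16)
	leaderboard := bus.Subscribe(16)

	var wg sync.WaitGroup
	var tally map[int64]map[string]int
	var runs int
	wg.Add(2)
	go func() { defer wg.Done(); tally = trackActivity(activity) }()
	go func() { defer wg.Done(); runs = countKind(leaderboard, "abyss_completed") }()

	bus.Publish(Event{CharacterID: 1, Kind: "abyss_completed"})
	bus.Publish(Event{CharacterID: 1, Kind: "ship_destroyed"})
	bus.Publish(Event{CharacterID: 2, Kind: "abyss_completed"})
	bus.Close()
	wg.Wait()

	fmt.Println(tally[1]["abyss_completed"], tally[1]["ship_destroyed"], runs) // 1 1 2
}
```

Each consumer only ever sees its own channel, so adding another feature means adding another subscriber rather than touching the producer.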
Until the release of Skill Plans, each feature had “smuggled” data into Tranquility through CarbonIO. This is no longer the case: Skill Plan operations are not only communicated through gRPC, they never touch Tranquility or its database.
Why is bypassing Tranquility and its database so important? To really understand that, one must talk about the failures. Part of the journey has led to many new techniques and tools with which to observe New Eden. One of them is distributed tracing, using a new favorite toy: Honeycomb.io. (More about that journey here.) Armed with all the new shiny toys, it was clear exactly what was happening with Skill Plans as it was released into the wild:
In general, it could also be seen that performance was OK, with a lot of room for improvement:
Then the following morning a chaos monkey appeared and started to harass the hamsters:
Yeah, that’s 500k milliseconds, AKA 8 minutes and 20 seconds, to send a message. The details will require some Fanfest beer and a marker board. Here are the important parts of this fail state: Tranquility didn’t go down, thousands of players weren’t disconnected, and most players weren’t affected (you can see that the majority of messages are still packed at the very bottom of the graph). This is because we don’t communicate with Tranquility at all in the traditional sense. There is no CarbonIO hop to the traditional proxies, which would then relay to the server node and then the database. Instead, Tranquility focuses on what’s more important, and the EVE desktop client communicates via gRPC with the new ecosystem where the Skill Plan service lives with its own database.
As you might remember, a volcano was recently drained to test No Downtime for Tranquility. That is one of the most powerful characteristics of the new ecosystem: No Downtime. There was no need to restart Tranquility to rescue Skill Plans, and no need to deploy a patch to the server or the desktop client. This is a peek into the journey of Project Sanguine becoming EVE’s new technology platform: Quasar. Now felt like the right time to give it a name, both so it can be more easily understood and referenced and to give you more insight into what has been going on recently with EVE’s technological advancements and how they align with setting the game up for a thriving third decade. It’s also serendipitous that this quadrant is named Gateway, as it heralds the direct use of gRPC gateways.
Also, space words, man. They just sound good.
What’s next?
Work continues on clearing the table by refurbishing many old services, which builds momentum in two directions: building up more foundational capabilities for Quasar so it can manipulate more than just Skill Plans in the universe, and normalizing ancient systems to pave the way for faster iteration.
What does that mean for the average Capsuleer? More opportunities to expose more powerful features across more mediums.
The idea of publishing simulation data through Quasar has also been toyed with…but that might take a minute.
To join the player discussion, please head on over to the official thread on EVE Online forums.