Tranquility Issues Resolved
This Monday it was brought to our attention that an alliance couldn't get Sovereignty 4 in their capital system even though they should have been able to. Looking into it, we found that their capital station was a conquerable station, which should not be possible to set as a capital station. Fixing this required changing procedures in the database so that attempting to set a conquerable station as a capital station raises an error.
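For readers curious what a database-side guard of this kind looks like, here is a minimal sketch in Python against an in-memory SQLite database. The schema, the table and column names, and the `set_capital_station` function are hypothetical illustrations, not the actual Tranquility schema or procedure.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory schema (hypothetical; not the real Tranquility schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE stations (
            station_id  INTEGER PRIMARY KEY,
            conquerable INTEGER NOT NULL   -- 1 = conquerable, 0 = regular
        );
        CREATE TABLE alliances (
            alliance_id        INTEGER PRIMARY KEY,
            capital_station_id INTEGER REFERENCES stations(station_id)
        );
        INSERT INTO stations VALUES (1, 0), (2, 1);
        INSERT INTO alliances VALUES (100, NULL);
    """)
    return conn


def set_capital_station(conn: sqlite3.Connection, alliance_id: int, station_id: int) -> None:
    """Set an alliance's capital station, refusing conquerable stations."""
    # `with conn` commits if the block succeeds and rolls back if an exception escapes.
    with conn:
        row = conn.execute(
            "SELECT conquerable FROM stations WHERE station_id = ?", (station_id,)
        ).fetchone()
        if row is None:
            raise ValueError(f"unknown station {station_id}")
        if row[0]:
            raise ValueError(f"station {station_id} is conquerable and cannot be a capital")
        conn.execute(
            "UPDATE alliances SET capital_station_id = ? WHERE alliance_id = ?",
            (station_id, alliance_id),
        )


conn = make_demo_db()
set_capital_station(conn, 100, 1)      # regular station: accepted
try:
    set_capital_station(conn, 100, 2)  # conquerable station: rejected
except ValueError as err:
    print(err)  # station 2 is conquerable and cannot be a capital
```

Because the check runs inside `with conn:`, the rejection path leaves no transaction behind: the context manager rolls back whenever an exception escapes.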
Unfortunately, due to unforeseen behavior in the server code, a certain argument was passed to the procedure without the default value we had anticipated. As a result, the procedure raised an error on every call for conquerable stations that were already set as capital stations, of which there are 10 on Tranquility.
While hotfixing this, we failed to notice that the error-flag code we had copied was flawed: when the error condition was raised, it did not close the transaction. (This flaw may also have caused other crashes.) As a result, every call to this procedure locked up a database session, which then waited indefinitely for the transaction to be either rolled back or committed. The locked-up sessions held a plethora of different locks all over the database, which had us chasing ghosts of the problem. Eventually this led to a database failover, which brought Tranquility down.
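The failure mode described above, an error path that reports failure without ending its transaction, can be reproduced in miniature. The following is a minimal sketch in Python using SQLite, assuming a hypothetical `alliances` table and hypothetical `buggy_procedure` and `fixed_procedure` functions; it is not the actual Tranquility code, and a production database server differs from SQLite in its locking details. The second connection uses a 0.1-second timeout so the demo fails fast instead of waiting indefinitely as the real sessions did.

```python
import os
import sqlite3
import tempfile


def create_demo_db(path: str) -> None:
    """Create a tiny hypothetical table in a file database shared by several connections."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE alliances (alliance_id INTEGER PRIMARY KEY, capital_station_id INTEGER);
        INSERT INTO alliances VALUES (100, NULL);
    """)
    conn.close()


def buggy_procedure(conn: sqlite3.Connection, station_id: int, conquerable: bool) -> str:
    # The UPDATE implicitly opens a transaction and takes a write lock.
    conn.execute("UPDATE alliances SET capital_station_id = ? WHERE alliance_id = 100",
                 (station_id,))
    if conquerable:
        return "error"  # BUG: reports the error but never rolls back, so the lock is kept
    conn.commit()
    return "ok"


def fixed_procedure(conn: sqlite3.Connection, station_id: int, conquerable: bool) -> str:
    try:
        conn.execute("UPDATE alliances SET capital_station_id = ? WHERE alliance_id = 100",
                     (station_id,))
        if conquerable:
            raise ValueError("conquerable station cannot be a capital")
        conn.commit()
        return "ok"
    except Exception:
        conn.rollback()  # every error path ends the transaction and releases its locks
        return "error"


def other_session_can_write(path: str) -> bool:
    """Try a write from a second connection, giving up after 0.1 s instead of waiting forever."""
    other = sqlite3.connect(path, timeout=0.1)
    try:
        other.execute("UPDATE alliances SET capital_station_id = 1 WHERE alliance_id = 100")
        other.commit()
        return True
    except sqlite3.OperationalError:  # "database is locked"
        return False
    finally:
        other.close()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.db")
    create_demo_db(path)

    session = sqlite3.connect(path)
    print(buggy_procedure(session, 2, conquerable=True), session.in_transaction)  # error True
    print(other_session_can_write(path))  # False: the abandoned transaction still holds the lock
    session.rollback()
    print(other_session_can_write(path))  # True once the transaction is rolled back

    print(fixed_procedure(session, 2, conquerable=True), session.in_transaction)  # error False
    print(other_session_can_write(path))  # True
    session.close()
```

The point of the fixed version is that every failure funnels through a single rollback, so no code path can return an error flag while leaving its transaction, and its locks, behind.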
The same oversight was also found in two other procedures, but neither is called often enough on Tranquility to cause any real problems.
To sum up, a series of unfortunate edge cases created an environment on TQ in which 10 stations on the entire server could repeatedly lock sessions in the database until it had to fail over to free them. As a result, we are revising our hotfixing protocols to prevent this from happening again.
Discussion of this news item may be found in this forum thread.