June 2
UPDATE: an incident report from OVH, the server hosting company, shows that they caused a widespread DNS outage that lasted 30-ish minutes.

June 1 18:00
At approximately 4:30 PM grid time, the entire grid appeared to go down. None of the servers could be accessed, and the DG website itself was unreachable. Further investigation uncovered that the hosting company, soyoustart.com, was also unreachable. A few other grids were experiencing similar problems.

About 20 minutes later access was restored, and the server logs showed that the grid itself did not crash; it was just cut off from external access. Once access was restored, a number of regions exhibited connection issues. This often happens when regions lose connection with their neighboring regions (it's an OpenSim bug).

Consequently, I performed a rolling restart on all the regions to clean up any stuck regions. A rolling restart first restarts the regions that have no avatars in them. The system then monitors each occupied region until all avatars have left it, and only then restarts that region.
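
For readers curious how the rolling restart described above might be structured, here is a minimal Python sketch. It is not the grid's actual tooling: the function name rolling_restart and the avatar_count and restart callables are hypothetical stand-ins for whatever the grid's management system really uses to query and restart a region.

```python
import time
from typing import Callable, Iterable, List


def rolling_restart(
    regions: Iterable[str],
    avatar_count: Callable[[str], int],   # hypothetical: returns avatars currently in a region
    restart: Callable[[str], None],       # hypothetical: restarts a single region
    poll_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Restart empty regions first, then each occupied region once it empties.

    Returns the region names in the order they were restarted.
    """
    restarted: List[str] = []
    pending: List[str] = []

    # Pass 1: restart every region that has no avatars right now.
    for region in regions:
        if avatar_count(region) == 0:
            restart(region)
            restarted.append(region)
        else:
            pending.append(region)

    # Pass 2: keep polling the occupied regions; restart each one
    # only after its last avatar has left.
    while pending:
        still_occupied: List[str] = []
        for region in pending:
            if avatar_count(region) == 0:
                restart(region)
                restarted.append(region)
            else:
                still_occupied.append(region)
        pending = still_occupied
        if pending:
            sleep(poll_seconds)

    return restarted
```

Passing the query and restart steps in as callables keeps the sketch independent of any particular admin interface. Note that this version waits indefinitely on a region that never empties; a real tool would probably want a timeout or a warning to the avatars.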

There was a gap from 4:30 PM to 5:39 PM during which DC$ money transactions were not accepted. Anyone trying to send money to another avatar was blocked from doing so. Unfortunately, there are badly written tip jars that assume money has been sent when in fact it has not. The logs, however, show that no one lost any money during this gap.

The cause of this rather large network outage, which extended beyond DG to some other grids, is not yet known.



Thursday, June 1, 2023
