Thu 23 Nov 2023 11:11:10 PM UTC

During the holiday we took the instance down to perform full backups
and conduct 4.2.x migration testing on a working copy of the database.
A working copy of the existing database had previously been migrated to
4.2.0 on the development server and then upgraded to the subsequent
4.2.1 release without trouble.  However, there were issues that
prevented a successful upgrade on the production server; these have
been noted and will be addressed before the next upgrade attempt.  The
instance is now running 4.1.10 with the database preserved as it was
prior to any modifications.  Thank you for your patience.
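
For anyone curious, a working copy like the one described above can be
produced with something along these lines (database names and paths are
hypothetical here, not our actual setup):

    # Clone the production database into a scratch copy for migration testing.
    createdb mastodon_upgrade_test
    pg_dump --format=custom mastodon_production | pg_restore --dbname=mastodon_upgrade_test
    # Point a 4.2.x checkout at the copy and run its migrations against it;
    # Mastodon reads the database name from the DB_NAME environment variable.
    RAILS_ENV=production DB_NAME=mastodon_upgrade_test bundle exec rails db:migrate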

Mon 09 Oct 2023 07:34:28 AM UTC

The servers were taken offline briefly to perform maintenance
and full backups.

Wed 27 Sep 2023 05:20:27 PM UTC

We experienced a prolonged power outage at the SEA2 DC in
Tukwila, Washington which caused the site to be inaccessible.
The PNW is currently experiencing waves of wind and rain due
to a cyclone off the coast.  Normally in a power service
failure we would run on batteries and a diesel generator, but
it looks like the transfer failed.  We are waiting to hear
from our provider regarding the outage.  At this point service
has been restored.  Thank you for your patience.

Sat 05 Aug 2023 04:35:07 AM UTC

The instance was upgraded from v4.1.5 to v4.1.6.

Sun 25 Dec 2022 06:00:00 AM UTC

The instance underwent scheduled maintenance, which took 57 minutes.
Happy Christmas!

  • Full offline backups were completed
  • New Replication server was configured for live database redundancy (a rough sketch follows this list)
  • Third database server for testing new features was implemented
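
The entry above doesn't spell out how the replication server was
configured; one common approach is PostgreSQL streaming replication
(PostgreSQL 12 or later), where the standby side looks roughly like this
(host and role names are placeholders):

    # Run on the new standby; db-primary.example.org and 'replicator' are placeholders.
    pg_basebackup -h db-primary.example.org -U replicator \
        -D /var/lib/postgresql/data --wal-method=stream --write-recovery-conf
    # --write-recovery-conf writes standby.signal and primary_conninfo so the
    # server comes up as a live streaming replica of the primary.
    pg_ctl -D /var/lib/postgresql/data start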

Thu 15 Dec 2022 06:40:32 AM UTC

The instance was in scheduled maintenance for 32 minutes to correct
inconsistencies in database indices for media_attachments,
featured_tags, statuses and status_stats.
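
A rough idea of what correcting those indices can look like, assuming the
work is done with psql while the instance is stopped (the database name is
a placeholder):

    # Rebuild every index on the affected tables; REINDEX TABLE covers them all.
    for t in media_attachments featured_tags statuses status_stats; do
        psql -d mastodon_production -c "REINDEX TABLE $t;"
    done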
    

Wed 14 Dec 2022 06:01:37 AM UTC

The instance was in scheduled maintenance for 53 minutes to:

  • perform a full offline backup
  • run bin/tootctl fix-duplicates
  • reindex several indexes
  • check pg_toast tables

Feeds were then rebuilt once the instance was back online.  A rough
command sketch of these steps follows.
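
The database name, backup path and index name below are placeholders, and
recent Mastodon releases expose fix-duplicates under tootctl maintenance:

    # Full offline backup, duplicate cleanup, reindexing and a pg_toast check.
    pg_dump --format=custom mastodon_production > /backups/mastodon_production.dump
    RAILS_ENV=production bin/tootctl maintenance fix-duplicates
    psql -d mastodon_production -c 'REINDEX INDEX some_index_name;'
    # Reading every column of every row forces detoasting and surfaces pg_toast errors.
    psql -d mastodon_production -c 'SELECT md5(accounts::text) FROM accounts;' >/dev/null
    # Once the instance is back online, rebuild home/list feeds.
    RAILS_ENV=production bin/tootctl feeds build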

Sun 11 Dec 2022 07:25:42 PM UTC

On Monday December 5th we experienced a RAID failure specifically affecting
our postgresql server.  The database was moved to another machine where it
could be rolled back 12 days and repairs could be made.  This affected new
accounts and new statuses created after November 20th.  The instance was
brought back online within 3 hours to resume service.  New accounts that no
longer existed due to the rollback were identified and have since been
contacted via email.

Our database was created in April of 2017 and contains statuses spanning
over 5 years.  It has been in continuous service and has migrated through
several versions of mastodon/postgresql.  This is the first time we've had
to roll back the database.

* Are my statuses from that time period lost?

Yes and no: public statuses are federated and many folks have found their
posts cached both in their apps and on remote instances.  If you are able
to, you can repost what is missing from the SDF local timeline.

Further details on the postgresql server

On November 11th we leased a machine from our datacenter temporarily as we
were still in negotiation for our second expansion cabinet and we needed
another machine to scale the Mastodon instance.  After preliminary testing
and staging, we were able to move the database into production on
November 12th.

When the RAID failure occurred the datacenter staff identified the
components used in the build to be at fault and has taken full
responsibility for the hardware.  A new machine with new disks was built to
replace this one at no charge to us.  Recovery and rollback of the database
was, of course, our responsibility.

November has been a hectic time for many instances and in fact, almost none
of us were prepared to scale as fast as we've had to in order to
accommodate the twitter exodus.  For SDF, many things have gone very well
and while it is unfortunate that this brief time period is affected, it is
a minor setback and we ask everyone to be positive and to move forward from
this.  With the implementation of our daily maintenance window we can
minimize the impact of future issues.

postgresql pg_toast corruption for other instance maintainers

In our case pg_toast corruption affected the complicated 'accounts' table.
Here are some notes that may help you if you see log messages like:

    ERROR: missing chunk number 0 for toast value N in pg_toast_N

When this corruption occurs, it is not possible to run a cull or even
successfully dump the table.  Identifying the row (and in our case, the
column) can be tricky for larger/older instances.  See
https://gist.github.com/supix/80f9a6111dc954cf38ee99b9dedf187a for notes on
tracking things down.  The column in our case was fortunately 'note' in
accounts, so while the link above suggests

    psql> delete from mytable where id = 'N';

all we really needed to do was:

    psql> update accounts set note = '' where id = 'N';

The above URL has both a shell script and a perl script to run a select
against every row to identify which row is affected by the pg_toast
corruption.  Thankfully for mastodon, if you enable postgresql logging
during a tootctl accounts cull the id will be logged.  You can confirm
you've got the right id by doing a:

    psql> select * from accounts where id = 'N';

and you should see:

    missing chunk number 0 for toast value N in pg_toast_N

From there, do a simple select against each individual column in the table
where id = 'N' and update where you can.
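
For reference, a minimal shell sketch of the row-by-row hunt described
above (database, table and column names follow our case and are otherwise
illustrative; it is slow on large tables, and the gist linked above has
more complete shell and perl versions):

    # Try to read every row in full; the corrupt one is the only one that errors.
    for id in $(psql -At -d mastodon_production -c 'SELECT id FROM accounts ORDER BY id;'); do
        psql -d mastodon_production -c "SELECT * FROM accounts WHERE id = $id;" >/dev/null 2>&1 \
            || echo "pg_toast error reading accounts id $id"
    done

    # Once the bad row and column are known, clearing just that column may be
    # enough, as it was for 'note' in our case (replace <corrupt id> first).
    psql -d mastodon_production -c "UPDATE accounts SET note = '' WHERE id = '<corrupt id>';"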