Thu 23 Nov 2023 11:11:10 PM UTC

During the holiday we took the instance down to perform full backups
and conduct 4.2.x migration testing on a working copy of the database.
A working copy of the existing database had previously been migrated to
4.2.0 on the development server and then upgraded to the subsequent
4.2.1 release without trouble.  However, there were issues that
prevented a successful upgrade on the production server; these have
been noted and will be addressed before the next upgrade attempt.  The
instance is now running 4.1.10 with the database preserved as it was
prior to any modifications.  Thank you for your patience.
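
For anyone curious, a working copy like the one described above can be
produced with something along these lines (database names and paths are
hypothetical here, not our actual setup):

    # Clone the production database into a scratch copy for migration testing.
    createdb mastodon_upgrade_test
    pg_dump --format=custom mastodon_production | pg_restore --dbname=mastodon_upgrade_test
    # Point a 4.2.x checkout at the copy and run its migrations against it;
    # Mastodon reads the database name from the DB_NAME environment variable.
    RAILS_ENV=production DB_NAME=mastodon_upgrade_test bundle exec rails db:migrate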

Mon 09 Oct 2023 07:34:28 AM UTC

The servers were taken offline briefly to perform maintenance
and full backups.

Wed 27 Sep 2023 05:20:27 PM UTC

We experienced a prolonged power outage at the SEA2 DC in
Tukwila, Washington which caused the site to be inaccessible.
The PNW is currently experiencing waves of wind and rain due
to a cyclone off the coast.  Normally in a power service
failure we would run on batteries and a diesel generator, but
it looks like the transfer failed.  We are waiting to hear
from our provider regarding the outage.  At this point service
has been restored.  Thank you for your patience.

Sat 05 Aug 2023 04:35:07 AM UTC

The instance was upgraded from v4.1.5 to v4.1.6.

Sun 25 Dec 2022 06:00:00 AM UTC

The instance underwent scheduled maintenance, which took 57 minutes.
Happy Christmas!

  • Full offline backups were completed
  • New Replication server was configured for live database redundancy (a rough sketch follows this list)
  • Third database server for testing new features was implemented
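
The entry above doesn't spell out how the replication server was
configured; one common approach is PostgreSQL streaming replication
(PostgreSQL 12 or later), where the standby side looks roughly like this
(host and role names are placeholders):

    # Run on the new standby; db-primary.example.org and 'replicator' are placeholders.
    pg_basebackup -h db-primary.example.org -U replicator \
        -D /var/lib/postgresql/data --wal-method=stream --write-recovery-conf
    # --write-recovery-conf writes standby.signal and primary_conninfo so the
    # server comes up as a live streaming replica of the primary.
    pg_ctl -D /var/lib/postgresql/data start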

Thu 15 Dec 2022 06:40:32 AM UTC

The instance was in scheduled maintenance for 32 minutes to correct
inconsistencies in database indices for media_attachments,
featured_tags, statuses and status_stats.
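
A rough idea of what correcting those indices can look like, assuming the
work is done with psql while the instance is stopped (the database name is
a placeholder):

    # Rebuild every index on the affected tables; REINDEX TABLE covers them all.
    for t in media_attachments featured_tags statuses status_stats; do
        psql -d mastodon_production -c "REINDEX TABLE $t;"
    done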
    

Wed 14 Dec 2022 06:01:37 AM UTC

The instance was in scheduled maintenance for 53 minutes to:

  • perform a full offline backup
  • run bin/tootctl fix-duplicates
  • reindex several indexes
  • check pg_toast tables

Feeds were then rebuilt once the instance was back online.  A rough
command sketch of these steps follows.
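
The database name, backup path and index name below are placeholders, and
recent Mastodon releases expose fix-duplicates under tootctl maintenance:

    # Full offline backup, duplicate cleanup, reindexing and a pg_toast check.
    pg_dump --format=custom mastodon_production > /backups/mastodon_production.dump
    RAILS_ENV=production bin/tootctl maintenance fix-duplicates
    psql -d mastodon_production -c 'REINDEX INDEX some_index_name;'
    # Reading every column of every row forces detoasting and surfaces pg_toast errors.
    psql -d mastodon_production -c 'SELECT md5(accounts::text) FROM accounts;' >/dev/null
    # Once the instance is back online, rebuild home/list feeds.
    RAILS_ENV=production bin/tootctl feeds build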

Sun 11 Dec 2022 07:25:42 PM UTC

On Monday December 5th we experienced a RAID failure specifically affecting
our postgresql server.  The database was moved to another machine where it
could be rolled back 12 days and repairs could be made.  This affected new
accounts and new statuses created after November 20th.  The instance was
brought back online within 3 hours to resume service.  New accounts that no
longer existed due to the rollback were identified and have since been
contacted via email.

Our database was created in April of 2017 and contains statuses spanning
over 5 years.  It has been in continuous service and has migrated through
several versions of mastodon/postgresql.  This is the first time we've had
to roll back the database.

* Are my statuses from that time period lost?

Yes and no: public statuses are federated and many folks have found their
posts cached both in their apps and on remote instances.  If you are able
to, you can repost what is missing from the SDF local timeline.

Further details on the postgresql server

On November 11th we leased a machine from our datacenter temporarily as we
were still in negotiation for our second expansion cabinet and we needed
another machine to scale the Mastodon instance.  After preliminary testing
and staging, we were able to move the database into production on
November 12th.

When the RAID failure occurred the datacenter staff identified the
components used in the build to be at fault and has taken full
responsibility for the hardware.  A new machine with new disks was built to
replace this one at no charge to us.  Recovery and rollback of the database
was, of course, our responsibility.

November has been a hectic time for many instances and in fact, almost none
of us were prepared to scale as fast as we've had to in order to
accommodate the twitter exodus.  For SDF, many things have gone very well
and while it is unfortunate that this brief time period is affected, it is
a minor setback and we ask everyone to be positive and to move forward from
this.  With the implementation of our daily maintenance window we can
minimize the impact of future issues.

postgresql pg_toast corruption for other instance maintainers

In our case pg_toast corruption affected the complicated 'accounts' table.
Here are some notes that may help you if you see log messages like:

    ERROR: missing chunk number 0 for toast value N in pg_toast_N

When this corruption occurs, it is not possible to run a cull or even
successfully dump the table.  Identifying the row (and in our case, the
column) can be tricky for larger/older instances.  See
https://gist.github.com/supix/80f9a6111dc954cf38ee99b9dedf187a for notes on
tracking things down.  The column in our case was fortunately 'note' in
accounts, so while the link above suggests

    psql> delete from mytable where id = 'N';

all we really needed to do was:

    psql> update accounts set note = '' where id = 'N';

The above URL has both a shell script and a perl script to run a select
against every row to identify which row is affected by the pg_toast
corruption.  Thankfully for mastodon, if you enable postgresql logging
during a tootctl accounts cull the id will be logged.  You can confirm
you've got the right id by doing a:

    psql> select * from accounts where id = 'N';

and you should see:

    missing chunk number 0 for toast value N in pg_toast_N

From there, do a simple select against each individual column in the table
where id = 'N' and update where you can.
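
For reference, a minimal shell sketch of the row-by-row hunt described
above (database, table and column names follow our case and are otherwise
illustrative; it is slow on large tables, and the gist linked above has
more complete shell and perl versions):

    # Try to read every row in full; the corrupt one is the only one that errors.
    for id in $(psql -At -d mastodon_production -c 'SELECT id FROM accounts ORDER BY id;'); do
        psql -d mastodon_production -c "SELECT * FROM accounts WHERE id = $id;" >/dev/null 2>&1 \
            || echo "pg_toast error reading accounts id $id"
    done

    # Once the bad row and column are known, clearing just that column may be
    # enough, as it was for 'note' in our case (replace <corrupt id> first).
    psql -d mastodon_production -c "UPDATE accounts SET note = '' WHERE id = '<corrupt id>';"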