JulianCalaby<p>Cursed homelab update:</p><p>So server #2 is humming along nicely; however, continuing to use the disk that nearly scuttled my repartition / recovery effort was not a good idea.</p><p>Me: creates a Ceph OSD on a known faulty hard disk<br>Faulty hard disk: has read errors causing a set of inconsistent PGs<br>Me: Surprised Pikachu face</p><p>Thankfully these were just read errors; no actual data has been lost.</p><p>So for a brief, glorious moment, I had just under 50TB of raw storage and now it's just under 49TB.</p><p>And for me now, the big question is: do I do complicated partition trickery to work around the bad spots (it's a consecutive set of sectors) or do I junk the disk and live with 1TB less raw storage?</p><p>In other news, I now understand a little bit more about Ceph and how it recovers from errors: PGs don't get "fixed" until they are next (deep) scrubbed, which means that if your PGs get stuck undersized, degraded or inconsistent (or in any other state), it could be that they're not getting scrubbed.</p><p>So taking the broken OSD on the bad HDD offline immediately caused all but 2 of the inconsistent PGs to get fixed. The remaining 2 just wouldn't move, so I smashed out a trivial script to deep scrub all PGs, and last night, a couple of days after this all went down, one got fixed. 
Now hopefully the other will get sorted out soon.</p><p>ceph pg ls | awk '{print $1}' | grep '^[[:digit:]]' | xargs -n1 ceph pg deep-scrub</p><p>So read errors -&gt; scrub errors -&gt; inconsistent PGs.</p><p>Then: inconsistent PGs -&gt; successful scrub -&gt; recovery</p><p>What this also means is that while I stopped the latest phase of the Big Copy to (hopefully) protect my data, I think I can start it again with some level of confidence.</p><p><a href="https://social.treehouse.systems/tags/homelab" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>homelab</span></a> <a href="https://social.treehouse.systems/tags/cursed" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>cursed</span></a> <a href="https://social.treehouse.systems/tags/cursedhomelab" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>cursedhomelab</span></a> <a href="https://social.treehouse.systems/tags/ceph" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ceph</span></a></p>
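<p>For anyone wanting to reuse the deep-scrub one-liner from this post, here is a slightly more defensive sketch of the same idea, split into two small functions. The function names pg_ids and deep_scrub_all are made up for illustration, and the assumption that PG IDs look like "pool.hex" (e.g. 2.1f) is mine; it has only been reasoned through, not run against a real cluster. Passing a state such as inconsistent to ceph pg ls should narrow the list, but check that against your Ceph release first.</p>

```shell
# Hypothetical helper: pull PG IDs (assumed form "<pool>.<hex>", e.g. 2.1f)
# out of `ceph pg ls` style output on stdin, skipping the header row and
# any trailing note lines.
pg_ids() {
    awk '$1 ~ /^[0-9]+\.[0-9a-f]+$/ { print $1 }'
}

# Hypothetical helper: request a deep scrub for each PG ID read from stdin,
# one `ceph pg deep-scrub` call per PG.
deep_scrub_all() {
    while read -r pgid; do
        ceph pg deep-scrub "$pgid"
    done
}

# Usage (needs a working ceph CLI with admin access):
#   ceph pg ls | pg_ids | deep_scrub_all
# To target only the stuck ones (assumes your release accepts a state filter):
#   ceph pg ls inconsistent | pg_ids | deep_scrub_all
```

<p>This only asks for the scrubs; they still happen on Ceph's own schedule, so a PG may take a while to change state.</p>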