So, just a light post: I upgraded my Pi 4 last night and found that the latest Linux firmware breaks a 32-bit install.
I’ve been meaning to change to 64-bit for months, but as it’s my DMZ box for torrents, Radicale, etc., it’s just a matter of finding the right time to convert an ad hoc setup into my Ansible scripts.
Luckily I had an SD backup from September to get it running again.
So, what have you broken over the holidays?
I was trying to finalize a backup device to gift to my dad over Christmas. We’re planning to use each other for offsite backup and save on cloud costs, while providing a bridge to each other’s networks for access to services we don’t want to expose publicly.
It’s a Beelink ME Mini running Arch: btrfs on LUKS for the OS on the eMMC storage, with the fTPM handling decryption automatically.
I have built a few similar boxes since and migrated the build over to Ansible, but this one was the proving ground and template for them. It was missing some of the other improvements I had built into the deployed boxes, notably:
- ZFS on LUKS on the NVMe drives
- the linux-lts kernel (for ZFS compatibility)
- a UKI for the Secure Boot setup
I don’t know what possessed me, but I decided that the question marks and open tasks in my original build documentation should be investigated as I went. I was hoping to export some more specific configuration to Ansible for the other boxes once done, and I was going to migrate manually to learn some lessons.
I wasn’t sure about bothering with a UKI. I wanted ZFS running, and that meant moving to Arch’s linux-lts kernel package.
Given systemd-boot’s (currently) superior support for owner keys, boot-time unlocking, and direct EFI boot, I’ve been using it. However, it works differently with plain kernels than with UKIs: plain kernels use a loader entry file to point at the locations of the kernel and initramfs, which is what existed on this box.
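For anyone following along, a plain-kernel loader entry looks roughly like this. A sketch only; the filename, title, paths, and UUID below are all placeholders to adapt to your own ESP layout:

```ini
# esp/loader/entries/arch-lts.conf  (hypothetical example)
title   Arch Linux (linux-lts)
linux   /vmlinuz-linux-lts
initrd  /amd-ucode.img
initrd  /initramfs-linux-lts.img
options rd.luks.name=<disk-uuid>=root root=/dev/mapper/root rw
```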
I installed the linux-lts package; all good. I removed the linux kernel package, and something in the pacman hooks failed: the auto-signing process for the Secure Boot setup couldn’t find the old kernel files when it regenerated my initramfs, but happily signed the new lts ones. Cool, I thought, I’ll remove the old ones from the database and re-enroll my OS drive with systemd-cryptenroll after booting on the new kernel (the PCRs I’m using would be different on a new kernel, so auto-decrypt wouldn’t work anyway).
So, just to be sure, I regenerated my initramfs and kernel image with mkinitcpio -p linux-lts, everything worked fine, and I rebooted. I was greeted with:
“Reboot to firmware settings” as my only boot option. Sigh.
Still, I was determined to learn something from this. After a good long while of reading the Arch wiki and mucking about with bootctl (a PITA on a live-CD-booted system), I thought about checking my other machines. I was hoping to find a bootctl loader entry matching the lts kernel on one of them, and copy it to this machine to at least prove to myself that I had sussed the problem.
After checking, I realised no other, newer machine had a loader configuration actually specifying where the kernel and initramfs were. I was so lost. How the fuck is any of this working?
Well, it turns out that if you have a UKI set up, it bundles all the major bits (kernel, microcode, initramfs, and boot config options) into one directly EFI-bootable file, which bootctl automatically detects when it’s installed correctly. All my other machines had UKIs set up and I’d forgotten; that was how it was working. Unfortunately, I had used archinstall to set up the UKI, and I had no idea how it was doing it. There was a line in my docs literally telling me to go check this out before it bit me in the ass…
…
- [x] figure out what makes uki from archinstall work ✅ 2025-09-19
- It was systemd-ukify
…
So, after that sidetrack, I did actually prove that the kernel could be described in a bootctl loader entry. Then I was able to figure out how I’d done the UKI piece on the other machines, applied it to this one so it matched, and updated my docs…
…
- IT WASN’T ukify
UKI configuration is in mkinitcpio’s default presets, but needs changing to make it work:
vim /etc/mkinitcpio.d/linux-lts.preset…
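In the presets Arch ships, the UKI output path is already there but commented out; uncommenting the `default_uki` line (and usually commenting out `default_image`) is what switches mkinitcpio to building UKIs. A sketch, assuming default paths, so adjust the ESP mount point to your setup:

```sh
# /etc/mkinitcpio.d/linux-lts.preset (sketch; paths vary by install)
ALL_kver="/boot/vmlinuz-linux-lts"
PRESETS=('default' 'fallback')

#default_image="/boot/initramfs-linux-lts.img"
default_uki="/efi/EFI/Linux/arch-linux-lts.efi"

#fallback_image="/boot/initramfs-linux-lts-fallback.img"
fallback_uki="/efi/EFI/Linux/arch-linux-lts-fallback.efi"
fallback_options="-S autodetect"
```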
Turns out my Christmas wish came true: I learned I need to keep better notes.
I attempted to move my whole VLAN-less 192.168.x network to a new 10.x network with VLANs. I am still tracking down services and devices where I hardcoded the old 192.168. IP addresses.
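One way to hunt those down is a recursive grep over your config trees. A minimal sketch, where the scratch directory and config file are made up for illustration; in practice you’d point it at /etc, compose files, and so on:

```shell
# Sweep for hardcoded old-subnet addresses before/after renumbering.
# /tmp/ip-audit and app.conf are hypothetical stand-ins for real config trees.
mkdir -p /tmp/ip-audit
printf 'db_host=192.168.1.50\nlisten=10.0.0.5\n' > /tmp/ip-audit/app.conf

# -r recurse, -n show line numbers, -E extended regex; only the 192.168 line matches
grep -rnE '192\.168\.[0-9]+\.[0-9]+' /tmp/ip-audit
```

Running the same pattern again after the migration is a quick sanity check that nothing still points at the old subnet.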
Nothing broken yet, but there’s still time! So far I’ve set up Immich instead of Seafile for photos (keeping Seafile instead of Nextcloud for files, but Immich is way better for photos), and set up Linkwarden and Floccus for bookmark backup and sync.
I have had some interesting DNS issues, though: the Immich app would not reliably resolve my local Immich domain from the Pi-hole, so of course it’s a DNS issue… I’m working around that by using the IP for now; it seems to be an issue only with the app.
I managed not to screw anything up, but I was handed an HDD from a friend of mine who is a burgeoning photographer. The drive has crashed, and I am afraid that, unless he coughs up several thousand dollars for a professional recovery service, I am not going to be able to resurrect it. I’ve told him for at least a year to spend the money and get a NAS with RAID set up. So, over Christmas, he did purchase one. But… too little, too late for the portable drive. I always hate delivering bad news, but it is a hard lesson to learn. Usually it just takes one time, and it’s backup city from there on out. Fortunately he has partial backups on SD cards, and files spread from Facebook to family phones, so he can recoup some of his losses.
I’ve had some luck with portable drives by removing the drive from its enclosure and attaching it directly to the SATA bus instead of USB. Also, as a general rule for anyone who might stumble on this: whenever attempting recovery, first create an image (I use ddrescue) and work with that. That way you’ll minimize the risk of causing even more damage.
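The “image first, then work on the image” workflow, sketched with plain dd against a scratch file so it’s harmless to run. On a real failing drive you’d use ddrescue instead, e.g. `ddrescue -f -n /dev/sdX drive.img drive.map`, which retries bad sectors and keeps a map file so you can resume; `/dev/sdX` is a placeholder, and all filenames below are made up:

```shell
# /tmp/source.bin stands in for the failing drive.
dd if=/dev/urandom of=/tmp/source.bin bs=1M count=4 status=none

# Take the image once, read-only from the "drive", then do all further
# recovery work against the image, never the original device.
dd if=/tmp/source.bin of=/tmp/drive.img bs=1M status=none

# Verify the image matches before touching it.
sha256sum /tmp/source.bin /tmp/drive.img
```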
A while ago my brother and I “fixed” a couple of hard drives. All of them had a single faulty diode; apparently it was a known failure point on those drives, and my brother found instructions online on how to bypass it. Obviously that doesn’t really fix the drives, but a small piece of wire and some soldering was enough to get them spinning again long enough for him to copy the data over to new drives.
Uggh, feel bad for them.
I’ve tried for years to get friends and family to keep their data in a single place in the house and use backup services. That would be a massive improvement.
Family won’t listen, so I’m building minicomputers for them all that will handle it. Just have to configure their devices to store data there.
I rebooted a machine that had an Nvidia driver crash. It didn’t finish the reboot, but it did seem to stop sshd, so I can’t get in until I get home and have physical access. I had a ping running so I could watch; it never dropped. 🫠
Lesson here: put your experimental boxes on a Wi-Fi plug and set them to boot on AC restoration.
I hate that systemd is so quick to shut down sshd when shutting down the system. It does that in the very first “round”, when it could really just keep it running till the end…
Even my Proxmox web interface lasts deeper into a reboot than sshd: sshd boots you out immediately, while the browser lets you watch the VMs shut down for the next three minutes.
Or something like a NanoKVM.
It’s been a year; are these issues fixed?
https://www.jeffgeerling.com/blog/2024/sipeed-nanokvm-risc-v-stick-on
Without enough time to fix anything if it broke, and without a proper backup, I upgraded my mother-in-law’s phone from LineageOS 21 to 23. Without issues! Phew.
Bold move, Cotton!
(Not really; Lineage updates are the most seamless I’ve ever seen.)
I don’t do upgrades (well, not in the sense most people think of them).
My approach is that upgrades are too risky; things always break. It’s also why I don’t permit auto-updates on anything. I’d rather do manual updates at dedicated times. Keeping things working is more important, and I have backups.
I run everything virtualized (as much as I can), so I can test upgrades by cloning a system and upgrading the clone. If that fails, I simply build a new system from templates I keep: run it in parallel, copy config and data as best I can, then migrate. I just migrated my Jellyfin setup this way.
This is a common methodology in enterprise, which virtualization makes a lot easier for us self hosters.
I haven’t had a disruption from updates/upgrades in 5 years.
It’s not common in enterprise to not auto-update.
Depends on the company and the system. Some of them need to be done off-hours while people aren’t using them. Some are HA and/or insignificant enough that you can do them any time without interruption.
I attempted to update Nextcloud and failed spectacularly. Need to rebuild our family calendars now.
Of all the nightmare rebuilds, Nextcloud scares me the most. I have backup images of that machine lying around like drink coasters.
I’ve had Nextcloud break on me before, and it runs slow, so I switched to Seafile for files. But Seafile actually scares me even more than Nextcloud, because Nextcloud at least saves files on disk as plain files you can copy out somewhere else if you need to access them (I’m not above an emergency scp of important files).
Seafile uses a binary format, which means I can only get files in and out through the web interface. If Seafile breaks, I’m SOL: to recover my data anywhere else I’d need a working backup or a fix; I can’t just scp files to a local machine to work on them.



