So, just a light post: I upgraded my Pi 4 last night and found that the latest Linux firmware breaks a 32-bit install.
I’ve been meaning to change to 64-bit for months, but as it’s my DMZ box for torrents, Radicale, etc., it’s just a matter of finding the right time to convert an ad hoc setup into my Ansible scripts.
Luckily I had an SD backup from September to get it running again.
So, what have you broken over the holidays?
I was trying to finalize a backup device to gift to my dad over Christmas. We’re planning to use each other for offsite backup and save on cloud costs, while providing a bridge to each other’s networks for access to services we don’t want to expose publicly.
It’s a Beelink ME Mini running Arch: btrfs on LUKS for the OS on the eMMC storage, with the fTPM handling decryption automatically.
I have built a few similar boxes since and migrated the build over to Ansible, but this one was the proving ground and template for them. It was missing some of the other improvements I had built into the deployed boxes, notably:
- ZFS on LUKS on the NVMe drives
- the linux-lts kernel (for ZFS compatibility)
- a UKI for the Secure Boot setup
I don’t know what possessed me, but I decided that the question marks and open tasks in my original build documentation should be investigated as I went. I was hoping to export some more specific configuration to Ansible for the other boxes once done, and I was going to migrate manually to learn some lessons.
I wasn’t sure about bothering with a UKI. I wanted ZFS running, and that meant moving to Arch’s linux-lts kernel package.
Given systemd-boot’s (currently) superior support for owner keys, boot-time unlocking, and direct EFI boot, I’ve been using it. However, it works differently with plain kernels than with UKIs: plain kernels use a loader entry file to point at the locations of the kernel and initramfs, which is what existed on this box.
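For anyone following along, a plain-kernel loader entry looks roughly like this. A sketch only; the filename, title, paths, and UUID below are all placeholders to adapt to your own ESP layout:

```ini
# esp/loader/entries/arch-lts.conf  (hypothetical example)
title   Arch Linux (linux-lts)
linux   /vmlinuz-linux-lts
initrd  /amd-ucode.img
initrd  /initramfs-linux-lts.img
options rd.luks.name=<disk-uuid>=root root=/dev/mapper/root rw
```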
I installed the linux-lts package; all good. I removed the linux kernel package, and something in the pacman hooks failed: the auto-signing process for the Secure Boot setup couldn’t find the old kernel files when it regenerated my initramfs, but happily signed the new lts ones. Cool, I thought, I’ll remove the old ones from the database and re-enroll my OS drive with systemd-cryptenroll after booting on the new kernel (the PCRs I’m using would be different on a new kernel, so auto-decrypt wouldn’t work anyway).
So, just to be sure, I regenerated my initramfs and kernel image with mkinitcpio -p linux-lts, everything worked fine, and I rebooted. I was greeted with:
“Reboot to firmware settings” as my only boot option. Sigh.
Still, I was determined to learn something from this. After a good long while of reading the Arch wiki and mucking about with bootctl (a PITA on a live-CD-booted system), I thought about checking my other machines. I was hoping to find a bootctl loader entry matching the lts kernel on one of them, and copy it to this machine to at least prove to myself that I had sussed the problem.
After checking, I realised no other, newer machine had a loader configuration actually specifying where the kernel and initramfs were. I was so lost. How the fuck is any of this working?
Well, it turns out that if you have a UKI set up, it bundles all the major bits (kernel, microcode, initramfs, and boot config options) into one directly EFI-bootable file, which bootctl automatically detects when it’s installed correctly. All my other machines had UKIs set up and I’d forgotten; that was how it was working. Unfortunately, I had used archinstall to set up the UKI, and I had no idea how it was doing it. There was a line in my docs literally telling me to go check this out before it bit me in the ass…
…
- [x] figure out what makes uki from archinstall work ✅ 2025-09-19
- It was systemd-ukify
…
So, after that sidetrack, I did actually prove that the kernel could be described in a bootctl loader entry. Then I was able to figure out how I’d done the UKI piece on the other machines, applied it to this one so it matched, and updated my docs…
…
- IT WASN’T ukify
UKI configuration is in mkinitcpio’s default presets, but needs changing to make it work:
vim /etc/mkinitcpio.d/linux-lts.preset…
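In the presets Arch ships, the UKI output path is already there but commented out; uncommenting the `default_uki` line (and usually commenting out `default_image`) is what switches mkinitcpio to building UKIs. A sketch, assuming default paths, so adjust the ESP mount point to your setup:

```sh
# /etc/mkinitcpio.d/linux-lts.preset (sketch; paths vary by install)
ALL_kver="/boot/vmlinuz-linux-lts"
PRESETS=('default' 'fallback')

#default_image="/boot/initramfs-linux-lts.img"
default_uki="/efi/EFI/Linux/arch-linux-lts.efi"

#fallback_image="/boot/initramfs-linux-lts-fallback.img"
fallback_uki="/efi/EFI/Linux/arch-linux-lts-fallback.efi"
fallback_options="-S autodetect"
```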
Turns out my Christmas wish came true: I learned I need to keep better notes.
I attempted to move my whole VLAN-less 192.168.x network to a new 10.x network with VLANs. I am still tracking down services and devices where I hardcoded the old 192.168. IP addresses.
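One way to hunt those down is a recursive grep over your config trees. A minimal sketch, where the scratch directory and config file are made up for illustration; in practice you’d point it at /etc, compose files, and so on:

```shell
# Sweep for hardcoded old-subnet addresses before/after renumbering.
# /tmp/ip-audit and app.conf are hypothetical stand-ins for real config trees.
mkdir -p /tmp/ip-audit
printf 'db_host=192.168.1.50\nlisten=10.0.0.5\n' > /tmp/ip-audit/app.conf

# -r recurse, -n show line numbers, -E extended regex; only the 192.168 line matches
grep -rnE '192\.168\.[0-9]+\.[0-9]+' /tmp/ip-audit
```

Running the same pattern again after the migration is a quick sanity check that nothing still points at the old subnet.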
Nothing broken yet, but there’s still time! So far I’ve set up Immich instead of Seafile for photos (keeping Seafile instead of Nextcloud for files, but Immich is way better for photos), and set up Linkwarden and Floccus for bookmark backup and sync.
I have had some interesting DNS issues, though: the Immich app would not reliably resolve my local Immich domain from the Pi-hole, so of course it’s a DNS issue… I’m working around that by using the IP for now; it seems to be an issue only with the app.
I managed not to screw anything up, but I was handed an HDD from a friend of mine who is a burgeoning photographer. The drive has crashed, and I am afraid that, unless he coughs up several thousand dollars for a professional recovery service, I am not going to be able to resurrect it. I’ve told him for at least a year to spend the money and get a NAS with RAID set up. So, over Christmas, he did purchase one. But… too little, too late for the portable drive. I always hate delivering bad news, but it is a hard lesson to learn. Usually it just takes one time, and it’s backup city from there on out. Fortunately he has partial backups on SD cards, and files spread from Facebook to family phones, so he can recoup some of his losses.
I’ve had some luck with portable drives by removing the drive from its enclosure and attaching it directly to the SATA bus instead of USB. Also, as a general rule for anyone who might stumble on this: whenever attempting recovery, first create an image (I use ddrescue) and work with that. That way you’ll minimize the risk of causing even more damage.
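The “image first, then work on the image” workflow, sketched with plain dd against a scratch file so it’s harmless to run. On a real failing drive you’d use ddrescue instead, e.g. `ddrescue -f -n /dev/sdX drive.img drive.map`, which retries bad sectors and keeps a map file so you can resume; `/dev/sdX` is a placeholder, and all filenames below are made up:

```shell
# /tmp/source.bin stands in for the failing drive.
dd if=/dev/urandom of=/tmp/source.bin bs=1M count=4 status=none

# Take the image once, read-only from the "drive", then do all further
# recovery work against the image, never the original device.
dd if=/tmp/source.bin of=/tmp/drive.img bs=1M status=none

# Verify the image matches before touching it.
sha256sum /tmp/source.bin /tmp/drive.img
```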
A while ago my brother and I “fixed” a couple of hard drives. All of them had a single faulty diode; apparently it was a known failure point on those drives, and my brother found instructions online on how to bypass it. Obviously that doesn’t really fix the drives, but a small piece of wire and some soldering was enough to get them spinning again long enough for him to copy the data over to new drives.
Uggh, feel bad for them.
I’ve tried for years to get friends and family to keep their data in a single place in the house and use backup services. That would be a massive improvement.
Family won’t listen, so I’m building minicomputers for them all that will handle it. Just have to configure their devices to store data there.
I rebooted a machine that had an Nvidia driver crash. It didn’t finish the reboot, but it did seem to stop sshd, so I can’t get in until I get home and have physical access. I had a ping running so I could watch; it never dropped. 🫠
Lesson here: put your experimental boxes on a Wi-Fi plug and set them to boot on AC restoration.
I hate that systemd is so quick to shut down sshd when shutting down the system. It does that in the very first “round”, when it could really just keep it running till the end…
Even my Proxmox web interface lasts deeper into a reboot than sshd: sshd boots you out immediately, while the browser lets you watch the VMs shut down for the next three minutes.
Or something like a NanoKVM.
It’s been a year; are these issues fixed?
https://www.jeffgeerling.com/blog/2024/sipeed-nanokvm-risc-v-stick-on
Without enough time to fix anything if it broke, and without a proper backup, I upgraded my mother-in-law’s phone from LineageOS 21 to 23. Without issues! Phew.
Bold move, Cotton!
(Not really; Lineage updates are the most seamless I’ve ever seen.)
I don’t do upgrades (well, not in the sense most people think of them).
My approach is that upgrades are too risky; things always break. It’s also why I don’t permit auto-updates on anything. I’d rather do manual updates at dedicated times. Keeping things working is more important, and I have backups.
I run everything virtualized (as much as I can), so I can test upgrades by cloning a system and upgrading the clone. If that fails, I simply build a new system from templates I keep: run it in parallel, copy config and data as best I can, then migrate. I just migrated my Jellyfin setup this way.
This is a common methodology in enterprise, which virtualization makes a lot easier for us self hosters.
I haven’t had a disruption from updates/upgrades in 5 years.
It’s not common in enterprise to not auto-update.
Depends on the company and the system. Some of them need to be done off-hours while people aren’t using them. Some are HA and/or insignificant enough that you can do them any time without interruption.
I attempted to update Nextcloud and failed spectacularly. Need to rebuild our family calendars now.
Of all the nightmare rebuilds, Nextcloud scares me the most. I have backup images of that machine lying around like drink coasters.
I’ve had Nextcloud break on me before, and it runs slow, so I switched to Seafile for files. But Seafile actually scares me even more than Nextcloud, because Nextcloud at least saves files on disk as plain files you can copy out somewhere else if you need to access them (I’m not above an emergency scp of important files).
Seafile uses a binary format, which means I can only get files in and out through the web interface. If Seafile breaks, I’m SOL: to recover my data anywhere else I’d need a working backup or a fix; I can’t just scp files to a local machine to work on them.



