Syncthing alternatives

ZeDoTelhado@lemmy.world · 26 days ago

Avid Amoeba@lemmy.ca · edit-2 25 days ago

That’s really weird. I’ve been using it for mobile-desktop-server-offsite sync for many years, with transfer sizes over 15TB, over WiFi, cellular, cable, fiber. I’ve never seen data corruption. Conflicts, sometimes. Permission issues, sometimes. Wiping something accidentally, sometimes. It’s even weirder because Syncthing computes hash values for the files it manages. I don’t know if it performs hash validation after copying remotely, but if not, it can be forced manually, which would tell you what’s fucked so it can be pulled from the source again, if it still exists.

Never mind, it verifies the result:

When a block is copied or received from another device, its SHA256 hash is computed and compared with the expected value. If it matches the block is written to a temporary copy of the file, otherwise it is discarded and Syncthing tries to find another source for the block.
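To make the quoted behaviour concrete, here is a minimal Python sketch of per-block hash verification. It is not Syncthing's code: the 128 KiB block size and the names `split_blocks`, `block_hashes` and `verify_blocks` are assumptions picked for illustration only.

```python
import hashlib

BLOCK_SIZE = 128 * 1024  # assumed block size, for illustration only


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Split data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Return the SHA-256 hex digest of every block of the source data."""
    return [hashlib.sha256(b).hexdigest() for b in split_blocks(data, block_size)]


def verify_blocks(
    received: list[bytes], expected: list[str]
) -> tuple[dict[int, bytes], list[int]]:
    """Keep blocks whose hash matches; list the indexes of the ones that do not.

    A real sync tool would try to fetch the rejected blocks from another source.
    """
    accepted: dict[int, bytes] = {}
    rejected: list[int] = []
    for index, (block, want) in enumerate(zip(received, expected)):
        if hashlib.sha256(block).hexdigest() == want:
            accepted[index] = block   # would be written to the temporary file
        else:
            rejected.append(index)    # discarded, never reaches the file
    return accepted, rejected
```

A block that gets even a single bit flipped in transit ends up in `rejected`, which is why corruption on the wire should not survive a sync.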

According to this, if you have data corruption it can only occur between copying/moving a temporary file on your destination to another directory, or it could occur on the source itself. Both of those scenarios are cause for concern and would likely persist with any utility. Moving or copying a file from one location to another on a sane machine should not corrupt it. If I were you I’d make sure my server doesn’t eat bits. If it’s not the storage media, it could be bit rot or bad RAM.
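One way to narrow down where the bits get eaten is to hash both copies and compare. The following is a rough Python sketch, not a Syncthing feature; `file_sha256`, `tree_hashes` and `compare_trees` are hypothetical helper names, and it assumes both folders are reachable as local paths and that syncing is idle while it runs.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_hashes(root: Path) -> dict[str, str]:
    """Map each file's path relative to root to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): file_sha256(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_trees(source: Path, destination: Path) -> dict[str, list[str]]:
    """Report files whose content differs or that exist on one side only."""
    src = tree_hashes(source)
    dst = tree_hashes(destination)
    common = src.keys() & dst.keys()
    return {
        "mismatched": sorted(name for name in common if src[name] != dst[name]),
        "missing_in_destination": sorted(src.keys() - dst.keys()),
        "missing_in_source": sorted(dst.keys() - src.keys()),
    }
```

A file listed under `mismatched` only means the two copies differ. It takes a known-good third copy, or a rerun after a fresh sync, to tell which side is the damaged one.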

Just in case everything seems fine, let me tell you what I dealt with. I had a Ryzen 5950X machine with 32GB of RAM. It had worked well since inception with no signs of RAM or data corruption issues. I test every new machine with Memtest86+. At some point I migrated the storage from Ext4 on LVMRAID to ZFS. All good. Then I wrote an alert for Prometheus to tell me if there are any issues in ZFS.

A week later I get an email about a ZFS error. I check the system - says checksum errors, data has been corrected, applications unaffected, run a scrub to clear. I ran a scrub. A few more checksum errors found, all corrected, we’re clean now. There was a strong solar storm around that time, probably that. A couple of weeks later I get another email. Same symptoms, same procedure. No solar storm. Shit. Memtest86+ - pass. Hm. A couple of weeks later I get another. Same thing. Memtest again - nothing. This went on for several months. Meanwhile the off-site backup sees nothing like that.

While running Memtest on another machine I noticed that the passes following the first took longer than the first, a lot longer. I thought something might be wrong with that machine. Dug into it, got into Memtest’s source code and discovered that the first pass is shorter on purpose so that it quickly flags obviously bad RAM. Apparently if you want to detect less obvious issues, you have to run multiple passes. OK. Memtest the main server again, pass 1: OK, pass 2: OK, pass 3: OK, pass 4: FAIL. FUCK. Memtest each stick separately for 4 passes: OK. Memtest 2 at a time: OK. Memtest all 4: FAIL. Alright, now we know why ZFS keeps finding checksum errors.

Long story short, this machine could not run this RAM in a 4-DIMM config. Replaced it with other RAM that’s rated to run in a 4-DIMM config on that processor. No more checksum issues. If I had still been running the older Ext4-on-LVMRAID storage stack, I would have caught NONE of this and it would have happily corrupted files here and there. In fact it likely did and I have some corruption.

Moral of the story - run many Memtest passes and use a checksumming storage stack like ZFS or Btrfs. I strongly recommend ZFS since its stripe RAID works fine, unlike Btrfs’s. If you don’t find bad RAM, start using it today, even if you’re working with a single disk, and add redundancy when you can. Only after that, swap Syncthing for something else, if you still somehow get corruption without ZFS knowing about it. And if ZFS tells you that you have checksum errors, you likely have bad hardware.
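For anyone wondering why a checksumming stack catches this while a plain mirror does not, here is a toy Python model of the "checksum error, data has been corrected" behaviour. It is only an illustration of the idea, not how ZFS is implemented; `MirroredBlock`, `write_block` and `scrub_block` are made-up names.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class MirroredBlock:
    copies: list[bytearray]  # redundant copies, e.g. one per disk
    checksum: str            # SHA-256 recorded when the block was written


def write_block(data: bytes, n_copies: int = 2) -> MirroredBlock:
    """Store n_copies of the data together with its checksum."""
    return MirroredBlock(
        copies=[bytearray(data) for _ in range(n_copies)],
        checksum=hashlib.sha256(data).hexdigest(),
    )


def scrub_block(block: MirroredBlock) -> int:
    """Verify every copy, repair bad ones from a good one, return how many were repaired.

    Raises ValueError if no copy matches the recorded checksum.
    """
    good = next(
        (c for c in block.copies
         if hashlib.sha256(c).hexdigest() == block.checksum),
        None,
    )
    if good is None:
        raise ValueError("no intact copy left; data is unrecoverable")
    repaired = 0
    for index, copy in enumerate(block.copies):
        if hashlib.sha256(copy).hexdigest() != block.checksum:
            block.copies[index] = bytearray(good)  # overwrite the bad copy
            repaired += 1
    return repaired
```

Without the stored checksum, a mirror that finds two differing copies has no way to know which one is correct. With it, the bad copy is identified and rewritten, and the nonzero repair count is the signal that something in the hardware is off.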

halcyoncmdr@lemmy.world · 25 days ago

Dug into it, got into Memtest’s source code and discovered that the first pass is shorter on purpose so that it quickly flags obviously bad RAM. Apparently if you want to detect less obvious issues, you have to run multiple passes.

I thought it was common knowledge that Memtest needed to be run for multiple passes to truly verify there are no issues. Seems that’s one of those things that stopped being passed down in the community over the years. Back when I was first learning about overclocking around 2005 that was emphasized HEAVILY, with the recommendation to run it at least overnight, and a minimum of 10 passes.

Avid Amoeba@lemmy.ca · edit-2 25 days ago

It’s kind of embarrassing because I used to work as a service technician at a popular computer store in the 2000s and Memtest86+ was standard fare for testing. I guess outside of OC, the shorter first pass truly was enough to spot bad RAM in the vast majority of cases. Plus multichannel interactions were not nearly as prevalent in the DDR1/2/3 days. I recently installed 4 DIMMs for 128GB on an AM5 machine just to discover that the 5600 RAM only boots at 3600 in a 4-DIMM config, as per AMD’s docs. Could force it higher but without extra adjustment it can’t go beyond 4600 on this machine. Back in the day, different DIMMs, often with different chips, worked in 2- and 4-DIMM configs so long as they matched their JEDEC spec. backinmyday.jpg

halcyoncmdr@lemmy.world · 25 days ago

Yeah, AMD’s memory controllers, especially with DDR5, seem to have a lot more difficulty at high speed with 4 slots filled. I used to plan upgrades around populating 2 slots and doubling if needed a few years later. Now you really need to plan to ignore those slots if you need memory performance for things like gaming rather than raw capacity.

Avid Amoeba@lemmy.ca · 25 days ago

Yeah, I didn’t need 128GB, but as soon as I figured what’s going on with the 4-DIMM config, I ordered another kit to fill what I think I’d need for the lifetime of the system.

halcyoncmdr@lemmy.world · 25 days ago

Similar issues even with just 2 DIMMs, with some XMP/EXPO profiles not working on AMD systems because of board/CPU limits. It should technically work, but for whatever reason it just can’t handle it and speeds need to be dropped or the timings loosened a bit even though the RAM itself is rated for that.

Not that the higher speeds are even necessary for 90% of users outside extreme overclocking. DDR5 6000 is basically where you reach diminishing returns anyway, and that’s often where that limit seems to appear.

Avid Amoeba@lemmy.ca · 25 days ago

Ugh. And as far as I’m reading, we’re hitting limits with the connectors and interconnects so the next iteration up might require some type of CAMM interface. 😔

NullPointerException@programming.dev · 24 days ago

The software should tell the user in the UI to run at least 10 passes.

zorflieg@lemmy.world · 25 days ago

This post doesn’t benefit me at all but I love how long it is.

Avid Amoeba@lemmy.ca · edit-2 25 days ago

Let me tell you about diagnosing a reproducible crash on that 5950X system after swapping the RAM with verified good modules. An issue I only discovered because I decided to warm myself using Folding@home for a couple of cold days while my building was switching the central heating on. 😂

ZeDoTelhado@lemmy.world · 25 days ago

That is some good info here. My HDD is totally fine (checked it very recently actually). As for the RAM, it was OK last time I checked, but I can check again to be sure.

Avid Amoeba@lemmy.ca · 25 days ago

Check my edit.

ZeDoTelhado@lemmy.world · 25 days ago

That is some crazy story right there. I do know for a fact that memtest needs multiple passes. But in my case the machine only has 1 stick of RAM (used to have 2, one died). I will probably do a memtest overnight and get back to you tomorrow.

halcyoncmdr@lemmy.world · 25 days ago

(used to have 2, one died)

That would make me immediately look to the RAM as the possible source of corruption. If it used to be a matched pair and one stick died, the odds of the other being on its way out are MUCH higher than normal. I would never trust that matched stick.

ZeDoTelhado@lemmy.world · 25 days ago

Finished an all-nighter memtest with a total of 12 passes. All good on the RAM side.

Avid Amoeba@lemmy.ca · edit-2 25 days ago

Condolences, you just switched to Ultra-Violence.

halcyoncmdr@lemmy.world · 25 days ago

Crazy, thought for sure it would fail testing.

Still wouldn’t trust it personally after a failed stick from a matched pair regardless of what the test says though.

Avid Amoeba@lemmy.ca · edit-2 25 days ago

Yeah. But it could be the board that burned it. But yeah, dead RAM is bad news, something is likely up. If I had data corruption and RAM didn’t show errors I’d begin swapping components. If the machine is cheap and swapping components would be too expensive or impractical, I’d swap the machine for another, like a cheap second hand Dell box.