Hi,

Reading here and there, I’m getting more and more worried about bit rot or similar problems (firmware errors? worn-out SSD NAND?): the disks seem to work fine, but data might be corrupted, so my simple backup strategy (two rsynced copies in different places and at different times, one on an SSD and another on an HDD) may not be enough. Please note that all my personal data is on ext4 filesystems and is less than 1 TB (OK, it’s not a datahoarding size, but this is a sub where the experts on the topic are). Maybe the probability is low, and the probability that a critical file is affected is even lower, but you know Murphy? I do.

Now, the gold-standard solution would be to replace all of my physical servers with ones that support ECC RAM; then I’d have to buy at least three CMR disks to build a ZFS RAID array or a similar btrfs one. Realistically, this solution is not sustainable in terms of time, space and cost, so I have to accept the risk of a second-best solution… but which one? I would also like to avoid using other media types (e.g. optical).

For example, might using a backup tool (restic/kopia or Proxmox Backup Server) reduce the risk? I ask because an incremental approach might allow me to restore data from a selected point in the past. Of course, I have no way of knowing which point in the past to pick and, moreover, I would lose all data produced after that point. Maybe I could apply this strategy just to a subset of very critical and immutable data (official documents)? Or, for these documents, could I just use rsync with the checksum option?
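For the small set of critical, immutable documents, the idea behind rsync’s `--checksum` (`-c`) option (deciding what differs by comparing file contents rather than size and modification time) can also be run as a periodic check between two copies. Below is a minimal Python sketch, assuming two copies of the same directory tree; it hashes every file with SHA-256 and reports files whose contents differ or that exist in only one copy. It only detects a mismatch: telling which copy is the good one still needs a third copy or a previously recorded checksum.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def manifest(root: Path) -> dict[str, str]:
    """Map each regular file's path (relative to root) to its SHA-256."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_copies(a: Path, b: Path) -> dict[str, list[str]]:
    """Report files whose contents differ, or that exist in only one copy."""
    ma, mb = manifest(a), manifest(b)
    return {
        "differ": sorted(k for k in ma.keys() & mb.keys() if ma[k] != mb[k]),
        "only_in_a": sorted(ma.keys() - mb.keys()),
        "only_in_b": sorted(mb.keys() - ma.keys()),
    }
```

Usage would look like `compare_copies(Path("/mnt/ssd/documents"), Path("/mnt/hdd/documents"))`, where both paths are hypothetical placeholders for the SSD and HDD copies.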

As usual, thanks for any suggestion!

  • bobj33@alien.top · 10 months ago

    Copy/paste of my previous post:

    Silent bit rot, where a bit flips but the hardware reports no error, is extremely rare. My stats say it happens about once a year on 300TB of data. Some statistics major can correct me, but if someone has 1TB of data then they should see a single bit flip in 300 years, so maybe their great-great-great-grandchildren will see it and report back to them with a time machine.
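    The 300-year figure is simple proportionality. A quick Python check, assuming the observed rate (one silent corruption per year on 300 TB) stays constant and that events arrive independently (a Poisson model, which is an added assumption, not something the stats above establish), gives the expected wait for a given amount of data and the chance of seeing at least one event over a period:

```python
import math

# Observed rate from the post: about 1 silent corruption per year on 300 TB,
# i.e. one event per 300 TB-years of storage.
TB_YEARS_PER_EVENT = 300.0


def expected_years_between_flips(tb: float) -> float:
    """Mean waiting time in years for one event on `tb` terabytes."""
    return TB_YEARS_PER_EVENT / tb


def prob_at_least_one(tb: float, years: float) -> float:
    """P(at least one event) under an assumed Poisson process."""
    return 1 - math.exp(-tb * years / TB_YEARS_PER_EVENT)


print(expected_years_between_flips(1))     # → 300.0
print(round(prob_at_least_one(1, 10), 4))  # → 0.0328
```

    So under these assumptions, 1 TB has roughly a 3% chance of one silent corruption in 10 years: low, but not zero, which is why the checks described below are cheap insurance.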

    All of my data is on ordinary ext4 hard drives. I buy all my drives in groups of 3. I have my local file server, local backup, and remote backup. I have 2 drives in the local file server dedicated to SnapRAID parity and run “snapraid sync” every night.
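    To make the parity idea concrete: with a single parity disk, each parity block can be the XOR of the corresponding blocks on every data disk, so any one lost disk can be rebuilt by XOR-ing the parity with the surviving disks. The Python sketch below is a deliberately simplified model (equal-sized in-memory "disks", one parity level); it is not SnapRAID’s actual on-disk format, and additional parity levels (like the second parity drive here) rely on more elaborate math than plain XOR.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def sync(data_disks: list[bytes]) -> bytes:
    """Compute the parity 'disk' (what a nightly sync keeps up to date)."""
    return xor_blocks(data_disks)


def rebuild(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover the single missing data disk from the survivors plus parity."""
    return xor_blocks(surviving + [parity])
```

    For example, with `disks = [b"photo", b"taxes", b"music"]` and `p = sync(disks)`, calling `rebuild([disks[0], disks[2]], p)` returns `b"taxes"`. The model also shows why the sync has to run regularly: parity computed before a change rebuilds the old contents, not the new ones.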

    https://www.snapraid.it

    SnapRAID has a data scrub feature. I run it once every 6 months to verify that the primary copy of my data on my file server is still correct.
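    Conceptually, a scrub re-reads stored data and compares it against checksums recorded at sync time, so silent corruption shows up as a mismatch even though the drive reports no read error. A minimal Python sketch of that idea, assuming SHA-256 and a hypothetical tiny block size; SnapRAID keeps its own checksums in its content files and also checks parity during a scrub, so this models only the checksum comparison:

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical and tiny for the demo; real tools use far larger blocks


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def record_hashes(data: bytes) -> list[str]:
    """At sync time: remember a hash for every block."""
    return [hashlib.sha256(block).hexdigest() for block in split_blocks(data)]


def scrub(data: bytes, recorded: list[str]) -> list[int]:
    """Later: return the indices of blocks whose hash no longer matches.

    Assumes the data was not legitimately modified since the hashes were recorded.
    """
    current = record_hashes(data)
    return [i for i, (old, new) in enumerate(zip(recorded, current)) if old != new]
```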

    Then I run cshatag on every file, which generates SHA256 checksums and stores them as ext4 extended attribute metadata. On later runs it compares the stored checksum and stored timestamp with the file’s current ones; if a file’s contents have changed but its timestamp hasn’t, it reports the file as corrupt.
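    The decision described above boils down to comparing a stored (checksum, timestamp) pair with the file’s current one. The Python sketch below models that logic; it is an illustration, not cshatag’s code. The status names and the extended attribute names are hypothetical placeholders (chosen so the sketch never touches cshatag’s own attributes), and the xattr calls work on Linux only.

```python
import hashlib
import os
from typing import Optional

# Hypothetical attribute names, deliberately different from cshatag's own.
HASH_ATTR = "user.example.sha256"
MTIME_ATTR = "user.example.mtime_ns"


def classify(stored_hash: Optional[str], stored_mtime: Optional[str],
             current_hash: str, current_mtime: str) -> str:
    """Decide a file's status from its stored and current (hash, mtime)."""
    if stored_hash is None or stored_mtime is None:
        return "new"       # never tagged before: just record hash and mtime
    if current_hash == stored_hash:
        return "ok"        # contents unchanged
    if current_mtime != stored_mtime:
        return "outdated"  # contents and mtime both changed: a normal edit
    return "corrupt"       # contents changed but mtime did not: silent corruption


def current_state(path: str) -> tuple[str, str]:
    """Hash the file in 1 MiB chunks and read its mtime in nanoseconds."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest(), str(os.stat(path).st_mtime_ns)


def check_and_tag(path: str) -> str:
    """Classify one file and refresh its tags if it is new or legitimately edited."""
    try:
        stored_hash = os.getxattr(path, HASH_ATTR).decode()
        stored_mtime = os.getxattr(path, MTIME_ATTR).decode()
    except OSError:  # attribute missing (or xattrs unsupported on this filesystem)
        stored_hash = stored_mtime = None
    cur_hash, cur_mtime = current_state(path)
    status = classify(stored_hash, stored_mtime, cur_hash, cur_mtime)
    if status in ("new", "outdated"):
        os.setxattr(path, HASH_ATTR, cur_hash.encode())
        os.setxattr(path, MTIME_ATTR, cur_mtime.encode())
    return status
```

    Keeping the comparison in a pure `classify` function makes the rule easy to test separately from the filesystem access.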

    https://github.com/rfjakob/cshatag

    Then I use rsync -RHXva to make my backups of all my media drives. This data is almost never modified; new files are just added. The -X option also copies over the extended attribute metadata. Then I run cshatag the same way on the local backup and remote backup servers. This takes about 1 day to run. Across literally 90 million files and 300TB, it finds a single silently corrupted file about once a year. I have 2 other copies that match each other, so I overwrite the bad file with one of the good copies.
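    With three copies, that repair decision is a majority vote: the copy whose checksum disagrees with the other two is the one to overwrite. A minimal Python sketch of the rule, assuming you already have a checksum per copy (for example from the stored extended attributes or a fresh hash); the copy names are hypothetical labels:

```python
from collections import Counter
from typing import Optional


def find_bad_copy(hashes: dict[str, str]) -> Optional[tuple[str, str]]:
    """Given {copy_name: checksum}, return (bad_copy, good_copy) when exactly
    one copy disagrees with all the others; otherwise return None."""
    counts = Counter(hashes.values())
    if len(counts) != 2:
        return None  # all copies agree, or there is no clear two-way split
    majority_hash, majority_count = counts.most_common(1)[0]
    if majority_count != len(hashes) - 1:
        return None  # e.g. a 2-2 split with four copies: no safe majority
    bad = next(name for name, h in hashes.items() if h != majority_hash)
    good = next(name for name, h in hashes.items() if h == majority_hash)
    return bad, good
```

    For example, `find_bad_copy({"server": "aa", "local": "aa", "remote": "ab"})` returns `("remote", "server")`. When all three differ, it returns `None`, which is the case that needs a human to investigate.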

    I only run rsnapshot on /home because that is where my frequently changing files are. The other 99% of my data is essentially “write only”, so I just use rsync from the main file server to the two backups. Before I run rsync for real, I use rsync --dry-run, which shows what WOULD change without actually doing anything. If I see the files I expect to be written, I run it for real. If I were to see thousands of files that would be changed, I would stop and investigate: was this a cryptolocker virus that updated thousands of files?
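    That dry-run sanity check can be partly automated. A minimal Python sketch, assuming rsync’s `--dry-run` and `--itemize-changes` output, where each line starts with an update code such as `>f+++++++++` for a file that would be transferred or `*deleting` when `--delete` is in use. The threshold, the flag set and the function names are hypothetical choices for illustration:

```python
import subprocess

MAX_CHANGES = 1000  # hypothetical threshold; tune it to what "normal" looks like


def count_changes(itemized_output: str) -> int:
    """Count files rsync would transfer or delete, from --itemize-changes output."""
    return sum(
        1
        for line in itemized_output.splitlines()
        if line.startswith(("<f", ">f", "*deleting"))
    )


def preview(src: str, dst: str) -> str:
    """Run rsync in dry-run mode (nothing is written) and return its itemized output."""
    result = subprocess.run(
        ["rsync", "-aHX", "--dry-run", "--itemize-changes", src, dst],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def safe_to_sync(src: str, dst: str, limit: int = MAX_CHANGES) -> bool:
    """True if the pending change count is small enough to proceed without review."""
    return count_changes(preview(src, dst)) <= limit
```

    Counting only file transfers and deletions (not directory timestamp updates) keeps the number close to what you would eyeball in the dry-run listing; anything over the limit is a signal to stop and look, not an automatic verdict.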

    As for backing up the operating system, I have /etc and the /root account backed up every hour through rsnapshot, along with /home.

    I’m not running a business. I can reinstall Linux in 15 minutes on a new SSD and copy over the handful of files I need from the /etc backup.

    • wireless82@alien.top (OP) · 10 months ago

      Man… thanks! You are a true master! The steps after the SnapRAID part aren’t clear to me yet, but I think I have to read it more carefully.