Kubernetes? docker-compose? How should I organize my container services in 2024?

nopersonalspace@lemmy.world · 6 months ago

Kubernetes? docker-compose? How should I organize my container services in 2024?

Nico@r.dcotta.eu · 6 months ago

The problem with using seaweedfs to a back your DBs is more on the filesystem than the implementations of POSIX features. When you are writing to a file, and the connection to seaweedfs breaks (container restart, wifi, you name it), then you might end up with a half-written file. If you upload pictures, this is unlikely, but DBs are doing several writes per second usually. So it is more likely one of those gets interrupted. In my case, my grafana sqlite DB would get corrupted every other week.

What I recommend is using DBs natively in your node’s filesystem, and backing them up to seaweedfs periodically instead. That way your DBs ‘work’ but you can get them running again, and the backup is replicated in the distributed filesystem.

nopersonalspace@lemmy.world · 6 months ago

That’s an interesting issue. Do you think the problem would be the same for any CSI plugin? I’m thinking of using my NAS as the storage brains of the operation and hooking it up with NFS or something, but would that have issues with stateful stuff like DB’s too?

Nico@r.dcotta.eu · 6 months ago

I have never used NFS, but I think it would fare much better than seaweedfs because it uses Fuse to implement CSI. So for NFS I am sure the protocol would consider half-assed writes

would be the same for any CSI plugin

No, it would depend on the CSI plugin and how it is implemented. Ceph for example I know it has several, and cloud providers offer CSI volumes for their block storage (AWS EBS, GCP PD), and they will all perform differently. See this comment from a seaweedfs issue:

[…] It is always better to run databases on host volumes if you can (or on volumes provided by AWS EBS or similar). But with Seaweedfs especially if you are running postgres with seaweedfs-csi volume be prepared for data corruption. Seaweefs-csi uses FUSE, if anything happens to seaweedfs-csi (Nomad client restart, docker restart, OOM) mount will be lost and data corruption will happen.

Running on CEPH (since CEPH CSI using Kernel driver not FUSE) is acceptable if you fine with low TPS.

I found it was easier to make recoverable, backed up, host volumes than to make DBs run on high availability filesystems like seaweedfs (I admit I have not tried Ceph - the deployment looked a bit complicated/overkill for a homelab).

Postgres and sqlite are just not made for that environment. To run a high-availability DB, it is better to run a distributed DB made for that (think etcd, cassandra) than to run a non-distributed DB on top of a distributed filesystem.

Good luck! :)

johntash@eviltoast.org · 6 months ago

What I do right now is I have a rclone sidecar container that uploads files in a directory every few seconds, and I also have another init sidecar that runs before the main application and downloads those files (incl sqlite dbs) to the normal disk. This works okay but feels pretty clunky and can still result in stuff getting corrupted because I’m just backing up the db files and not using any sqlite commands to actually back up the db to another file that isn’t in-use first.

How do you handle a job going from one nomad node to another? Or do you pin jobs like grafana to specific hosts?

Nico@r.dcotta.eu · 6 months ago

Nomad has host volumes - so you can tell it to mount a folder from the machine on the container, and it will only schedule that container on machines that have that folder. So yes, effectively you pin the workload, thus introducing a SPOF - I do not love it but Grafana only supports sqlite and postgres, so making those HA would require failover setups which is a bit much for a homelab :')

For backing up, you can use the sqlite command periodically (do cron job or Nomad periodic job) and then upload the backup to some external, safe storage (could be seaweedfs or S3!). For postgres you can use something like this.