Currently, I run Unraid and have all of my services’ setup there as docker containers. While this is nice and easy to setup initially, it has some major downsides:

  • It’s fragile. Unraid is prone to bugs/crashes with docker that take down my containers. It’s also not resilient so when things break I have to log in and fiddle.
  • It’s mutable. I can’t use any infrastructure-as-code tools like terraform, and configuration sort of just exist in the UI. I can’t really roll back or recover easily.
  • It’s single-node. Everything is tied to my one big server that runs the NAS, but I’d rather have the NAS as a separate fairly low-power appliance and then have a separate machine to handle things like VMs and containers.

So I’m looking ahead and thinking about what the next iteration of my homelab will look like. While I like unraid for the storage stuff, I’m a little tired of wrangling it into a container orchestrator and hypervisor, and I think this year I’ll split that job out to a dedicated machine. I’m comfortable with, and in fact prefer, IaC over fancy UIs and so would love to be able to use terraform or Pulumi or something like that. I would prefer something multi-node, as I want to be able to tie multiple machines together. And I want something that is fault-tolerant, as I host services for friends and family that currently require a lot of manual intervention to fix when they go down.

So the question is: how do you all do this? Kubernetes, docker-compose, Hashicorp Nomad? Do you run k3s, Harvester, or what? I’d love to get an idea of what people are doing and why, so I can get some ideas as to what I might do.

  • Toribor@corndog.social
    link
    fedilink
    English
    arrow-up
    14
    ·
    edit-2
    6 months ago

    In my opinion trying to set up a highly available fault tolerant homelab adds a large amount of unnecessary complexity without an equivalent benefit. It’s good to have redundancy for essential services like DNS, but otherwise I think it’s better to focus on a robust backup and restore process so that if anything goes wrong you can just restore from a backup or start containers on another node.

    I configure and deploy all my applications with Ansible roles. It can programmatically create config files, pass secrets, build or start containers, cycle containers automatically after config changes, basically everything you could need.

    Sure it would be neat if services could fail over automatically but things only ever tend to break when I’m making changes anyway.

    • CubitOom
      link
      fedilink
      English
      arrow-up
      4
      ·
      edit-2
      6 months ago

      I would say that if you are going to host it at home then kubenetes is more complex. Bare metal kubernetes control plane management has some pitfalls. But if you were to use a cloud provider like linode or digital ocean and use their kubernetes service, then only real extra complexity is learning how to manage Kubernetes which is minimal.

      There is a decent hardware investment needed to run kubernetes if you want it to be fully HA (which I would argue means it needs to be a minimum of 2 clusters of 3 nodes each on different continents) but you could run a single node cluster with autoscaling at a cloud provider if you don’t need HA. I will say it’s nice not to have to worry about a service failing periodically as it will just transfer to another node in a few seconds automatically.

    • Lem453@lemmy.ca
      link
      fedilink
      English
      arrow-up
      1
      ·
      6 months ago

      This, I used to have a kubernetes setup but how much redudency can you really have at home. Do you have a generator? Multiple Internet lines?

      The fact is most hardware is highly reliable. Having good backups to restore from is all you need and you gain a huge improvement in simplicity which adds reliability in and of itself.

    • nopersonalspace@lemmy.worldOP
      link
      fedilink
      English
      arrow-up
      1
      ·
      6 months ago

      Yeah I guess that’s true, I do think the other part about having configs done programatically is a lot more important anyway. If things go down but all it takes to get it back is to re-run the configs from files then it’s not so bad