(No offense to some popular solutions in the homelab / self-hosting community.) Why can’t I find a simple logging solution? Most of the popular logging / metric management solutions seem to require a science degree. I just want to set up a logging server, point my servers at it, and maybe configure a few email alerts.

  • SuperQue@alien.topB
    8 months ago

    rsyslog/syslog-ng are simple enough.
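
For the "point my servers at it" part, here is a minimal sketch of an application forwarding its own log lines to a central rsyslog/syslog-ng box, using only the Python standard library. The host, port, and logger name are placeholders, and it assumes the server is already configured to accept UDP syslog; treat it as a starting point, not a full setup.

```python
import logging
import logging.handlers


def build_logger(name, host="127.0.0.1", port=514):
    """Return a logger that forwards records to a syslog server over UDP.

    host and port are placeholders; replace them with your log server.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # SysLogHandler uses UDP by default when given a (host, port) tuple.
    handler = logging.handlers.SysLogHandler(address=(host, port))
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = build_logger("backup-job")  # hypothetical application name
    log.info("nightly backup finished")
```

UDP is fire-and-forget, so nothing here tells you whether the server actually received the line.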

    But logging and monitoring/alerting are different things. Logging is not great for alerts. You want an actual monitoring system for that.

    Prometheus is not that complicated, has a huge community of integrations, and is extremely flexible for monitoring everything.

  • nderflow@alien.topB
    8 months ago
    1. There are some inherent complexities in the space.
    2. Lack of network effects around a (missing) standard implementation or protocol.

    I think this could have been different. Nagios had a first-mover advantage on Linux, but it mostly stayed centred on network monitoring, while application monitoring is the key thing.

    I say application monitoring is the key thing mainly because what we actually want to know is, is the system as a whole functioning correctly? Without positive evidence that the application is succeeding we can’t tell for sure.
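
As a rough illustration of "positive evidence", an application can count its own successes and failures and expose them for scraping. The sketch below renders such counters in the Prometheus text exposition format using only the standard library. The metric name `app_requests_total` and the `outcome` label are made up for the example, and a real service would more likely use a Prometheus client library than hand-rolled formatting.

```python
from collections import Counter

# Hypothetical in-process tally of request outcomes.
requests_total = Counter()


def record(outcome):
    """Count one handled request; outcome is e.g. "success" or "failure"."""
    requests_total[outcome] += 1


def render():
    """Render the tallies in the Prometheus text exposition format."""
    lines = [
        "# HELP app_requests_total Requests handled, by outcome.",
        "# TYPE app_requests_total counter",
    ]
    for outcome, count in sorted(requests_total.items()):
        lines.append(f'app_requests_total{{outcome="{outcome}"}} {count}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    record("success")
    record("success")
    record("failure")
    print(render(), end="")
```

A steadily rising success counter is the positive signal; an alert on the failure ratio (or on the success counter going flat) then covers the "is it actually working" question.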

    • SuperQue@alien.topB
      8 months ago

      "I say application monitoring is the key thing mainly because what we actually want to know is, is the system as a whole functioning correctly? Without positive evidence that the application is succeeding we can’t tell for sure."

      So true.

      The problem was that when Nagios was introduced, the only metrics system we had was SNMP. SNMP became kinda OK to get into once Cacti came along. But due to the complexity of ASN.1, the split between the actual data and the MIBs, and the lack of good documentation on how to implement it properly, it was not a good solution to extend. Adding application metrics in SNMP? Good luck with that.

      Thankfully, statsd and then Prometheus came along and made it much easier to implement application monitoring, as well as to integrate system and network monitoring with application monitoring.
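
To give a sense of how low the barrier became compared with SNMP: a statsd counter increment is just a short text datagram of the form `name:value|c` sent over UDP. A minimal sketch, assuming a statsd daemon listening on the conventional port 8125; the metric name and host are placeholders.

```python
import socket


def statsd_counter(name, value=1):
    """Build a statsd counter datagram, e.g. b"app.requests.ok:1|c"."""
    return f"{name}:{value}|c".encode("ascii")


def send(payload, host="127.0.0.1", port=8125):
    """Send one datagram to a statsd daemon (host and port are assumptions)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


if __name__ == "__main__":
    # Hypothetical metric name; call this wherever the application succeeds.
    send(statsd_counter("app.requests.ok"))
```

No MIBs, no ASN.1, and no schema to register up front, which is a large part of why application metrics took off once this style appeared.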

      • nderflow@alien.topB
        8 months ago

        FWIW, Prometheus was significantly influenced by Borgmon; more details, some fun.

        AIUI, Prometheus was implemented by a bunch of folks at SoundCloud, some of whom were ex-Google SREs (hi, BR!). Borgmon had some weaknesses (for example, multiple separate templating implementations, for historical reasons) which, as far as I know, Prometheus doesn’t share. Today, Prometheus advocates include a number of ex-Google SREs, for example Brian Brazil (see book), who, if I recall correctly, implemented a Turing machine emulator in Borgmon.

        Today, Borgmon has been largely replaced in Google by Monarch, which addresses quite a few of the pain points of operating Borgmon infrastructure, and even manages to remove some of the complexity. Though less than I thought it would, which leaves me wondering how much of the complexity is simply unavoidable because it’s inherent in the problem space.