nderflow@alien.top to Homelab@selfhosted.forum • Logging solution that doesn't take a long time to setup
1 year ago

- There are some inherent complexities in the space.
- Lack of network effects around a (missing) standard implementation or protocol.
I think this could have been different. Nagios had a first-mover advantage on Linux, but it stayed mostly centred on network monitoring, whereas application monitoring is the key thing.
I say application monitoring is the key thing mainly because what we actually want to know is whether the system as a whole is functioning correctly. Without positive evidence that the application is succeeding, we can’t tell for sure.
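To make "positive evidence" a bit more concrete, here is a minimal sketch assuming an application that counts its own request outcomes and exposes them in the Prometheus text exposition format. The metric name `app_requests_total`, the `outcome` label, the `RequestCounters` class and the thresholds in `application_healthy` are all hypothetical, picked just for illustration. The point is that zero successes counts as unhealthy even when there are also zero errors, which is exactly the case a host-level network check (ping, open port) would never notice.

```python
from dataclasses import dataclass, field


@dataclass
class RequestCounters:
    # Hypothetical per-outcome request counters kept by the application itself.
    counts: dict = field(default_factory=lambda: {"success": 0, "error": 0})

    def record(self, ok: bool) -> None:
        """Count one handled request as a success or an error."""
        self.counts["success" if ok else "error"] += 1

    def exposition(self) -> str:
        """Render the counters in the Prometheus text exposition format."""
        lines = [
            "# HELP app_requests_total Requests handled, by outcome.",
            "# TYPE app_requests_total counter",
        ]
        for outcome, n in sorted(self.counts.items()):
            lines.append(f'app_requests_total{{outcome="{outcome}"}} {n}')
        return "\n".join(lines) + "\n"


def application_healthy(successes: int, errors: int,
                        min_successes: int = 1,
                        min_success_ratio: float = 0.99) -> bool:
    """Healthy only if we have positive evidence of success.

    Thresholds are illustrative assumptions, not recommendations.
    """
    # No successes means no positive evidence: treat as unhealthy even if
    # there were also no errors (the app may be unreachable or wedged).
    if successes < min_successes:
        return False
    return successes / (successes + errors) >= min_success_ratio


if __name__ == "__main__":
    c = RequestCounters()
    for ok in [True] * 199 + [False]:
        c.record(ok)
    print(c.exposition(), end="")
    print(application_healthy(c.counts["success"], c.counts["error"]))  # True
    print(application_healthy(0, 0))  # False: silence is not success
```

In a real setup the ratio would be computed over a time window by the monitoring system (e.g. from counter rates) rather than from lifetime totals, but the "no successes means not OK" rule carries over.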
FWIW, Prometheus was significantly influenced by Borgmon; more details, some fun.
AIUI, Prometheus was implemented by a bunch of folks at SoundCloud, some of whom were ex-Google SREs (hi, BR!). Borgmon had some weaknesses (for example, multiple separate templating implementations, for historical reasons) which, as far as I know, Prometheus doesn’t share. Today, Prometheus advocates include a number of ex-Google SREs, for example Brian Brazil (see book), who, if I recall correctly, implemented a Turing machine emulator in Borgmon.
Today, Borgmon has been largely replaced at Google by Monarch, which addresses quite a few of the pain points of operating Borgmon infrastructure and even manages to remove some of the complexity, though less than I expected. That leaves me wondering how much of the complexity is simply unavoidable because it’s inherent in the problem space.