Learning Observability Part 1

I am going on a trip down the observability lane. Recently at work, we decided to adopt OpenTelemetry to provide observability for our new services. In this series, I will dive right into setting up not only OpenTelemetry in an application but also the infrastructure needed to observe the logs, metrics, and traces it produces.

Who is this for?

This is for me, but please read along and you might learn something. If you have no idea what OpenTelemetry is, though, I suggest starting with its own documentation or a YouTube presentation on the concepts.

These articles are a way for me to put the concepts I am learning into practice and to document what I did, so that my future self understands why Elastic, Jaeger, Prometheus, Grafana, and the OTEL collector are deployed when I come back to the lab.

What are we aiming for?

Even though the lab is composed of some beefy servers, it still has resource constraints. None of the infrastructure I deploy will be highly available, follow best practices, or even have appropriate security. It's simply a lab environment where I plan to practice the plumbing needed to get the whole end-to-end setup rolling.

While I know Elastic offers a full Application Performance Monitoring solution, I plan to simply use it with Kibana for logging and as a storage backend for traces. For trace analysis, I will be using Jaeger. For metrics, I have zero experience with Prometheus, so this is a great opportunity to set up an OTEL collector exporter that sends metrics there. I use Grafana dashboards at work but have never had to set one up, so Grafana will handle metrics dashboards and alerts. Lastly, a simple Python Flask application should suffice for practicing manual instrumentation with the OTEL SDKs.

If all works as expected, a stretch goal will be to deploy the OpenTelemetry Demo, which contains a fully instrumented microservices implementation of an Astronomy Shop.

Up next

The next entry in this series will start with a simple single-node Elastic and Kibana deployment. It will also cover setting up an index for OTEL logs with an index lifecycle management (ILM) policy that rolls the index over daily.
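As a preview of what "rolling over the index daily" means in practice, here is a small sketch that only builds the JSON request bodies for the classic alias-based ILM setup; it does not talk to Elasticsearch. The names `otel-logs-daily` and `otel-logs` are hypothetical placeholders, and the actual setup in the next entry may differ (for example, it could use a data stream instead of a rollover alias).

```python
import json

# Hypothetical names for this sketch only.
POLICY_NAME = "otel-logs-daily"
ROLLOVER_ALIAS = "otel-logs"


def build_ilm_policy(max_age: str = "1d") -> dict:
    """Body for PUT _ilm/policy/<POLICY_NAME>: roll over once the write index is max_age old."""
    return {
        "policy": {
            "phases": {
                "hot": {"actions": {"rollover": {"max_age": max_age}}}
            }
        }
    }


def build_index_template() -> dict:
    """Body for PUT _index_template/<name>: attach the policy to every otel-logs-* index."""
    return {
        "index_patterns": [f"{ROLLOVER_ALIAS}-*"],
        "template": {
            "settings": {
                "index.lifecycle.name": POLICY_NAME,
                "index.lifecycle.rollover_alias": ROLLOVER_ALIAS,
            }
        },
    }


def build_bootstrap_index() -> tuple[str, dict]:
    """Name and body for the first index, marked as the alias's write index."""
    return f"{ROLLOVER_ALIAS}-000001", {
        "aliases": {ROLLOVER_ALIAS: {"is_write_index": True}}
    }


if __name__ == "__main__":
    print(f"PUT _ilm/policy/{POLICY_NAME}")
    print(json.dumps(build_ilm_policy(), indent=2))
    print(f"PUT _index_template/{ROLLOVER_ALIAS}")
    print(json.dumps(build_index_template(), indent=2))
    index_name, body = build_bootstrap_index()
    print(f"PUT {index_name}")
    print(json.dumps(body, indent=2))
```

Writers send logs to the `otel-logs` alias; once the current write index is a day old, ILM creates `otel-logs-000002` and moves the write alias to it.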