
Building Observability Systems with Podman, Prometheus, and Grafana
Observability for tech teams is like the film room for professional athletes. In the film room, players review opportunities to improve their game.
Core Observability Tools for System Monitoring
Tools such as Grafana, Prometheus, Loki, and Podman each play a part in collecting, storing, and visualizing metrics and logs for your workloads, which is the foundation for improving observability.
Whether you are a systems engineer, site reliability engineer, DevOps engineer, or developer, if you are in charge of tech systems you will need to monitor and observe your workloads. You need a film room.
"Software engineering teams use observability to understand the health, performance, and status of software systems, including when and why errors occur." - What is Observability? | New Relic
Choosing the Right Observability Stack
Choosing which system to use for your environment can be challenging because, as with anything in tech, there are many choices and many tradeoffs.
For example, some tools may offer more features but require more resources, while others may be simpler but less powerful. It’s important to consider your specific needs and constraints when making this decision.

By setting up a system to pull metrics from the environment, I aimed to empower the team to improve the overall observability of our work. While some monitoring existed in other environments, this particular environment needed the most attention.
Implementation Goals and Requirements
- Improve visibility into existing systems without adding much overhead to the process.
- Utilize containers to increase portability and decouple dependencies from the OS.
- Set up dashboards to observe the metrics in a digestible way.
- Set up alerts to notify the team when something goes wrong.
- Use the metrics to validate assumptions and optimize the architecture of existing systems by updating requirements.
Architecture Design with Open Source Tools
The proposed architecture, featuring Prometheus, Grafana, Telegraf, InfluxDB, and Podman, was chosen for its cost-effectiveness, familiarity, and ability to centralize metrics, all of which are key to enhancing observability.

Why Podman for Container Management
Podman was chosen as the container engine because it uses a fork-exec architecture rather than a central daemon, integrates with SELinux for security, and makes it easy to run containers rootless. Podman also introduced Quadlet, a new way to manage containers with systemd.
The Quadlet strategy made sense because it provides a familiar process for engineers on the team who may not yet have much container experience. Those engineers can use systemd to restart containers if required. A win for the team!

As the snippet below shows, systemctl --user status grafana.service reports the status of the container. The usual systemctl commands (start, stop, restart) are how you start, stop, or restart the containers.
```
[monitor@monitors ~]$ systemctl --user status grafana.service --no-pager
● grafana.service - Podman Grafana container
     Loaded: loaded (/home/monitor/.config/containers/systemd/grafana.container; generated)
     Active: active (running) since Fri 2024-05-24 16:15:47 EDT; 2 weeks 0 days ago
       Docs: man:podman-generate-systemd(1)
   Main PID: 1826 (conmon)
      Tasks: 27 (limit: 48921)
     Memory: 346.2M
        CPU: 21min 52.337s
     CGroup: /user.slice/user-1006.slice/user@1006.service/app.slice/grafana.service
             ├─libpod-payload-377e44df70a05f120f2bc048f98d84cdd84e3b31375a88ddf0c9187909b1caaf
             │ └─1829 grafana server --homepath=/usr/share/grafana --config=/etc/grafana/grafana.ini --packaging=docker cfg:def…
             └─runtime
               ├─1763 rootlessport
               ├─1789 rootlessport-child
               └─1826 /usr/bin/conmon --api-version 1 -c 377e44df70a05f120f2bc048f98d84cdd84e3b31375a88ddf0c9187909b1caaf -u 37…
```
Configuring Containers with Podman Quadlets
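The Grafana container is defined in a Quadlet unit file, grafana.container, stored under ~/.config/containers/systemd/:
```ini
[Unit]
Description=Podman Grafana container
Documentation=man:podman-systemd.unit(5)

[Container]
ContainerName=grafana
Image=docker.io/grafana/grafana
# Publish the Grafana UI on localhost only
PublishPort=127.0.0.1:3000:3000
User=grafana
# U: chown the host directory to the container user; Z: apply a private SELinux label
Volume=/var/opt/monitors/grafana:/var/lib/grafana:U,Z
# Join the network defined in the monitor-network.network Quadlet unit
Network=monitor-network.network

[Service]
Restart=always

[Install]
# Start by default on boot
WantedBy=multi-user.target default.target
```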
For more information, see man podman-systemd.unit.
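The Network=monitor-network.network line in the Grafana unit points at a separate Quadlet network unit that is not shown in this post. As a rough sketch only, such a file might look like the one below. The description text and the NetworkName value are assumptions for illustration, not the configuration that was actually deployed.

```ini
# monitor-network.network (hypothetical contents, for illustration only)
[Unit]
Description=Podman network shared by the monitoring containers

[Network]
# Assumed name. Without NetworkName=, Quadlet prefixes the unit name
# and would call the network systemd-monitor-network.
NetworkName=monitor-network
```

Placed next to grafana.container, this file should be picked up by Quadlet on the next systemctl --user daemon-reload, and containers that join the same network can usually reach each other by container name.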
Understanding the Observability Stack Components
- Prometheus collects and stores its metrics as time series data.
- Grafana is a multi-platform open-source analytics and interactive visualization web application. When connected to supported data sources, it can produce charts, graphs, and alerts for the web.
- Telegraf is a server-based agent that collects and sends all metrics and events from databases, systems, and IoT sensors.
- InfluxDB is a database that stores and analyzes time series data from any source in real-time. It offers high performance, low cost, native SQL support, and interoperability with other data systems.
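Each of these components can run as its own Quadlet unit, following the same pattern as grafana.container. The sketch below shows what a Prometheus unit might look like. It is not the unit used in this project: the host paths under /var/opt/monitors/prometheus are assumed by analogy with the Grafana volume, while the image name, port 9090, and the in-container paths are the Prometheus image defaults.

```ini
# prometheus.container (hypothetical sketch, modeled on grafana.container)
[Unit]
Description=Podman Prometheus container
Documentation=man:podman-systemd.unit(5)

[Container]
ContainerName=prometheus
Image=docker.io/prom/prometheus
# Publish the Prometheus UI and API on localhost only
PublishPort=127.0.0.1:9090:9090
# Assumed host paths: scrape configuration (read-only) and time series storage
Volume=/var/opt/monitors/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro,Z
Volume=/var/opt/monitors/prometheus/data:/prometheus:U,Z
# Same network as Grafana, so Grafana can reach this container by name
Network=monitor-network.network

[Service]
Restart=always

[Install]
WantedBy=multi-user.target default.target
```

With both containers on the same network, a Grafana data source URL such as http://prometheus:9090 would be a reasonable starting point, assuming container-name DNS is enabled on that network.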
I provisioned the RHEL 9 server and deployed the quadlets onto it with Ansible. To accelerate the process, I imported a couple of community dashboards from Grafana’s site. The community dashboards give us a quick return on this project while we develop our custom dashboards.

Conclusion
Implementing observability solutions may seem daunting at first, but with the technology available today, it’s relatively straightforward. There are numerous ways to set up observability, and choosing the right solution may take some time, but the setup itself is manageable.
If a demo on how to set this up is helpful, let me know, and I’ll prepare one.