
Observability Isn’t Just Logs — It’s a Feedback Loop

Categories: [observability], [engineering], [cto-journal]
Tags: [elastic], [akamas], [telemetry], [optimization], [devops], [ai-tuning]

Most people don’t connect observability and autonomous tuning.
To be honest, neither did I — not at first.

When you’ve been around as long as I have, you start seeing trends repeat themselves.
The “monitoring revolution” wasn’t the first. Before observability became a buzzword, we had a decade of “log everything”. Before that, SNMP traps and MRTG graphs.
They were always about watching. Never doing.

Akamas entered our world from a completely different door: performance optimization.
It was built to find optimal configuration values for complex systems — JVM heap sizes, thread pools, container resources — without trial-and-error guesswork.
Nothing about it screamed “observability tool”.

So how did it end up inside our observability stack?
That’s the story.

From “Monitoring” to “Observability” to… Something Else

The typical observability architecture in 2025 looks something like this:

  Beats (Filebeat/Metricbeat) → Logstash → Elasticsearch → Kibana dashboards → a human reading them

This is fine — until you ask the question:

“So… what do we actually do with this information?”

And the room goes quiet.

The Gap We Kept Falling Into

Every post-incident review went like this:

  1. We saw the problem in a dashboard (yay).
  2. We found the cause in logs or metrics (slower yay).
  3. We made a change to fix it (ugh, manual).

And the truth is — that last step was always manual.
Even if we automated deployment, even if we had IaC, someone had to decide what to change and by how much.
And “someone” was usually guessing based on gut feel.

We were using observability for diagnosis but never prescription.
That’s when I realised we were running a one-way feedback system.

Enter Akamas — But Not For the Reason You Think

Akamas came into the picture on a completely different project: tuning Java workloads for an analytics platform.
The goal was simple: reduce infrastructure costs without killing performance.
Akamas did its job — brilliantly. It ran experiments, adjusted parameters, and found sweet spots we wouldn’t have tried ourselves.
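Akamas's actual search strategy is its own (ML-driven), but the shape of the problem is easy to sketch. Below is a toy random search over a hypothetical JVM parameter space; the parameter names, ranges, and the scoring function are all invented stand-ins for a real experiment that would deploy a config, apply load, and read KPIs back.

```python
import random

# Hypothetical JVM parameter space. Names and ranges are illustrative,
# not Akamas's actual study-definition format.
PARAM_SPACE = {
    "heap_mb": range(512, 8193, 512),
    "gc_threads": range(1, 17),
    "thread_pool_size": range(8, 257, 8),
}

def score(config):
    # Stand-in for a real experiment. Here: a made-up smooth function
    # with its optimum inside the space (higher is better).
    return -(abs(config["heap_mb"] - 4096) / 4096
             + abs(config["gc_threads"] - 8) / 8
             + abs(config["thread_pool_size"] - 64) / 64)

def random_search(trials=200, seed=42):
    """Sample configs at random and keep the best-scoring one."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(trials):
        cfg = {name: rng.choice(list(values))
               for name, values in PARAM_SPACE.items()}
        s = score(cfg)
        if s > best_score:
            best_cfg, best_score = cfg, s
    return best_cfg, best_score

best, best_s = random_search()
```

Real optimizers replace the random sampling with something far smarter, but the loop structure (propose, measure, keep the winner) is the same.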

Then something clicked:
Every experiment Akamas ran was based on metrics we already had… from our observability stack.

In other words:

“Our observability stack was already producing everything an optimizer needs. We just weren’t acting on it.”

The Observability Feedback Loop — Now With Brains

Here’s what we built:


  1. Collect: Filebeat + Metricbeat ship structured logs and metrics to Logstash.
  2. Enrich: Logstash adds metadata — service, role, env, region.
  3. Correlate: Elastic links logs, metrics, traces into unified events.
  4. Surface: Dashboards show KPIs tied to business goals (e.g., “Checkout latency p95”).
  5. Analyse & Tune: Akamas reads KPIs and runs tuning studies.
  6. Apply: Winning configs are pushed via CI/CD or Rudder.
  7. Verify: New metrics flow back into Elastic — completing the loop.
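The steps above can be compressed into a toy control loop. The percentile math is real (nearest-rank), but the SLO number and the `trigger_tuning` hook, which stands in for kicking off an Akamas study via its API, are invented for illustration.

```python
def p95(samples):
    """Nearest-rank 95th percentile of a list of latency samples."""
    ordered = sorted(samples)
    k = max(0, int(round(0.95 * len(ordered))) - 1)
    return ordered[k]

def feedback_loop(latencies_ms, slo_p95_ms, trigger_tuning):
    """One pass of the loop: surface the KPI, decide, act.

    `trigger_tuning` is a placeholder for starting a tuning study;
    the real integration talks to Akamas, not shown here.
    """
    kpi = p95(latencies_ms)                  # step 4: surface the KPI
    if kpi > slo_p95_ms:
        trigger_tuning(kpi)                  # step 5: analyse & tune
        return {"kpi": kpi, "action": "tuning_triggered"}
    return {"kpi": kpi, "action": "none"}

# Usage: an SLO of 200 ms p95 on checkout latency (made-up numbers).
studies = []
result = feedback_loop([120, 150, 400, 90, 510] * 20, 200, studies.append)
```

Steps 6 and 7 (apply and verify) close the loop by shipping the winning config and letting the next pass of this function see the new metrics.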

How Akamas Was Actually Integrated and Tested

This wasn’t a “flip the switch in production” move.
Akamas was first deployed on a dedicated optimization node in our staging environment.

We configured it to:

  1. Read the same KPIs we already surface in Elastic (latency, CPU, throughput).
  2. Run experiments only inside safety constraints defined up front.
  3. Roll back any change automatically if a constraint was breached.

The first study ran for 48 hours in staging with real production-like load.
Akamas tried multiple parameter combinations, automatically rolling back changes that degraded latency or increased CPU above thresholds.
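That guardrail logic is simple enough to sketch. The thresholds below are invented numbers, not our actual limits, and the real checks live inside the study's configuration rather than in hand-written code like this.

```python
# Illustrative safety gate: keep a candidate config only if it does not
# degrade latency versus the baseline and stays under a hard CPU ceiling.
# All numbers are hypothetical.

BASELINE = {"p95_latency_ms": 250.0, "cpu_pct": 60.0}
CPU_CEILING_PCT = 80.0

def verdict(candidate_metrics):
    """Return 'keep' or 'rollback' for one experiment's measured metrics."""
    if candidate_metrics["cpu_pct"] > CPU_CEILING_PCT:
        return "rollback"      # CPU pushed above the hard threshold
    if candidate_metrics["p95_latency_ms"] > BASELINE["p95_latency_ms"]:
        return "rollback"      # latency degraded versus baseline
    return "keep"

def run_study(experiments):
    """Filter a batch of experiments down to the surviving configs."""
    return [m for m in experiments if verdict(m) == "keep"]
```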

Once the best configuration was identified:

  1. It was promoted to production through the same CI/CD and Rudder pipelines we use for any other change.
  2. The post-change metrics flowed back into Elastic, so the improvement could be verified against the same KPIs.

Only after this validation did Akamas become part of the live observability feedback loop.
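For reference, the KPI side of that loop is just an Elasticsearch aggregation. Here is a sketch of pulling a p95 out of a `percentiles` aggregation response; the aggregation name and the number are invented, but the response shape matches what Elasticsearch returns for that aggregation type.

```python
# Abridged shape of an Elasticsearch `percentiles` aggregation response.
# The aggregation name and the latency figure are invented.
sample_response = {
    "aggregations": {
        "checkout_latency": {
            "values": {"95.0": 183.2}
        }
    }
}

def read_p95(response, agg_name="checkout_latency"):
    """Pull the 95th-percentile value out of a percentiles aggregation.

    Elasticsearch keys the `values` dict by the requested percent
    rendered as a string, e.g. "95.0".
    """
    return response["aggregations"][agg_name]["values"]["95.0"]

p95_ms = read_p95(sample_response)
```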

Why This Matters (And Why Nobody Does It)

Most teams treat observability as passive.
It’s there to tell you what went wrong — not to decide what to do next.

By integrating Akamas, we shifted from passive monitoring to active optimization:

Akamas runs targeted experiments with safety constraints.
If an experiment worsens performance, it rolls back automatically.
If it improves, we commit it to production.

A Real Example: Elasticsearch Under Load

We once had a cluster suffering from:

We gave Akamas the following:

Akamas tested combinations over real workloads for 36 hours.
The result:

The Skeptic’s View

I’ll admit — this sounds futuristic.
It’s not common practice in 2025.
Even experienced engineers will ask:

“Why would you let an optimizer near production configs?”

The answer is: because the alternative is manual tuning forever.
And manual tuning is slow, error-prone, and expensive.

The integration isn’t magic. It’s just connecting two things you already have:

  1. A telemetry pipeline that already measures everything that matters.
  2. A delivery pipeline that can already apply changes safely.

The difference is that nobody thinks to connect them.

Lessons Learned

  1. Observability data is only as valuable as the decisions it drives.
  2. Never point an optimizer at production first: we validated in staging, under production-like load, for 48 hours.
  3. Safety constraints and automatic rollback are what make autonomous tuning acceptable.
  4. Tie KPIs to business goals (checkout latency p95, not raw CPU graphs), or you will optimize the wrong thing.

TL;DR

Observability is no longer about staring at dashboards.
It’s about building a system that sees, decides, and acts — without waiting for you.