Optimizing Observability with Mezmo

Originally published on LinkedIn, 29 May 2024

Tucker Callaway, Mezmo’s CEO, presented on observability and telemetry data at AppDev Tech Field Day 1.

Mezmo has been solving observability problems since 2015 and its core principles are to help organizations use telemetry data to understand, optimize, and respond.

  • Understand — data profiling, aggregation, pattern detection, and identification of best opportunities for optimization and insight.
  • Optimize — data reduction, enrichment, transformation, and routing to appropriate destination.
  • Respond — search, analyze, adjust to and react to changing context.

Mezmo views itself as a data platform rather than an observability platform. That’s because enterprises typically have a plethora of observability solutions in play, generating massive volumes of telemetry data.

Something very important to keep in mind: one of the core challenges is that telemetry data is different from business data in that the cost of telemetry data increases linearly with volume, but its value does not.

Thus, cost reduction is especially important and is a top use case of observability optimization (but not a driver for organizations to start using observability pipelines).

Other use cases include compliance, business insights, data orchestration, acceleration of problem resolution time (MTTD/R), and, of course, security.

It’s super important to Mezmo that you can leverage its SaaS control plane while keeping your data on-premises (or, I assume, in your own cloud or CSP environment).

Mezmo’s capabilities include data profiling, in-stream analysis, pipeline operations, and log analysis, as well as simulation and testing of data operations in real time.

Mezmo understands that the developer wants to focus on app functionality, not instrumentation. Instrumentation, of course, is insurance, helping us to understand what’s happening when things go wrong. Hence the mantra: it’s better to have the telemetry data and not need it than to need it and not have it.

However, instrumentation can significantly increase costs – more instrumentation, more costs. And, unfortunately, the amount of instrumentation is decided by the developers during design and coding, not at run time. Run-time decision making is usually limited to which log levels (info, warning, error, critical, …) to output and/or capture.
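To make the design-time versus run-time split concrete, here is a minimal sketch using Python’s standard logging module. It is not Mezmo code: the `handle_request` function, the `ListHandler` class, and the log messages are hypothetical. The log calls are fixed when the code is written; the only run-time knob is the level threshold.

```python
import logging


class ListHandler(logging.Handler):
    """Collects messages in memory so the effect of each level is easy to see."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(f"{record.levelname}:{record.getMessage()}")


def handle_request(logger):
    # Hypothetical application code: these calls are decided at development time.
    logger.debug("parsed request body")
    logger.info("request accepted")
    logger.warning("upstream latency high")


logger = logging.getLogger("demo.app")
logger.propagate = False  # keep the demo output out of the root logger
handler = ListHandler()
logger.addHandler(handler)

# Run-time decision 1: only warnings and above are emitted.
logger.setLevel(logging.WARNING)
handle_request(logger)
quiet = list(handler.messages)

# Run-time decision 2: everything is emitted, but only what the code already logs.
handler.messages.clear()
logger.setLevel(logging.DEBUG)
handle_request(logger)
verbose = list(handler.messages)

print(quiet)
print(verbose)
```

Raising or lowering the level changes how much is emitted, but it cannot add detail the developer never instrumented, and it applies to every message at that level regardless of context.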

Mezmo says that the solution to the ever-increasing volume and velocity of telemetry data is telemetry pipelines. Responsive pipelines can understand and respond automatically – discarding unneeded data or capturing additional information.
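As a rough illustration of what “responsive” could mean, the sketch below drops debug events in normal operation but keeps them once recent errors cross a threshold. This is my own toy model, not Mezmo’s implementation: the `ResponsiveFilter` class, the event shape, the window size, and the threshold are all assumptions.

```python
from collections import deque


class ResponsiveFilter:
    """Hypothetical pipeline stage: drop debug noise unless errors are spiking."""

    def __init__(self, window=5, error_threshold=2):
        self.recent_levels = deque(maxlen=window)  # sliding window of recent levels
        self.error_threshold = error_threshold

    def process(self, event):
        """Return the event to keep it, or None to discard it."""
        self.recent_levels.append(event["level"])
        errors = sum(1 for level in self.recent_levels if level == "error")
        incident = errors >= self.error_threshold
        if event["level"] == "debug" and not incident:
            return None  # unneeded data in normal operation
        return event  # during an incident, debug detail is kept too


stage = ResponsiveFilter()
stream = [
    {"level": "debug", "msg": "a"},
    {"level": "info", "msg": "b"},
    {"level": "error", "msg": "c"},
    {"level": "error", "msg": "d"},
    {"level": "debug", "msg": "e"},
]
kept = [e["msg"] for e in stream if stage.process(e) is not None]
print(kept)
```

The first debug event is discarded, while the one that arrives after two errors is kept, because the stage has switched into its “capture more” behavior.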

Because everything is dynamic, pipelines can be updated dynamically without restarting applications or services.

Bill Meyer’s demo showed that Mezmo uses a “low-code” flowcharting interface to build and control the pipeline process. It’s clearly simple and easy to both build a pipeline and to understand exactly what the pipeline is doing.

Mezmo provides a lot of data processing options. For example, in addition to typical filtering, you can add a temporal component, such as keeping the first day of data after a new app version is deployed.
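The temporal rule can be sketched in a few lines of Python. This is only an illustration of the idea, not how Mezmo expresses it: the `keep_event` function, the event fields, the one-day grace period, and the set of “important” levels are assumptions.

```python
from datetime import datetime, timedelta

# Assumed set of levels worth keeping once the grace period is over.
IMPORTANT_LEVELS = {"warning", "error", "critical"}


def keep_event(event, deployed_at, grace=timedelta(days=1)):
    """Keep everything during the grace period after a deploy, then filter by level."""
    within_grace = deployed_at <= event["timestamp"] < deployed_at + grace
    if within_grace:
        return True
    return event["level"] in IMPORTANT_LEVELS


deployed_at = datetime(2024, 5, 29, 12, 0)  # hypothetical deployment time
events = [
    {"timestamp": deployed_at + timedelta(hours=2), "level": "debug"},
    {"timestamp": deployed_at + timedelta(days=2), "level": "debug"},
    {"timestamp": deployed_at + timedelta(days=2), "level": "error"},
]
kept = [e for e in events if keep_event(e, deployed_at)]
print([e["level"] for e in kept])
```

The debug event two hours after the deploy survives, while the one two days later is dropped; the error is kept either way.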

The SaaS control plane dashboard provides telemetry on data volumes, and you can “tap” into a pipeline to peek into real-time data.

So how does Mezmo help developers?

First, as I said above, Mezmo is helping developers focus on app functionality rather than on instrumentation.

Second, Mezmo fosters communication and collaboration between DevOps and site reliability engineers (SREs) on the appropriate verbosity of instrumentation.

Third, Mezmo helps DevOps move the instrumentation decision from development time to run time. As Mezmo says, “Feel free to log like no one’s watching!” That’s because operations and SREs can decide what data to keep or discard, and when, based on dynamic context.

As a bonus for IT, operations, and SREs, Mezmo can give organizations observability into the plethora of observability and security telemetry in their environment.

What about security?

I’d be remiss if I didn’t mention Mezmo’s glaring omission of security. While it’s obvious that Mezmo can be used to optimize security telemetry (important given the cost of storing and processing SIEM/SOAR data), Mezmo hasn’t addressed its own security: access to and control of the control plane, and how it secures the telemetry data itself and access to it.
