Skip to content

Observability

Good LLM Observability is just plain observability

In this post, I aim to demystify the concept of LLM observability. I'll illustrate how everyday tools employed in system monitoring and debugging can be effectively harnessed to enhance AI agents. Using Open Telemetry, we'll delve into creating comprehensive telemetry for intricate agent actions, spanning from question answering to autonomous decision-making.

If you want to learn about my consulting practice check out my services page. If you're interested in working together please reach out to me via email

What is Open Telemetry?

Essentially, Open Telemetry comprises a suite of APIs, tools, and SDKs that facilitate the creation, collection, and exportation of telemetry data (such as metrics, logs, and traces). This data is crucial for analyzing and understanding the performance and behavior of software applications.

Recommendations with Flight at Stitch Fix

As a data scientist at Stitch Fix, I faced the challenge of adapting recommendation code for real-time systems. With the absence of standardization and proper performance testing, tracing, and logging, building reliable systems was a struggle.

To tackle these problems, I created Flight – a framework that acts as a semantic bridge and integrates multiple systems within Stitch Fix. It provides modular operator classes for data scientists to develop, and offers three levels of user experience.