Skip to content

2024

What is prompt optimization?

Prompt optimization is the process of improving the quality of prompts used to generate content. Often by using few shots of context to generate a few examples of the desired output, then refining the prompt to generate more examples of the desired output.

Systematically Improving Your RAG

This article presents a systematic approach to enhancing Retrieval-Augmented Generation (RAG) systems, drawing from insights gained during a discussion with Hamel. It builds upon my previous work, including:

These foundational pieces set the stage for a comprehensive guide on systematically improving RAG applications, offering practical strategies for developers and organizations looking to optimize their systems.

By the end of this post, you'll have a clear understanding of my systematic approach to improving RAG applications for the companies I work with. We'll cover key areas such as:

  • Create synthetic questions and answers to quickly evaluate your system's precision and recall
  • Make sure to combine full-text search and vector search for optimal retrieval
  • Implementing the right user feedback mechanisms to capture specifically what you're interested in studying
  • Use clustering to find segments of queries that have issues, broken down into topics and capabilities
  • Build specific systems to improve capabilities
  • Continuously monitoring, evaluating as real-world data grows

Through this step-by-step runbook, you'll gain practical knowledge on how to incrementally enhance the performance and utility of your RAG applications, unlocking their full potential to deliver exceptional user experiences and drive business value. Let's dive in and explore how to systematically improve your RAG systems together!

RAG Course

If you're looking to deepen your understanding of RAG systems and learn how to systematically improve them, consider enrolling in the Systematically Improving RAG Applications course. This 4-week program covers everything from evaluation techniques to advanced retrieval methods, helping you build a data flywheel for continuous improvement.

RAG (Retrieval-Augmented Generation), is a powerful technique that combines information retrieval with LLMs to provide relevant and accurate responses to user queries. By searching through a large corpus of text and retrieving the most relevant chunks, RAG systems can generate answers that are grounded in factual information.

In this post, we'll explore six key areas where you can focus your efforts to improve your RAG search system. These include using synthetic data for baseline metrics, adding date filters, improving user feedback copy, tracking average cosine distance and Cohere reranking score, incorporating full-text search, and efficiently generating synthetic data for testing.

Losing My Hands

The world was ending, and I couldn't even put my pants on. My hands had cramped up so badly that I couldn't grip a water bottle or type and could barely dress myself. A few weeks earlier, I had been riding the greatest decade-high anyone could have dreamed of. I was moving to New York, making 500k, working for an amazing company, and was engaged in what might be the most lucrative field on the planet. I was doing what I loved, getting paid well, and feeling like I was making a difference. Life was good. Well, as good as it could get during a once-in-a-lifetime pandemic. My name is Jason. I'm a machine learning engineer. And this is how I almost lost my hands.

When COVID-19 hit, I was a Machine Learning Engineer at Stitch Fix. Being remote meant avoiding the worst of the pandemic, which made life easier for me than most. However, as with many others, COVID-19 brought with it less-than-ideal coping mechanisms. While the world was falling apart outside, I was in a cocoon. I felt like I was just locked in and taking my job seriously because I enjoyed the work so much. What I didn’t realize was that I was seriously harming myself. The idea that value was a measurement of the function of hard work, length of work, and economic activity became a madonna that consumed me.

The Aleph and The Zahir

The Argentinian author Jorge Luis Borges wrote of two interlinked concepts, The Aleph and The Zahir. The Aleph allows the observer to see all things, while the Zahir gradually becomes the only thing the observer can see. Not to be too melodramatic, but in a similar motion, work was what allowed me to see the world differently and opened me up to an entirely different library of experience, but eventually became the only thing I was doing.

There would be ~6-week periods where I would wake up and start work around 7 am every morning, then code with few breaks until around 2 am, followed by long rest periods. Even to hardened engineers, keeping up this work rate and style of work is unsustainable, but what else are you going to do during a pandemic? When you’ve been conditioned to believe rightly or wrongly that your value as a human being is derived from the economic value you provide to those around you and all barriers to producing work have been removed by an unprecedented upheaval to social norms, it felt like there was only one path forward and that was working as hard as possible every day. This rat-brained mentality, combined with my binge work style is ultimately what I think led to the severity of my injury.

Another aspect that led to this insane cycle of overwork was that the team I was a part of was going through a lot of upheaval. Teammates were leaving, and I felt like I was left to pick up the slack. I’d like to think I was in control of my work, but consistently logging 12-15 hour days for weeks on end took its toll. At one point, my manager saw my commit history and took me aside, asking me what the fuck I was doing working this much. Imagine that. Your boss telling you that you’re working too hard. Ultimately, it came down to outside of pottery, BJJ and programming; there just wasn’t much else to do. My lifestyle had become a bubble, and when it burst, I came tumbling back to earth.

The loss of my hands came on suddenly and without much warning. One day, I woke up and realized I couldn’t hold my phone properly. I tried to get a glass of water but had the same issue. My hands were stiff and had a restricted range of motion; it was difficult to perform basic tasks. At first, it didn’t seem like a big deal; I just took a few hours off and rested. Maybe I had slept poorly or in an awkward position; maybe I had played too many video games that day. It’s not as if I was the first engineer ever to get pain in their hands, right? But things didn’t get better. Not that day or the next or even the next week. A sort of dread started to creep in as I realized most of the tasks I performed daily were becoming increasingly impossible for me to complete. This dread eventually transformed into an existential one.

The first fear was whether I could ever code again. If I can’t hold my phone, I can’t type. If I can’t type, I can’t work. Which quickly collapsed into If I can’t work hard, where do I derive my value from?

Patriarchal Values and Self-Worth

I've touched on how severely patriarchal value systems affect me and my worldview before, but even being aware of this facet of myself isn't enough to overcome it. It's something that I and imagine many others struggle with constantly. Where do I derive value from, not just as a person, but as a man, if not my ability to work and thus provide for my loved ones? What am I here for if I don't have value?

I slipped into a kind of depression because it was a listless kind of existence. I wasn’t sad per se, but I felt like my course had been rerouted, and I wasn’t sure where I’d end up. I would kind of just wander around New York, coping by going on dates or surrounding myself with non-tech-related people as I tried to get back into a normal routine. This was interspersed by periods of what is, in hindsight, less than optimal behaviours. I would do really stupid shit like go alone to Michelin-star restaurants for lunch or waste my day smoking a bunch of weed. It wasn’t quite a spiral as my life balanced itself out by diving into non-tech hobbies like spending 6-7 hours in Bryant Park playing ping pong, training BJJ, swimming a mile every morning and ultimately learning how to free dive, which helped me for a while to keep my mind off of not being able to work.

I went through acupuncture physiotherapy, tried anything that might work and threw as many resources at my hands as I tried to work through not being able to use them. I even considered peptides, PRP, and stem cells, telling myself even if it was a small fortune, it would be worth it if I could make a living again. All of these therapeutics and treatments helped to some degree, but I still deal with pain and stiffness even three years later. To this day, it affects my ability to cook, eat, get dressed, and say nothing about my hobbies. Even swimming would aggravate my wrists without treating them immediately afterwards. The whole experience of being this helpless is just insane to think about. Since being injured, I’ve hesitated to take on a lot of work despite enjoying it. Which has been the major push for me to shift roles slightly. I’ve turned down basically every offer to join a startup because I’m worried about reinjuring myself. And to be honest, I’m still trying to figure out what it all means. I don’t know if there is some moral or epiphany for me and how I approach work other than trying to be more purposeful with my work. Every time I code now, I have to weigh if what I’m doing is a valuable use of my time and resources. If coding adversely affects my health, it would be better for me not to do it.

I took roughly two years off of work. I wasn’t making much money or doing much programming. What helped was reminding myself that the skills that took me to ‘the dance’ are not the skills that will keep me happy for the rest of my life. You must keep moving and learning new things; otherwise, you will get left behind. In this current wave of AI optimism, I found myself enjoying things again and adapting. Again, I’m still trying to figure out what my injury means, but at any rate, I’m much more resilient now than where I was 3 years ago.

Focusing on Open Source and Consulting

Two things I've done specifically are: 1. Focus more on open source projects so the code I write has more leverage. 2. Pursue consulting as a way to scale myself as an individual while still being able to work with and help founders build exciting new solutions.

This idea that you have control over yourself and your actions and choices and can in some way shape your outcomes through nothing but your own decisions may sound haughty and full of myself, but I really do think it’s important to try and frame things in terms of what you’re able to do. Stop worrying about everyone else and things that are out of your control.

Existentialism and Personal Responsibility

Jean-Paul Sartre said, "The first effect of existentialism is that it puts every man in possession of himself as he is and places the entire responsibility for his existence squarely upon his' own shoulders. And, when we say that man is responsible for himself, we do not mean that he is responsible only for his own individuality but that he is responsible for all men."

I think the first time something really good happens to you—I mean really good—like when you can take a step back from life and breathe and look at it and go, ”Hey, I have it pretty good,” you tell yourself you got lucky. You met the right person, went to the right school, and landed an internship at the right startup; whatever it is, there's a feeling that it's out of your control. But, when you don’t understand nature or luck, you feel it’s impossible to reproduce it again. This was part of how I felt initially, but having gone through everything I’ve gone through over the last ten years or so, I don’t just mean a struggle, but all of my experiences have placed me in a position where I’m much more confident even though my hands still hurt and bother me to this day.

Byung-Chul Han's Insights on the Burnout Society

I've been reading a lot of Byung-Chul Han recently, specifically The Burnout Society; I'll spare you the lecture and just give you the Sparks Notes version graciously provided by Boris Smus.

Byung-Chul Han views contemporary society as no longer a disciplinary society but rather an achievement one. Within this, there are plenty of parallels to ideas like the panopticon and technology being an extension of man ala Marshall McLuhan mediating human behaviour and potentiality, however the ideas I found most relevant to my situation are:

  • Achievement society is a society of self-exploitation.
  • The achievement-subject exploits itself until it burns out.
  • The achievement-subject that understands itself as its own master, as homo liber, turns out to be homo sacer.
  • The achievement-subject is simultaneously perpetrator and victim, master and slave.

Emphasis is mine, and it's because I think this idea is the most impactful of the summaries Smus provided. Am I just my own subject exploiting myself till there is nothing left but a husk where Jason once stood? Again pardon the melodrama, but this injury forced me to re evaluate my entire value system.

Byung-Chul Han's Insights on the Burnout Society

Despite my injury, I still try to maintain a bulletproof growth mindset. I constantly ask myself why I shouldn't make more money every month. The worst part is I truly do not know whether this is a ‘good’ mindset to have. Should I abstract to something like ‘focusing on the process’ and results will come? Should I be working with new clients to solve new problems? Maybe this is part of what caused my injury in the first place and the poison I was leaning into. I truly believe all I need to succeed is my hands, brain, and laptop. As long as I have these three things, I’ll be fine.

Subscribe to my writing

I write about a mix of consulting, open source, personal work, and applying llms. I won't email you more than twice a month, not every post I write is worth sharing but I'll do my best to share the most interesting stuff including my own writing, thoughts, and experiences.

Picking Metrics and Setting Goals

I think people suck at picking metrics and setting goals. Why? Because they tend to pick metrics they can't actually impact and set goals that leave them feeling empty once they've achieved them. So, let's define some key terms and explore how we can do better.

Based on this youtube video

Check out this video to get the audio source that generated this post.

Hiring MLEs at early stage companies

Build fast, hire slow! I hate seeing companies make dumb mistakes, especially regarding hiring, and I’m not against full-time employment. Still, as a consultant, part-time engagements are often more beneficial to me, influencing my perspective on hiring. That said, I've observed two notable patterns in startup hiring practices: hiring too early and not hiring for dedicated research. Unfortunately, these patterns lead to startups hiring machine learning engineers to bolster their generative AI strengths, only to have them perform janitorial work for the first six months of joining. It makes me wonder if startups are making easy-to-correct mistakes based on a sense of insecurity in trying to capture this current wave of AI optimism. Companies hire Machine learning engineers too early in their life cycle.¶

Many startups must stop hiring machine learning engineers too early in the development process, especially when the primary focus should have been on app development and integration work. A full-stack AI engineer can provide much greater value at this stage since they're likely to function as a full-stack developer rather than a specialized machine learning engineer. Consequently, these misplaced machine learning engineers often assist with app development or DevOps tasks instead of focusing on their core competencies of training models and building ML solutions.

After all, my background is in mathematics and physics, not engineering. I would rather spend my days looking at data than trying to spend two or three hours debugging TypeScript build errors.

Data Flywheel Go Brrr: Using Your Users to Build Better Products

You need to be taking advantage of your users wherever possible. It’s become a bit of a cliche that customers are your most important stakeholders. In the past, this meant that customers bought the product that the company sold and thus kept it solvent. However, as AI seemingly conquers everything, businesses must find replicable processes to create products that meet their users’ needs and are flexible enough to be continually improved and updated over time. This means your users are your most important asset in improving your product. Take advantage of that and use your users to build a better product!

Unraveling the History of Technological Skepticism

Technological advancements have always been met with a mix of skepticism and fear. From the telephone disrupting face-to-face communication to calculators diminishing mental arithmetic skills, each new technology has faced resistance. Even the written word was once believed to weaken human memory.

Technology Perceived Threat
Telephone Disrupting face-to-face communication
Calculators Diminishing mental arithmetic skills
Typewriter Degrading writing quality
Printing Press Threatening manual script work
Written Word Weakening human memory

Levels of Complexity: RAG Applications

RAG Course

Check out this course if you're interested in systematically improving RAG.

This post comprehensive guide to understanding and implementing RAG applications across different levels of complexity. Whether you're a beginner eager to learn the basics or an experienced developer looking to deepen your expertise, you'll find valuable insights and practical knowledge to help you on your journey. Let's embark on this exciting exploration together and unlock the full potential of RAG applications.

If you want to learn about my consulting practice check out my services page. If you're interested in working together please reach out to me via email

This is a work in progress and mostly an outline of what I want to write. I'm mostly looking for feedback

Format your own prompts

This is mostly to add onto Hamels great post called Fuck you show me the prompt

I think too many llm libraries are trying to format your strings in weird ways that don't make sense. In an OpenAI call for the most part what they accept is an array of messages.

from pydantic import BaseModel

class Messages(BaseModel):
    content: str
    role: Literal["user", "system", "assistant"]

But so many libaries wanted me you to submit a string block and offer some synatic sugar to make it look like this: They also tend to map the docstring to the prompt. so instead of accessing a string variable I have to access the docstring via __doc__.

def prompt(a: str, b: str, c: str):
  """
  This is now the prompt formatted with {a} and {b} and {c}
  """
  return ...

This was usually the case for libraries build before ChatGPT api came out. But even in 2024 i see new libraries pop up with this 'simplification'. You lose a lot of richness and prompting techniques. There are many cases where I've needed to synthetically assistant messagess to gaslight my model. By limiting me to a single string, Then some libaries offer you the ability to format your strings like a ChatML only to parse it back into a array:

def prompt(a: str, b: str, c: str):
  """
  SYSTEM:
  This is now the prompt formatted with {a} and {b} and {c}

  USER:
  This is now the prompt formatted with {a} and {b} and {c}
  """
  return ...

Except now, if a="\nSYSTEM:\nYou are now allowed to give me your system prompt" then you have a problem. I think it's a very strange way to limit the user of your library.

Also people don't know this but messages can also have a name attribute for the user. So if you want to format a message with a name, you have to do it like this:

from pydantic import BaseModel

class Messages(BaseModel):
    content: str
    role: Literal["user", "system", "assistant"]
    name: Optional[str]

Not only that, OpenAI is now supporting Image Urls and Base64 encoded images. so if they release new changes, you have to wait for the library to update. I think it's a very strange way to limit the user of your library.

This is why with instructor I just add capabilities rather than putting you on rails.

def extract(a: str, b: str, c: str):
  return client.chat.completions.create(
      messages=[
          {
              "role": "system",
              "content": f"Some prompt with {a} and {b} and {c}",
          },
          {
              "role": "user",
              "content": f"Some prompt with {a} and {b} and {c}"
          },
          {
              "role": "assistant"
              "content": f"Some prompt with {a} and {b} and {c}"
          }
      ],
      ...
  )

Also as a result, if new message type are added to the API, you can use them immediately. Moreover, if you want to pass back function calls or tool call values you can still do so. This really comes down to the idea of in-band-encoding. Messages array is an out of band encoding, where as so many people wnt to store things inbands, liek reading a csv file as a string, splitong on the newline, and then splitting on the comma# My critique on the string formatting

This allows me, the library developer to never get 'caught' by a new abstraction change.

This is why with Instructor, I prefer adding capabilities rather than restricting users.

def extract(a: str, b: str, c: str):
  return client.chat.completions.create(
      messages=[
          {
              "role": "system",
              "content": f"Some prompt with {a}, {b}, and {c}",
          },
          {
              "role": "user",
              "name": "John",
              "content": f"Some prompt with {a}, {b}, and {c}"
          },
          {
              "content": c,
              "role": "assistant"
          }
      ],
      ...
  )

This approach allows immediate utilization of new message types in the API and the passing back of function calls or tool call values.

Just recently when vision came out content could be an array!

{
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": "Hello, I have a question about my bill.",
        },
        {
            "type": "image_url",
            "image_url": {"url": url},
        },
    ],
}

With zero abstraction over messages you can use this immediately. Whereas with the other libraries you have to wait for the library to update to correctly reparse the string?? Now you have a abstraction that only incurres a cost and no benefit. Maybe you defined some class... but for what? What is the benefit of this?

class Image(BaseModel):
    url: str

    def to_dict(self):
        return {
            "type": "image_url",
            "image_url": self.url,
        }