
Generating Structured Output / JSON from LLMs

Large language models have grown rapidly in capability and adoption, but using them effectively often requires complex frameworks. This post discusses how Instructor simplifies this process by building on Pydantic.

The Problem with Existing LLM Frameworks

Current frameworks for Large Language Models (LLMs) often have complex setups, which makes it hard for developers to control how their code interacts with the model. Some frameworks also require writing elaborate JSON Schema definitions by hand.

The OpenAI Function Calling Game-Changer

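OpenAI's Function Calling feature constrains the interaction: you describe the arguments you want as a JSON Schema, and the model returns arguments shaped by that schema. However, it brings its own complexity, mostly in writing and maintaining those JSON Schemas.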
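For context, a function-calling request describes the expected arguments as a JSON Schema. The sketch below writes such a tool definition by hand and compares it with the schema Pydantic derives from a model. It assumes Pydantic v2. The `extract_user` tool name and the `types_match` helper are hypothetical and are not part of the OpenAI or Instructor API.

```python
from pydantic import BaseModel

# Hand-written tool definition in the shape OpenAI's function calling expects
HANDWRITTEN_TOOL = {
    "type": "function",
    "function": {
        "name": "extract_user",  # hypothetical function name
        "description": "Extract a user's details",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "age": {"type": "integer"},
            },
            "required": ["name", "age"],
        },
    },
}


class UserDetail(BaseModel):
    name: str
    age: int


# Pydantic generates an equivalent JSON Schema from the class definition
generated = UserDetail.model_json_schema()
handwritten = HANDWRITTEN_TOOL["function"]["parameters"]


def types_match(generated: dict, handwritten: dict) -> bool:
    """Compare field types and required fields, ignoring Pydantic's extra "title" keys."""
    return generated["required"] == handwritten["required"] and all(
        generated["properties"][field]["type"] == spec["type"]
        for field, spec in handwritten["properties"].items()
    )


print(types_match(generated, handwritten))  # True
```

With this approach the class is the single source of truth. Adding a field to `UserDetail` updates the schema, with no nested dictionary to keep in sync.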

Why Pydantic?

Instructor uses Pydantic to simplify the interaction between the programmer and the language model.

  • Widespread Adoption: Pydantic is a popular tool among Python developers.
  • Simplicity: Pydantic allows model definition in Python.
  • Framework Compatibility: Many Python frameworks already use Pydantic.
import pydantic
import instructor
from openai import OpenAI

# Patch the OpenAI client so that create() accepts a response_model argument
client = instructor.patch(OpenAI())


class UserDetail(pydantic.BaseModel):
    name: str
    age: int

    def introduce(self):
        return f"Hello I'm {self.name} and I'm {self.age} years old"


user: UserDetail = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=UserDetail,
    messages=[
        {"role": "user", "content": "Extract Jason is 25 years old"},
    ],
)
print(user.introduce())
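Conceptually, the patched client validates the model's function-call arguments against `UserDetail` before handing back an object. The sketch below shows that validation step offline. It assumes Pydantic v2 and uses hard-coded JSON strings in place of real model output. It illustrates the idea and is not Instructor's internal code.

```python
from pydantic import BaseModel, ValidationError


class UserDetail(BaseModel):
    name: str
    age: int

    def introduce(self):
        return f"Hello I'm {self.name} and I'm {self.age} years old"


# Stand-in for the JSON arguments a model might return
raw_arguments = '{"name": "Jason", "age": 25}'
user = UserDetail.model_validate_json(raw_arguments)
print(user.introduce())  # Hello I'm Jason and I'm 25 years old

# Malformed output raises ValidationError instead of silently passing through
errors = 0
try:
    UserDetail.model_validate_json('{"name": "Jason", "age": "twenty-five"}')
except ValidationError as exc:
    errors = exc.error_count()
print(errors)  # 1
```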

Simplifying Validation Flow with Pydantic

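Because validation lives in Pydantic validators, features like re-asking (sending validation errors back to the model so it can correct its output) or self-critique become ordinary validation logic instead of custom prompt chains. This makes them less complex than in other frameworks.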

from typing_extensions import Annotated
from pydantic import BaseModel, BeforeValidator
from instructor import llm_validator


class QuestionAnswerNoEvil(BaseModel):
    question: str
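    # llm_validator asks an LLM whether the answer satisfies the statement
    # and raises a validation error (with the LLM's reasoning) if it does not
    answer: Annotated[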
        str,
        BeforeValidator(llm_validator("don't say objectionable things")),
    ]
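`llm_validator` needs a live model, so the sketch below shows the re-ask idea with two substitutes. A plain keyword check stands in for the LLM-based check, and a hypothetical `fake_llm` callable stands in for the model. When validation fails, the error message is appended to the conversation and the model is asked again. Instructor handles retries for you (for example through the `max_retries` argument to `create`). This loop illustrates the idea under Pydantic v2 and is not its actual implementation.

```python
from typing import Annotated, Callable, List

from pydantic import AfterValidator, BaseModel, ValidationError

BANNED_WORDS = {"evil"}  # stand-in for an LLM-based content check


def no_banned_words(value: str) -> str:
    # Raising ValueError makes Pydantic report a validation error for this field
    if any(word in value.lower() for word in BANNED_WORDS):
        raise ValueError("answer contains objectionable content")
    return value


class QuestionAnswerNoEvil(BaseModel):
    question: str
    answer: Annotated[str, AfterValidator(no_banned_words)]


def extract_with_reask(
    llm: Callable[[List[str]], str], prompt: str, max_retries: int = 2
) -> QuestionAnswerNoEvil:
    """Call `llm` until its JSON output validates, feeding errors back each time."""
    messages = [prompt]
    for _ in range(max_retries + 1):
        raw = llm(messages)
        try:
            return QuestionAnswerNoEvil.model_validate_json(raw)
        except ValidationError as exc:
            # Re-ask: include the validation error in the next request
            messages.append(f"Please fix these errors: {exc}")
    raise RuntimeError(f"No valid answer after {max_retries + 1} attempts")


# Hypothetical fake model: the first reply fails validation, the second passes
replies = iter([
    '{"question": "What is 2 + 2?", "answer": "An evil 4"}',
    '{"question": "What is 2 + 2?", "answer": "4"}',
])
seen: List[int] = []


def fake_llm(messages: List[str]) -> str:
    seen.append(len(messages))  # record how much context each call received
    return next(replies)


qa = extract_with_reask(fake_llm, "What is 2 + 2?")
print(qa.answer)  # 4
print(seen)  # [1, 2]
```

The second call receives one extra message, the validation error, which is what gives the model a chance to correct itself.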

The Modular Approach

Because output schemas are ordinary Python classes, they can be composed, nested, and reused like any other code. This leads to more organized code.

Composition of Schemas

class UserDetails(BaseModel):
    name: str
    age: int


class UserWithAddress(UserDetails):
    address: str
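Composition works through ordinary inheritance and nesting. The short check below assumes Pydantic v2. It confirms that the subclass picks up the parent's fields and that one model can be nested as a field of another. The `Team` class is a hypothetical example.

```python
from typing import List

from pydantic import BaseModel


class UserDetails(BaseModel):
    name: str
    age: int


class UserWithAddress(UserDetails):
    address: str  # added on top of the inherited name and age


class Team(BaseModel):
    # Nesting: a schema can contain other schemas as fields
    members: List[UserWithAddress]


team = Team.model_validate(
    {"members": [{"name": "Jason", "age": 25, "address": "123 Main St"}]}
)
print(list(UserWithAddress.model_fields))  # ['name', 'age', 'address']
print(team.members[0].address)  # 123 Main St
```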

Defining Relationships

from typing import List


class UserDetail(BaseModel):
    id: int
    age: int
    name: str
    friends: List[int]  # ids of this user's friends


class UserRelationships(BaseModel):
    users: List[UserDetail]
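Because `friends` holds user ids, relationships can be resolved after extraction with plain Python. The minimal sketch below assumes Pydantic v2. The sample data and the `friend_names` helper are hypothetical.

```python
from typing import Dict, List

from pydantic import BaseModel


class UserDetail(BaseModel):
    id: int
    age: int
    name: str
    friends: List[int]  # ids of other users in the same response


class UserRelationships(BaseModel):
    users: List[UserDetail]


def friend_names(graph: UserRelationships) -> Dict[str, List[str]]:
    """Map each user's name to the names of their friends, skipping unknown ids."""
    by_id = {user.id: user for user in graph.users}
    return {
        user.name: [by_id[fid].name for fid in user.friends if fid in by_id]
        for user in graph.users
    }


graph = UserRelationships.model_validate(
    {
        "users": [
            {"id": 1, "age": 25, "name": "Jason", "friends": [2]},
            {"id": 2, "age": 30, "name": "Sarah", "friends": [1, 3]},
        ]
    }
)
print(friend_names(graph))  # {'Jason': ['Sarah'], 'Sarah': ['Jason']}
```

Id 3 does not appear in the response, so it is skipped rather than raising a `KeyError`.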

Using Enums

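from enum import Enum


# String values let the LLM see the role names in the schema;
# with auto() the schema would only list the integers 1 to 4
class Role(str, Enum):
    PRINCIPAL = "PRINCIPAL"
    TEACHER = "TEACHER"
    STUDENT = "STUDENT"
    OTHER = "OTHER"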


class UserDetail(BaseModel):
    age: int
    name: str
    role: Role
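Pydantic validates enum fields against the allowed members, so an unexpected role is rejected instead of passed through. The sketch below assumes Pydantic v2. It uses a string-valued `Role` so that the role names, not bare integers, appear in the generated schema.

```python
from enum import Enum

from pydantic import BaseModel, ValidationError


class Role(str, Enum):
    PRINCIPAL = "PRINCIPAL"
    TEACHER = "TEACHER"
    STUDENT = "STUDENT"
    OTHER = "OTHER"


class UserDetail(BaseModel):
    age: int
    name: str
    role: Role


# The schema lists the values the model must choose from
allowed = UserDetail.model_json_schema()["$defs"]["Role"]["enum"]
print(allowed)  # ['PRINCIPAL', 'TEACHER', 'STUDENT', 'OTHER']

teacher = UserDetail.model_validate({"age": 40, "name": "Ann", "role": "TEACHER"})
print(teacher.role is Role.TEACHER)  # True

# A role outside the enum fails validation
try:
    UserDetail.model_validate({"age": 40, "name": "Ann", "role": "JANITOR"})
    rejected = False
except ValidationError:
    rejected = True
print(rejected)  # True
```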

Flexible Schemas

from typing import List


class Property(BaseModel):
    key: str
    value: str


class UserDetail(BaseModel):
    age: int
    name: str
    # Open-ended attributes that don't fit a fixed field
    properties: List[Property]
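A list of key/value pairs lets the model return attributes the schema did not anticipate, while every item stays typed. The sketch below assumes Pydantic v2 and folds the pairs into a dict after extraction. The `properties_as_dict` helper and the sample data are hypothetical.

```python
from typing import Dict, List

from pydantic import BaseModel


class Property(BaseModel):
    key: str
    value: str


class UserDetail(BaseModel):
    age: int
    name: str
    properties: List[Property]


def properties_as_dict(user: UserDetail) -> Dict[str, str]:
    """Collapse the key/value list into a dict (later keys win on duplicates)."""
    return {prop.key: prop.value for prop in user.properties}


user = UserDetail.model_validate(
    {
        "age": 25,
        "name": "Jason",
        "properties": [
            {"key": "hobby", "value": "climbing"},
            {"key": "city", "value": "Toronto"},
        ],
    }
)
print(properties_as_dict(user))  # {'hobby': 'climbing', 'city': 'Toronto'}
```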

Chain of Thought

class TimeRange(BaseModel):
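    # Declared first so the model typically writes out its reasoning
    # before it commits to start_time and end_time
    chain_of_thought: str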
    start_time: int
    end_time: int


class UserDetail(BaseModel):
    id: int
    age: int
    name: str
    work_time: TimeRange
    leisure_time: TimeRange
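Pydantic keeps fields in declaration order when it builds the JSON Schema, which is what puts `chain_of_thought` ahead of the times it justifies. The sketch below assumes Pydantic v2. The hour-of-day reading and the `check_order` rule (which assumes ranges do not cross midnight) are illustrative additions, not part of the original schema.

```python
from pydantic import BaseModel, ValidationError, model_validator


class TimeRange(BaseModel):
    chain_of_thought: str
    start_time: int  # assumed here to be an hour of the day
    end_time: int

    @model_validator(mode="after")
    def check_order(self) -> "TimeRange":
        # Illustrative extra rule: reject ranges that end before they start
        if self.end_time < self.start_time:
            raise ValueError("end_time must not be before start_time")
        return self


# Field order in the schema matches the declaration order
order = list(TimeRange.model_json_schema()["properties"])
print(order)  # ['chain_of_thought', 'start_time', 'end_time']

# Swapped times are caught by the validator
try:
    TimeRange(chain_of_thought="Works 9 to 5", start_time=17, end_time=9)
    accepted = True
except ValidationError:
    accepted = False
print(accepted)  # False
```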

Language Models as Microservices

The resulting architecture resembles FastAPI. Most code is written as ordinary Python functions that take and return Pydantic objects, which removes the need for separate prompt chains.

FastAPI Stub

import fastapi
from pydantic import BaseModel

class UserDetails(BaseModel):
    name: str
    age: int

app = fastapi.FastAPI()

@app.get("/user/{user_id}", response_model=UserDetails)
async def get_user(user_id: int) -> UserDetails:
    # Look up the user here; this placeholder returns a fixed record
    return UserDetails(name="Jason", age=25)

Using Instructor as a Function

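def extract_user(text: str) -> UserDetails:
    # Uses the patched client from above; returns a validated UserDetails
    return client.chat.completions.create(
        model="gpt-3.5-turbo",
        response_model=UserDetails,
        messages=[
            {"role": "user", "content": f"Extract the user from: {text}"},
        ],
    )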
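Since an extraction function like this is an ordinary typed function, it can be tested like one. The sketch below passes the client in as a parameter and substitutes a hypothetical `FakeClient`. `FakeClient` mimics only the `chat.completions.create(...)` call shape and is not part of Instructor or OpenAI. The example assumes Pydantic v2.

```python
from types import SimpleNamespace
from typing import Any

from pydantic import BaseModel


class UserDetails(BaseModel):
    name: str
    age: int


def extract_user(text: str, client: Any) -> UserDetails:
    # Same call shape as the patched Instructor client
    return client.chat.completions.create(
        model="gpt-3.5-turbo",
        response_model=UserDetails,
        messages=[{"role": "user", "content": f"Extract the user from: {text}"}],
    )


class FakeClient:
    """Hypothetical stand-in that validates a canned payload instead of calling an API."""

    def __init__(self, canned: dict):
        self._canned = canned
        self.calls = []
        self.chat = SimpleNamespace(completions=SimpleNamespace(create=self._create))

    def _create(self, *, model, response_model, messages):
        self.calls.append(messages)
        return response_model.model_validate(self._canned)


fake = FakeClient({"name": "Jason", "age": 25})
user = extract_user("Jason is 25 years old", client=fake)
print(user)  # name='Jason' age=25
```

Passing the client as an argument is a design choice made for this sketch. It keeps the function easy to test in isolation, much like dependency injection in a FastAPI route.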

Response Modeling

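from typing import Optional


class MaybeUser(BaseModel):
    result: Optional[UserDetail] = None  # the extracted user, if any
    error: bool = False  # True when extraction failed
    message: Optional[str] = None  # explanation of the error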
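Wrapping the result lets the caller branch on success or failure instead of catching exceptions. The sketch below assumes Pydantic v2. It gives the fields defaults so the model can omit whichever side does not apply. The `describe` function is a hypothetical caller.

```python
from typing import Optional

from pydantic import BaseModel


class UserDetail(BaseModel):
    name: str
    age: int


class MaybeUser(BaseModel):
    result: Optional[UserDetail] = None
    error: bool = False
    message: Optional[str] = None


def describe(maybe: MaybeUser) -> str:
    """Branch on the wrapper instead of relying on exceptions."""
    if maybe.error or maybe.result is None:
        return f"Extraction failed: {maybe.message or 'no reason given'}"
    return f"Found {maybe.result.name}, age {maybe.result.age}"


ok = MaybeUser.model_validate({"result": {"name": "Jason", "age": 25}})
failed = MaybeUser.model_validate({"error": True, "message": "No user mentioned"})
print(describe(ok))  # Found Jason, age 25
print(describe(failed))  # Extraction failed: No user mentioned
```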

Conclusion

By building on Pydantic models that Python developers already know, Instructor simplifies interaction with language models for experienced and new developers alike.

If you enjoy the content or want to try out Instructor, please check out the GitHub repository and give us a star!