Skip to content

General Tips for Prompt Engineering

The overarching theme of using Instructor and Pydantic for function calling is to make the models as self-descriptive, modular, and flexible as possible, while maintaining data integrity and ease of use.

  • Modularity: Design self-contained components for reuse.
  • Self-Description: Use Pydantic's Field for clear field descriptions.
  • Optionality: Use Python's Optional type for nullable fields and set sensible defaults.
  • Standardization: Employ enumerations for fields with a fixed set of values; include a fallback option.
  • Dynamic Data: Use key-value pairs for arbitrary properties and limit list lengths.
  • Entity Relationships: Define explicit identifiers and relationship fields.
  • Contextual Logic: Optionally add a "chain of thought" field in reusable components for extra context.

Modular Chain of Thought

This approach to "chain of thought" improves data quality but can have modular components rather than global CoT.

from pydantic import BaseModel, Field

class Role(BaseModel):
    chain_of_thought: str = Field(...,
        description="Think step by step to determine the correct title")
    title: str

class UserDetail(BaseModel):
    age: int
    name: str
    role: Role

Utilize Optional Attributes

Use Python's Optional type and set a default value to prevent undesired defaults like empty strings.

from typing import Optional

class UserDetail(BaseModel):
    age: int
    name: str
    role: Optional[str] = Field(default=None)

Handling Errors Within Function Calls

You can create a wrapper class to hold either the result of an operation or an error message. This allows you to remain within a function call even if an error occurs, facilitating better error handling without breaking the code flow.

class UserDetail(BaseModel):
    age: int
    name: str
    role: Optional[str] = Field(default=None)

class MaybeUser(BaseModel):
    result: Optional[UserDetail] = Field(default=None)
    error: bool = Field(default=False)
    message: Optional[str]

    def __bool__(self):
        return self.result is not None

With the MaybeUser class, you can either receive a UserDetail object in result or get an error message in message.

Simplification with the Maybe Pattern

You can further simplify this using instructor to create the Maybe pattern dynamically from any BaseModel.

import instructor

MaybeUser = instructor.Maybe(UserDetail)

This allows you to quickly create a Maybe type for any class, streamlining the process.

Tips for Enumerations

To prevent data misalignment, use Enums for standardized fields. Always include an "Other" option as a fallback so the model can signal uncertainty.

from enum import Enum, auto

class Role(Enum):
    PRINCIPAL = auto()
    TEACHER = auto()
    STUDENT = auto()
    OTHER = auto()

class UserDetail(BaseModel):
    age: int
    name: str
    role: Role = Field(description="Correctly assign one of the predefined roles to the user.")

If you're having a hard time with Enum and alternative is to use Literal

class UserDetail(BaseModel):
    age: int
    name: str
    role: Literal["PRINCIPAL", "TEACHER", "STUDENT", "OTHER"]

If you'd like to improve performance more you can reiterate the requirements in the field descriptions or in the docstrings.

Reiterate Long Instructions

For complex attributes, it helps to reiterate the instructions in the field's description.

class Role(BaseModel):
    """
    Extract the role based on the following rules ...
    """
    instructions: str = Field(..., description="Restate the instructions and rules to correctly determine the title.")
    title: str

class UserDetail(BaseModel):
    age: int
    name: str
    role: Role

Handle Arbitrary Properties

When you need to extract undefined attributes, use a list of key-value pairs.

from typing import List

class Property(BaseModel):
    key: str
    value: str

class UserDetail(BaseModel):
    age: int
    name: str
    properties: List[Property] = Field(..., description="Extract any other properties that might be relevant.")

Limiting the Length of Lists

When dealing with lists of attributes, especially arbitrary properties, it's crucial to manage the length. You can use prompting and enumeration to limit the list length, ensuring a manageable set of properties.

class Property(BaseModel):
    index: str = Field(..., description="Monotonically increasing ID")
    key: str
    value: str

class UserDetail(BaseModel):
    age: int
    name: str
    properties: List[Property] = Field(..., description="Numbered list of arbitrary extracted properties, should be less than 6")

Using Tuples for Simple Types

For simple types, tuples can be a more compact alternative to custom classes, especially when the properties don't require additional descriptions.

class UserDetail(BaseModel):
    age: int
    name: str
    properties: List[Tuple[int, str]] = Field(..., description="Numbered list of arbitrary extracted properties, should be less than 6")

Advanced Arbitrary Properties

For multiple users, aim to use consistent key names when extracting properties.

class UserDetails(BaseModel):
    """
    Extract information for multiple users.
    Use consistent key names for properties across users.
    """
    users: List[UserDetail]

This refined guide should offer a cleaner and more organized approach to structure engineering in Python.

Defining Relationships Between Entities

In cases where relationships exist between entities, it's vital to define them explicitly in the model. The following example demonstrates how to define relationships between users by incorporating an id and a friends field:

class UserDetail(BaseModel):
    id: int = Field(..., description="Unique identifier for each user.")
    age: int
    name: str
    friends: List[int] = Field(..., description="Correct and complete list of friend IDs, representing relationships between users.")

class UserRelationships(BaseModel):
    users: List[UserDetail] = Field(..., description="Collection of users, correctly capturing the relationships among them.")

Reusing Components with Different Contexts

You can reuse the same component for different contexts within a model. In this example, the TimeRange component is used for both work_time and leisure_time.

class TimeRange(BaseModel):
    start_time: int = Field(..., description="The start time in hours.")
    end_time: int = Field(..., description="The end time in hours.")

class UserDetail(BaseModel):
    id: int = Field(..., description="Unique identifier for each user.")
    age: int
    name: str
    work_time: TimeRange = Field(..., description="Time range during which the user is working.")
    leisure_time: TimeRange = Field(..., description="Time range reserved for leisure activities.")

Sometimes, a component like TimeRange may require some context or additional logic to be used effectively. Employing a "chain of thought" field within the component can help in understanding or optimizing the time range allocations.

class TimeRange(BaseModel):
    chain_of_thought: str = Field(..., description="Step by step reasoning to get the correct time range")
    start_time: int = Field(..., description="The start time in hours.")
    end_time: int = Field(..., description="The end time in hours.")