# Instructor Classify
A fluent, type-safe API for text classification built on top of Instructor with support for multiple LLM providers.
## Features
- Simple label definitions in YAML or Python
- Type-safe classification using Pydantic
- Multi-provider support (OpenAI, Anthropic, Google, etc.)
- Complete evaluation framework with metrics and visualizations
- Async/Sync APIs for flexibility in different environments
- Cost & latency tracking for optimizing LLM usage
- Statistical analysis with bootstrap confidence intervals
- CLI tools for project setup and evaluation
## Overview
Instructor Classify provides a comprehensive framework for implementing, testing, and evaluating LLM-based text classification systems. It builds on the Instructor library to ensure type safety and structured outputs, while adding specific functionality for classification tasks.
The package supports both single-label and multi-label classification, with flexible APIs for synchronous and asynchronous operation, batch processing, and detailed performance evaluation.
## Installation
```bash
# Install the package
pip install instructor-classify

# Install your preferred LLM provider
pip install openai  # Or anthropic, google-generativeai, etc.
```
## Basic Usage

### Defining Classifications in Python
Create classification schemas directly in code:
```python
import instructor
from openai import OpenAI

from instructor_classify.classify import Classifier
from instructor_classify.schema import ClassificationDefinition, LabelDefinition, Examples

# Create label definitions
question_label = LabelDefinition(
    label="question",
    description="User is asking for information or clarification",
    examples=Examples(
        examples_positive=[
            "What is machine learning?",
            "How do I reset my password?",
        ],
        examples_negative=[
            "Please book me a flight",
            "I'd like to cancel my subscription",
        ],
    ),
)

request_label = LabelDefinition(
    label="request",
    description="User wants the assistant to perform an action",
    examples=Examples(
        examples_positive=[
            "Please book me a flight to Paris",
            "Schedule a meeting for tomorrow at 2pm",
        ],
        examples_negative=[
            "What is the capital of France?",
            "Can you explain quantum physics?",
        ],
    ),
)

# Create classification definition
definition = ClassificationDefinition(
    label_definitions=[question_label, request_label]
)

# Create classifier
client = instructor.from_openai(OpenAI())
classifier = (
    Classifier(definition)
    .with_client(client)
    .with_model("gpt-3.5-turbo")
)

# Make a prediction
result = classifier.predict("What is machine learning?")
print(result.label)  # -> "question"
```
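The overview also mentions multi-label and batch classification. The sketch below continues the example above; the `predict_multi` and `batch_predict` method names are assumptions and may differ in your installed version:

```python
# Multi-label prediction: return every label that applies to the text.
# (Method name `predict_multi` is an assumption.)
multi_result = classifier.predict_multi("What is the weather? Also, book me a flight.")
print(multi_result.labels)  # e.g. ["question", "request"]

# Batch prediction over several texts in one call.
# (Method name `batch_predict` is an assumption.)
texts = [
    "How do I reset my password?",
    "Please schedule a meeting for tomorrow at 2pm",
]
results = classifier.batch_predict(texts)
for text, prediction in zip(texts, results):
    print(f"{text!r} -> {prediction.label}")
```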
### Defining Classifications in YAML
For better version control and reusability, you can define schemas in YAML:
```yaml
# prompt.yaml
label_definitions:
  - label: question
    description: User is asking for information or clarification
    examples:
      examples_positive:
        - "What is machine learning?"
        - "How do I reset my password?"
      examples_negative:
        - "Please book me a flight"
        - "I'd like to cancel my subscription"
  - label: request
    description: User wants the assistant to perform an action
    examples:
      examples_positive:
        - "Please book me a flight to Paris"
        - "Schedule a meeting for tomorrow at 2pm"
      examples_negative:
        - "What is the capital of France?"
        - "Can you explain quantum physics?"
```
Then load it in your code:
```python
import instructor
from openai import OpenAI

from instructor_classify.classify import Classifier
from instructor_classify.schema import ClassificationDefinition

# Load classification definition from YAML
definition = ClassificationDefinition.from_yaml("prompt.yaml")

# Create classifier
client = instructor.from_openai(OpenAI())
classifier = (
    Classifier(definition)
    .with_client(client)
    .with_model("gpt-3.5-turbo")
)

# Make a prediction
result = classifier.predict("What is machine learning?")
print(result.label)  # -> "question"
```
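For the asynchronous usage mentioned in the features list, here is a minimal sketch. It assumes an `AsyncClassifier` counterpart that accepts a client built from `AsyncOpenAI`; the class name and awaitable `predict` method are assumptions, not confirmed by this page:

```python
import asyncio

import instructor
from openai import AsyncOpenAI

from instructor_classify.classify import AsyncClassifier  # assumed async counterpart
from instructor_classify.schema import ClassificationDefinition

definition = ClassificationDefinition.from_yaml("prompt.yaml")

# Wrap the async OpenAI client with Instructor.
client = instructor.from_openai(AsyncOpenAI())

classifier = (
    AsyncClassifier(definition)
    .with_client(client)
    .with_model("gpt-3.5-turbo")
)

async def main() -> None:
    # Await the prediction instead of blocking.
    result = await classifier.predict("What is machine learning?")
    print(result.label)  # -> "question"

asyncio.run(main())
```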
### Using the CLI
For quick project setup:
```bash
# Set up your API key
export OPENAI_API_KEY=your-api-key-here

# Initialize a new classification project
instruct-classify init my_classifier
cd my_classifier
```
This creates a project with:

- `prompt.yaml`: Sample classification schema definition
- `example.py`: Sample code for using the classifier
- `configs/`: Sample configurations
- `datasets/`: Sample evaluation datasets
This should be enough to get you started testing models.
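From there, you can run the generated example script and evaluate a model against the sample dataset. The exact evaluation invocation below is an assumption (the `eval` subcommand, `--config` flag, and config file name are not confirmed by this page):

```bash
# Run the generated example script
python example.py

# Evaluate against the sample dataset
# (the `eval` subcommand, --config flag, and config file name are assumptions)
instruct-classify eval --config configs/example.yaml
```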