Skip to content

Example: Parsing a Directory Tree

In this example, we will demonstrate how define and use a recursive class definition to convert a string representing a directory tree into a filesystem structure using OpenAI's function call api. We will define the necessary structures using Pydantic, create a function to parse the tree, and provide an example of how to use it.

Defining the Structures

We will use Pydantic to define the necessary data structures representing the directory tree and its nodes. We have two classes, Node and DirectoryTree, which are used to model individual nodes and the entire directory tree, respectively.

Flat is better than nested

While it's easier to model things as nested, returning flat items with dependencies tends to yield better results. For a flat example, check out planning tasks where we model a query plan as a dag.

import enum
from typing import List
from pydantic import Field

class NodeType(str, enum.Enum):
    """Enumeration representing the types of nodes in a filesystem."""
    FILE = "file"
    FOLDER = "folder"

class Node(BaseModel):
    """
    Class representing a single node in a filesystem. Can be either a file or a folder.
    Note that a file cannot have children, but a folder can.

    Args:
        name (str): The name of the node.
        children (List[Node]): The list of child nodes (if any).
        node_type (NodeType): The type of the node, either a file or a folder.

    Methods:
        print_paths: Prints the path of the node and its children.
    """
    name: str = Field(..., description="Name of the folder")
    children: List["Node"] = Field(
        default_factory=list,
        description="List of children nodes, only applicable for folders, files cannot have children",
    )
    node_type: NodeType = Field(
        default=NodeType.FILE,
        description="Either a file or folder, use the name to determine which it could be",
    )

    def print_paths(self, parent_path=""):
        """Prints the path of the node and its children."""
        if self.node_type == NodeType.FOLDER:
            path = f"{parent_path}/{self.name}" if parent_path != "" else self.name
            print(path, self.node_type)
            if self.children is not None:
                for child in self.children:
                    child.print_paths(path)
        else:
            print(f"{parent_path}/{self.name}", self.node_type)

class DirectoryTree(BaseModel):
    """
    Container class representing a directory tree.

    Args:
        root (Node): The root node of the tree.

    Methods:
        print_paths: Prints the paths of the root node and its children.
    """
    root: Node = Field(..., description="Root folder of the directory tree")

    def print_paths(self):
        """Prints the paths of the root node and its children."""
        self.root.print_paths()

Node.update_forward_refs()
DirectoryTree.update_forward_refs()

The Node class represents a single node in the directory tree. It has a name, a list of children nodes (applicable only to folders), and a node type (either a file or a folder). The print_paths method can be used to print the path of the node and its children.

The DirectoryTree class represents the entire directory tree. It has a single attribute, root, which is the root node of the tree. The print_paths method can be used to print the paths of the root node and its children.

Parsing the Tree

We define a function parse_tree_to_filesystem to convert a string representing a directory tree into a filesystem structure using OpenAI.

import instructor
from openai import OpenAI

# Apply the patch to the OpenAI client
# enables response_model keyword
client = instructor.patch(OpenAI())

def parse_tree_to_filesystem(data: str) -> DirectoryTree:
    """
    Convert a string representing a directory tree into a filesystem structure
    using OpenAI's GPT-3 model.

    Args:
        data (str): The string to convert into a filesystem.

    Returns:
        DirectoryTree: The directory tree representing the filesystem.
    """

    return client.chat.completions.create(
        model="gpt-3.5-turbo-0613",
        response_model=DirectoryTree,
        messages=[
            {
                "role": "system",
                "content": "You are a perfect file system parsing algorithm. You are given a string representing a directory tree. You must return the correct filesystem structure.",
            },
            {
                "role": "user",
                "content": f"Consider the data below:\n{data} and return the correctly labeled filesystem",
            },
        ],
        max_tokens=1000,
    )

The parse_tree_to_filesystem function takes a string data representing the directory tree and returns a DirectoryTree object representing the filesystem structure. It uses the OpenAI Chat API to complete the prompt and extract the directory tree.

Example Usage

Let's demonstrate how to use the parse_tree_to_filesystem

function with an example:

root = parse_tree_to_filesystem(
    """
    root
    ├── folder1
    │   ├── file1.txt
    │   └── file2.txt
    └── folder2
        ├── file3.txt
        └── subfolder1
            └── file4.txt
    """
)
root.print_paths()

In this example, we call parse_tree_to_filesystem with a string representing a directory tree.

After parsing the string into a DirectoryTree object, we call root.print_paths() to print the paths of the root node and its children. The output of this example will be:

root                               NodeType.FOLDER
root/folder1                       NodeType.FOLDER
root/folder1/file1.txt             NodeType.FILE
root/folder1/file2.txt             NodeType.FILE
root/folder2                       NodeType.FOLDER
root/folder2/file3.txt             NodeType.FILE
root/folder2/subfolder1            NodeType.FOLDER
root/folder2/subfolder1/file4.txt  NodeType.FILE

This demonstrates how to use OpenAI's GPT-3 model to parse a string representing a directory tree and obtain the correct filesystem structure.

I hope this example helps you understand how to leverage OpenAI Function Call for parsing recursive trees. If you have any further questions, feel free to ask!