Structured Output Generation¶
π Introduction¶
This module introduces a flexible and extensible framework to generate and validate structured outputs from LLMs. It is designed to work natively with Azure OpenAI, VLLM and Ollama, leveraging their structured output capabilities, while providing fallback support for any other LLM or custom model wrapper through JSON schema-based validation.
π§ Why This is Helpful¶
- Ensures consistent and reliable response formats from LLMs during data generation to reduce the post processing effort.
- Eliminates the need for brittle regex or manual parsing of raw LLM outputs.
- Allows easy plug-and-play validation via Pydantic classes or YAML-based schema configs.
π Key Features¶
- Structured Output Modes:
- Class-based schema: Define a Python class for validating output (e.g., using Pydantic).
- YAML-defined schema: Use simple YAML to define structure and field-level validation rules.
- Dynamic Type-Fetching:
- User simply needs to point their schema correctly to either of the above two choices. The type of schema desired and required processing for it is handled dynamically.
- Flexible Integration:
- Plug structured output validation into any LLM pipeline with minimal code changes.
- Easily extend by defining your own schemas or rules.
βοΈ Architecture Overview¶
Each node in a task graph can be defined with:
- Model name (e.g., gpt-4o
, llama3-8b
)
- Parameters (e.g., temperature
)
- Structured output config:
- Class-based (type: class
)
- Schema-based (type: schema
)
nodes:
node1:
node_type: llm
model:
name: gpt-4o
parameters:
temperature: 0.1
structured_output:
enabled: true
schema: "grasp.core.models.structured_output.schemas_factory.CustomUserSchema"
node2:
node_type: llm
model:
name: llama3-8b
parameters:
temperature: 0.2
structured_output:
enabled: true
schema:
fields:
answer:
type: str
description: "The main answer"
confidence:
type: float
description: "Confidence score"
βοΈ How to Define a Schema¶
Structured output schema can be defined in two ways depending on your use case:
β Option 1: Using Python Class (Pydantic)¶
This is the preferred approach if you want to write custom validation logic or reuse complex types.
- Define your schema class in
<your_local_path>
:
from pydantic import BaseModel, root_validator, model_validator, Field
from typing import Any
class SimpleResponse(BaseModel):
"""Simple response with just text and status"""
message: str = Field(description="Response message")
success: bool = Field(default=True, description="Operation success status")
@model_validator(pre=True)
def check_non_empty_messages(cls, values):
if not values.get('message'):
raise ValueError('message cannot be empty')
return values
- Then point to it in your YAML config:
structured_output:
enabled: true
schema: "<your_local_path>.SimpleResponse"
β Option 2: Using YAML (No Code Schema Definition)¶
This is ideal for quick prototyping or use cases where no custom Python logic is needed.
structured_output:
enabled: true
schema:
fields:
answer:
type: str
description: "The main answer"
confidence:
type: float
description: "Confidence score"
π Rules for Using Schema Validation¶
To ensure the structured output validation runs smoothly, follow these rules:
πΉ General Rules¶
1. Structured Output Generation is triggered only when structured_output
is defined:
- schema
key must be present and must point to a valid pydantic class path or contain a dict defining desired output schema. Additional rules for class based schema definition or YAML based schema definition are described below.
2. enabled
is optional:
- If the structured_output
key is present in the nodeβs YAML config, enabled
is set to true by default.
- Use this when you want to turn off structured_output_generation
but want to preserve schema config make in YAML
.
πΉ For Class-Based Schemas¶
- Must inherit from
pydantic.BaseModel
. - Can use
@model_validator
or@validator
for custom rules. - The class path provided must be fully qualified, e.g.
<your_local_path>>.CustomUserSchema
.
πΉ For YAML-Based Schemas¶
- Must be defined inside
fields
key insideschema
key. - Type for each field must be valid Python type.
π Limitations¶
- Native structured output generation is currently supported exclusively for
vllm
,azure_openai
,tgi
andollama
ModelTypes. - For OpenAI ModelType, structured output functionality requires API versions
2024-08-01-preview
or later. - For VLLM ModelType, structured output functionality requires version
0.8.4
or later. - For all other ModelTypes, structured output generation is supported through fallback JSON schema validation.