Structured Output¶

This tutorial shows how to implement structured output from LLM responses using the GraSP framework. You’ll learn to extract specific information from LLM outputs in a standardized JSON format.

Key Features You’ll Learn
structured JSON output, schema definition, post-processing, code taxonomy, response normalization

Prerequisites¶

GraSP framework installed (see Installation Guide)
Familiarity with Python and JSON

What You’ll Build¶

You’ll create a system that: - Loads code snippets from a dataset - Extracts structured data (category, subcategory) from LLM responses - Handles parsing and formatting for downstream use

Step 1: Project Structure¶

structured_output/
├── task_executor.py     # Structured data extraction
├── graph_config.yaml    # Workflow graph and schema

Step 2: Pipeline Implementation¶

Parent Graph (`graph_config.yaml`)¶

The main pipeline is defined in structured_output/graph_config.yaml:

Data Source: Loads code snippets from the glaiveai/glaive-code-assistant-v2 HuggingFace dataset, renaming task_id to id.
Nodes:
generate_taxonomy: An LLM node with a prompt instructing the model to extract the category and subcategory from the code snippet. The node uses a structured output schema for reliable extraction and a post-processor for normalization.
Edges: The workflow is linear: data → taxonomy extraction → END.
Output Config: Maps the question, category, and subcategory from the state to the final output structure.

Reference: structured_output/graph_config.yaml

Task Executor (`task_executor.py`)¶

This file implements custom logic for the pipeline: - GenerateTaxonomyPostProcessor: Extracts structured data from the LLM output, handling JSON parsing and normalization.

Reference: task_executor.py

Step 3: Output Formatting¶

Formats the final output with:
Original question ID and content
Extracted category and subcategory

Step 4: Running the Pipeline¶

From your GraSP project root, run:

python main.py --task path/to/your/structured_output

Example Output¶

[
    {
        "id": "20705cc57af2ec0d2ea976de82a4c833f915d6a0bd6b3e3b508c3a4edf213743",
        "question": "I have a field that contains a string of numbers like '2002 2005 2001 2006 2008 2344'...",
        "category": "Database",
        "sub_category": "SQL Query"
    }
]

Try It Yourself¶

Change the schema to extract different fields
Use your own dataset for testing

Next Steps¶

Explore structured output with multi-LLM for advanced scenarios
Learn about agent simulation for multi-agent conversations