File System Handler

Manages local file operations with support for:

  • JSON, JSONL (JSON Lines), Parquet files
  • Special data type handling (datetime, numpy arrays)
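The page does not say how the handler serializes datetimes or numpy arrays. As a rough illustration only, one common approach is a custom `json.JSONEncoder` that converts them to JSON-friendly values. The names `SpecialTypeEncoder` and `to_json_line` are hypothetical and not part of `FileHandler`; the sketch detects arrays by their `tolist()` method so it runs without numpy installed.

```python
import json
from datetime import date, datetime


class SpecialTypeEncoder(json.JSONEncoder):
    """Hypothetical encoder; an assumption, not FileHandler's actual implementation."""

    def default(self, obj):
        # datetime and date values become ISO 8601 strings
        if isinstance(obj, (datetime, date)):
            return obj.isoformat()
        # numpy arrays and numpy scalars expose tolist(); duck-type on that method
        tolist = getattr(obj, "tolist", None)
        if callable(tolist):
            return tolist()
        # anything else falls through to the standard TypeError
        return super().default(obj)


def to_json_line(record):
    """Serialize one record as a single JSON Lines row (without the trailing newline)."""
    return json.dumps(record, cls=SpecialTypeEncoder, ensure_ascii=False)
```

With numpy available, `to_json_line({"ts": datetime.now(), "vec": numpy.arange(3)})` would yield one JSON object whose `ts` is an ISO string and whose `vec` is a plain list.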

Working with Local Files

  1. Reading from JSON/Parquet/JSONL files:

YAML:

data_config:
  source:
    type: "disk"
    file_path: "data/input.json"   # also supports Parquet, JSONL

Python:

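from data_handlers import FileHandler
from sygra.core.dataset.dataset_config import DataSourceConfig

# Describe the input file (a Parquet file in this example)
config = DataSourceConfig(file_path="/data/input.parquet")
# Create a handler bound to that source and load its contents
handler = FileHandler(source_config=config)
data = handler.read()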

  2. Writing to JSONL with custom encoding:

YAML:

data_config:
  sink:
    type: "disk"
    file_path: "data/output.jsonl"
    encoding: "utf-16"

Python:

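from data_handlers import FileHandler
from sygra.core.dataset.dataset_config import OutputConfig

# Request UTF-16 instead of the default output encoding
output_config = OutputConfig(encoding="utf-16")
handler = FileHandler(output_config=output_config)
# `data` is the collection of records to write, e.g. the result of handler.read()
handler.write(data, path="/data/output.jsonl")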
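To make the effect of the `encoding` setting concrete, here is a minimal standard-library sketch of writing and re-reading a JSONL file in UTF-16. It is not the handler's implementation; `write_jsonl`, `read_jsonl`, and the sample records are hypothetical names and data used only for illustration.

```python
import json
import os
import tempfile


def write_jsonl(records, path, encoding="utf-8"):
    """Write one JSON object per line using the requested text encoding."""
    with open(path, "w", encoding=encoding, newline="\n") as f:
        for record in records:
            # ensure_ascii=False keeps non-ASCII characters as-is in the file
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path, encoding="utf-8"):
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with open(path, "r", encoding=encoding) as f:
        return [json.loads(line) for line in f if line.strip()]


# Illustrative records with non-ASCII text
records = [{"id": 1, "text": "héllo"}, {"id": 2, "text": "日本語"}]
path = os.path.join(tempfile.mkdtemp(), "output.jsonl")

write_jsonl(records, path, encoding="utf-16")
round_tripped = read_jsonl(path, encoding="utf-16")
```

A file written this way has to be read back with the same encoding. Python's `utf-16` codec writes a byte-order mark at the start of the file, so a consumer that assumes UTF-8 will fail to decode it.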