Create a Synthetic Datagen Pipeline

Steps to Create a Synthetic Data Pipeline

Because a pipeline is declared in YAML as a graph of nodes and edges, setting up a new flow is straightforward.
For an example, see the glaive code assistant pipeline.
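To make the node/edge idea concrete, here is a minimal, framework-independent sketch in Python. It is not this project's execution engine: the run_graph helper, the node names, and the use of plain functions as nodes are all hypothetical. It only illustrates how a chain of edges from START to END drives a record through each node in order.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical node type: each node takes the current state dict and returns an updated copy.
Node = Callable[[dict], dict]

START, END = "START", "END"


def run_graph(nodes: Dict[str, Node], edges: List[Tuple[str, str]], state: dict) -> dict:
    """Walk a linear chain of edges from START to END, applying each node to the state."""
    # Map each source node to its single successor (this sketch supports only linear flows).
    successor = dict(edges)
    current = successor[START]
    while current != END:
        state = nodes[current](state)
        current = successor[current]
    return state


# Hypothetical nodes standing in for, e.g., an LLM call and a post-processing step.
def generate_answer(state: dict) -> dict:
    return {**state, "answer": f"Answer to: {state['question']}"}


def add_length(state: dict) -> dict:
    return {**state, "answer_length": len(state["answer"])}


nodes = {"generate_answer": generate_answer, "add_length": add_length}
edges = [(START, "generate_answer"), ("generate_answer", "add_length"), ("add_length", END)]

result = run_graph(nodes, edges, {"question": "How do I reverse a list?"})
print(result)
```

A real pipeline also has to handle branching edges, model configuration, and batching, which this sketch deliberately leaves out.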

Basic steps:
- Create a sub-directory under tasks for your use case.
- Create a graph_config.yaml for your pipeline (nodes, edges, models, etc.).
- Create a task_executor.py for any custom logic or processing.
- Execute with python main.py --task <your_task> ...
- Results are stored in output.json in your sub-directory.
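The first three steps can be scripted. The sketch below is a hypothetical helper (scaffold_task is not part of this project) that creates the task sub-directory with empty graph_config.yaml and task_executor.py files; it assumes tasks is a directory you point it at, and the task name "my_task" is illustrative. Filling in the files is still up to you.

```python
import tempfile
from pathlib import Path


def scaffold_task(tasks_dir: Path, task_name: str) -> Path:
    """Create <tasks_dir>/<task_name>/ containing empty graph_config.yaml and task_executor.py."""
    task_dir = tasks_dir / task_name
    task_dir.mkdir(parents=True, exist_ok=True)
    for filename in ("graph_config.yaml", "task_executor.py"):
        # touch(exist_ok=True) creates the file if missing and leaves existing content unchanged.
        (task_dir / filename).touch(exist_ok=True)
    return task_dir


# Demonstrate in a throwaway directory so nothing is written to the real tasks folder.
with tempfile.TemporaryDirectory() as tmp:
    task_dir = scaffold_task(Path(tmp) / "tasks", "my_task")
    created_files = sorted(p.name for p in task_dir.iterdir())

print(created_files)
```

In a real checkout you would call scaffold_task(Path("tasks"), "<your_task>") from the repository root instead of using a temporary directory.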

Resumable Execution

In the event of a failure, the process can shut down gracefully and later resume execution from the point of interruption. To enable resumable execution, pass the flag --resume True when running your command, for example: python main.py --task <your_task> ... --resume True.
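One common way to make a batch job resumable is to append each finished record to an output file and, on restart, skip as many input records as are already there. The sketch below shows that general checkpointing technique under stated assumptions: the process_with_resume helper and the JSON Lines output file are hypothetical and are not this framework's on-disk format or its implementation of --resume.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, List


def process_with_resume(records: List[dict], process: Callable[[dict], dict],
                        output_path: Path, resume: bool) -> List[dict]:
    """Process records in order, appending one JSON line per result so a rerun can skip finished work."""
    done = 0
    if resume and output_path.exists():
        # Each completed record occupies one line, so the line count is the resume point.
        with output_path.open() as f:
            done = sum(1 for line in f if line.strip())
    elif output_path.exists():
        output_path.unlink()  # fresh run: discard any previous partial output
    with output_path.open("a") as f:
        for record in records[done:]:
            f.write(json.dumps(process(record)) + "\n")
            f.flush()  # push each result out of Python's buffer right away
    with output_path.open() as f:
        return [json.loads(line) for line in f if line.strip()]


def double(record: dict) -> dict:
    return {"id": record["id"], "value": record["value"] * 2}


records = [{"id": i, "value": i} for i in range(5)]
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "output.jsonl"
    # Simulate an interrupted run that only finished the first two records.
    process_with_resume(records[:2], double, out, resume=False)
    # Resume over the full input: the first two records are skipped, not reprocessed.
    results = process_with_resume(records, double, out, resume=True)

print([r["value"] for r in results])
```

Because finished records are skipped rather than recomputed, the resumed run produces exactly one result per input record, with no duplicates.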

See the Graph Configuration Guide for the detailed schema, examples, and best practices for defining graphs, tasks, and processors.