
SampledDatasetConfig

Abstract

This class cannot be instantiated directly. Use one of the variants listed below.

Module: fast_llm.data.dataset.config

Inherits from: DatasetConfig

No user-configurable fields.

Variants

Select a variant by setting type: to one of the following values.

| type value | Class | Description |
| --- | --- | --- |
| blended | BlendedDatasetConfig | Mixes multiple datasets together, sampling from each according to specified weights |
| concatenated | ConcatenatedDatasetConfig | Concatenates multiple indexed datasets as if they were one |
| file | GPTDatasetFromFileConfig | |
| fim | GPTFimSampledDatasetConfig | Configuration for fill-in-the-middle (FIM) |
| memmap | MemmapDatasetConfig | |
| random | GPTRandomDatasetConfig | |
| slice | DatasetSliceConfig | |
| streaming | StreamingDatasetConfig | Configuration for a streaming dataset that reads training data from a Redis stream |
| test_slow | GPTTestSlowDatasetConfig | A mock dataset that mimics slow dataset creation on one rank, which may trigger a timeout |
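
To illustrate how a type: value maps to a concrete class, here is a minimal Python sketch. It is not Fast-LLM's actual dispatch code: the `VARIANTS` dictionary and the `resolve_variant` helper are hypothetical names introduced for illustration, and raising an error when type: is missing is an assumption of this sketch. Only the type values and class names come from the table above.

```python
# Hypothetical lookup table copied from the variants table above.
# This is NOT Fast-LLM's own registry; it only mirrors the documented mapping.
VARIANTS = {
    "blended": "BlendedDatasetConfig",
    "concatenated": "ConcatenatedDatasetConfig",
    "file": "GPTDatasetFromFileConfig",
    "fim": "GPTFimSampledDatasetConfig",
    "memmap": "MemmapDatasetConfig",
    "random": "GPTRandomDatasetConfig",
    "slice": "DatasetSliceConfig",
    "streaming": "StreamingDatasetConfig",
    "test_slow": "GPTTestSlowDatasetConfig",
}


def resolve_variant(config: dict) -> str:
    """Return the documented class name selected by the config's `type` key."""
    type_name = config.get("type")
    if type_name is None:
        # Assumption: the abstract base has no default variant to fall back on.
        raise ValueError(
            "SampledDatasetConfig is abstract: set `type` to select a variant"
        )
    if type_name not in VARIANTS:
        raise ValueError(
            f"Unknown type {type_name!r}; expected one of {sorted(VARIANTS)}"
        )
    return VARIANTS[type_name]
```

For example, `resolve_variant({"type": "blended"})` returns `"BlendedDatasetConfig"`, while a dictionary without a `type` key raises `ValueError`. The fields each variant accepts beyond type: are documented on that variant's own page.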