SampledDatasetConfig
Abstract
This class cannot be instantiated directly. Use one of the variants listed below.
Module: fast_llm.data.dataset.config
Inherits from: DatasetConfig
No user-configurable fields.
Variants
Select a variant by setting type: to one of the following values.
| type value | Class | Description |
|---|---|---|
| blended | BlendedDatasetConfig | Mixes multiple datasets together, sampling from each according to specified weights |
| concatenated | ConcatenatedDatasetConfig | Concatenate multiple indexed datasets as if they were one |
| file | GPTDatasetFromFileConfig | |
| fim | GPTFimSampledDatasetConfig | Configuration for FIM |
| memmap | MemmapDatasetConfig | |
| random | GPTRandomDatasetConfig | |
| slice | DatasetSliceConfig | |
| streaming | StreamingDatasetConfig | Configuration for a streaming dataset that reads training data from a Redis stream |
| test_slow | GPTTestSlowDatasetConfig | A mock dataset that mimics a slow dataset creation on one rank, which may trigger a timeout |
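
As a rough illustration of variant selection, the fragment below assumes the configuration is written in YAML. The parent key `dataset` is a hypothetical placeholder. Only the `type` key and its value come from the table above. Any further fields a variant requires are documented on that variant's own page and are not shown here.

```yaml
# Sketch only: `dataset` is a hypothetical parent key, not taken from this page.
dataset:
  type: memmap   # selects MemmapDatasetConfig; swap in any other value from the table
```

Because SampledDatasetConfig is abstract, a dataset entry presumably needs a `type` value to pick a concrete variant.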