SamplingConfigBase
Module: fast_llm.data.dataset.config
Fields
maximum_document_length (core)
Type: int or None
Default: None
Maximum number of tokens in a document. Documents exceeding this size will be truncated or dropped depending on truncate_documents.
micro_batch_size (core)
Type: int
Default: 2048
Size of individual micro-batches.
gpu (feature)
Type: bool
Default: True
Enable fast sampling on GPU. Note that random sampling works differently on GPU, so the samples will not match the CPU equivalent.
shuffle (feature)
Type: ShufflingType
Default: "epoch"
Shuffling strategy.
truncate_documents (feature)
Type: bool or None
Default: True
If enabled, documents may be truncated while being packed to fit the sequence length. Otherwise, sequences will be padded such that every document lies entirely within a sample (and documents exceeding the sequence length will be skipped altogether).
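The difference between the two truncate_documents modes can be illustrated with a toy packer. This is a minimal sketch, not Fast-LLM's sampling code: the function pack_documents, its parameters, and the pad_token value are hypothetical names chosen for illustration. It also assumes that, in truncating mode, the overflow of a document carries over into the next sample, which the description above does not confirm.

```python
from typing import Iterable


def pack_documents(
    documents: Iterable[list[int]],
    sequence_length: int,
    truncate_documents: bool,
    pad_token: int = -1,  # hypothetical padding id, for illustration only
) -> list[list[int]]:
    """Toy packing of token lists into fixed-length samples (illustrative only)."""
    samples: list[list[int]] = []
    current: list[int] = []
    for doc in documents:
        if truncate_documents:
            # Documents are concatenated and cut at sample boundaries
            # (assumption: the remainder continues in the next sample).
            current.extend(doc)
            while len(current) >= sequence_length:
                samples.append(current[:sequence_length])
                current = current[sequence_length:]
        else:
            if len(doc) > sequence_length:
                # Too long to lie entirely within one sample: skip it.
                continue
            if len(current) + len(doc) > sequence_length:
                # Pad the current sample so the next document starts a new one.
                samples.append(current + [pad_token] * (sequence_length - len(current)))
                current = []
            current.extend(doc)
    if current and not truncate_documents:
        # Flush the last partially filled sample with padding.
        samples.append(current + [pad_token] * (sequence_length - len(current)))
    # In truncating mode, a trailing incomplete sample is dropped in this sketch.
    return samples


docs = [[1, 2, 3], [4, 5], [6, 7, 8, 9, 10, 11], [12]]

# Truncating mode: documents are split wherever the sample boundary falls.
print(pack_documents(docs, sequence_length=4, truncate_documents=True))
# [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]

# Non-truncating mode: samples are padded, and the 6-token document is skipped.
print(pack_documents(docs, sequence_length=4, truncate_documents=False))
# [[1, 2, 3, -1], [4, 5, 12, -1]]
```

In this sketch, truncating mode wastes no positions on padding but lets documents straddle samples, while non-truncating mode keeps every document intact at the cost of padding and of losing over-long documents.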