LanguageModelBatchPreprocessingConfig

Module: fast_llm.data.document.config

Inherits from: TokenPreprocessingConfig, LengthPreprocessingConfig, BatchPreprocessingConfig

Fields

| Field | Type | Default |
| --- | --- | --- |
| `causal` | `bool` | `True` |
| `distributed` | `DistributedConfig` | (sub-fields optional) |
| `micro_batch_splits` | `int` | `1` |
| `phase` | `PhaseType` | `"training"` |
| `predicted_tokens` | `int` | `1` |
| `return_cumulative_sequence_lengths` | `bool` | `False` |
| `return_document_count` | `bool` | `False` |
| `return_document_index` | `bool` | `False` |
| `return_label_counts` | `bool` | `False` |
| `return_max_sequence_lengths` | `bool` | `False` |
| `return_position_index` | `bool` | `False` |
| `return_prediction_mask` | `bool` | `False` |
| `use_grpo_data` | `bool` | `False` |
| `use_loss_masking_spans` | `bool` | `True` |
| `use_preference_spans` | `bool` | `False` |
| `vision_encoder` | `PatchPreprocessingConfig` or `None` | `None` |
| `vocab_size` | `int` or `None` | `None` |
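As an illustration, the fields above could be set in a YAML config along these lines. This is a hypothetical sketch: the top-level key (`batch_preprocessing` here) and the nesting depend on where this config is embedded in your Fast-LLM configuration, and the `vocab_size` value is an example rather than a default.

```yaml
# Hypothetical placement; field names match the reference above.
# Unlisted fields keep their defaults.
batch_preprocessing:
  causal: true
  phase: training
  micro_batch_splits: 1
  predicted_tokens: 1
  use_loss_masking_spans: true
  return_cumulative_sequence_lengths: false
  vocab_size: 32000  # example value; the default is null
```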

Used in