LanguageModelDPOLossConfig
Module: fast_llm.layers.language_model.loss.config
Variant of: LanguageModelLossConfig — select with type: dpo
Inherits from: LanguageModelLossConfig
Fields
beta (core)
Type: float
Default: 1.0
Beta parameter for DPO loss (controls strength of preference optimization).
weight (core)
Type: float
Default: 1.0
Weight for this loss in the total loss computation.
reference_model (feature)
Type: str
Default: (required)
Name of the reference model to use for DPO.
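To make the roles of beta and weight concrete, here is a minimal Python sketch of the standard DPO objective for a single preference pair. It is an illustration only, not Fast-LLM's implementation: the function name dpo_loss and its arguments are hypothetical, and it assumes that each input is the summed log-probability of a response under either the policy or the reference model (the model named by reference_model).

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected,
             beta=1.0, weight=1.0):
    """Standard DPO loss for one preference pair (hypothetical helper).

    All four inputs are assumed to be summed log-probabilities of a response.
    """
    # How much more strongly the policy prefers the chosen response
    # than the reference model does.
    margin = (policy_chosen - policy_rejected) - (ref_chosen - ref_rejected)
    # -log(sigmoid(beta * margin)) equals softplus(-beta * margin);
    # the form below avoids overflow for large arguments.
    x = -beta * margin
    loss = max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    # weight scales this term's contribution to the total loss.
    return weight * loss
```

Under these assumptions, a larger beta makes the loss react more sharply to the preference margin, while weight only rescales the result before it is added to the other losses.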