LanguageModelDistillationLossConfig

Module: fast_llm.layers.language_model.loss.config

Variant of: LanguageModelLossConfig — select with type: distillation

Inherits from: LanguageModelLossConfig

Fields

loss_type — core

Type: EntropyLossType Default: "cross_entropy"

Type of loss to use.

weight — core

Type: float Default: 1.0

Weight for this loss in the total loss computation.
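
As a rough illustration of what a per-loss weight does, the sketch below combines several loss values into a weighted sum. The function name `total_loss` and the loss names in the dictionaries are hypothetical and not part of Fast-LLM; the actual aggregation may differ in detail.

```python
def total_loss(losses, weights):
    """Weighted sum of per-loss values; both dicts are keyed by a (hypothetical) loss name."""
    return sum(weights[name] * value for name, value in losses.items())


# Hypothetical per-loss values and their configured weights.
losses = {"lm": 2.0, "distillation": 0.5}
weights = {"lm": 1.0, "distillation": 0.5}

combined = total_loss(losses, weights)
```

With a weight of 0.5, the distillation term contributes half as much to the total as it would at the default of 1.0.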

logits_scale_factor — feature

Type: float Default: 1.0

Extra logits scale factor applied for this loss only, stacked on top of the model's logits_scale_factor.
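
One plausible reading of "stacked on top" is that the two factors multiply. The sketch below assumes exactly that; the multiplicative combination and the helper names `effective_scale` and `scale_logits` are assumptions made for illustration, not confirmed behaviour of the library.

```python
def effective_scale(model_scale, loss_scale=1.0):
    """Assumed combination: the per-loss factor multiplies the model-level factor."""
    return model_scale * loss_scale


def scale_logits(logits, model_scale, loss_scale=1.0):
    """Apply the combined factor to every logit before the loss is computed."""
    factor = effective_scale(model_scale, loss_scale)
    return [factor * x for x in logits]


# A model-level factor of 0.5 and a per-loss factor of 2.0 cancel out here.
scaled = scale_logits([1.0, -2.0], model_scale=0.5, loss_scale=2.0)
```

Under this assumption, leaving the per-loss factor at its default of 1.0 means the loss sees the logits exactly as the model-level factor scales them.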

reference_model — feature

Type: str Default: "teacher"

Name of the reference model for knowledge distillation.

temperature — optional

Type: float Default: 1.0

Temperature for teacher softmax.
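
To show the effect of the temperature, here is a minimal pure-Python sketch of a temperature-scaled softmax and a cross-entropy between teacher and student distributions. It is not the Fast-LLM implementation: the function names are hypothetical, and the sketch assumes the temperature divides the teacher logits only, as the description suggests. Many distillation setups also scale the student logits.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_cross_entropy(student_logits, teacher_logits, temperature=1.0):
    """Cross-entropy of student probabilities against temperature-softened teacher targets."""
    teacher_probs = softmax(teacher_logits, temperature)  # assumption: teacher only
    student_probs = softmax(student_logits)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))


teacher_logits = [2.0, 1.0, 0.0]
sharp = softmax(teacher_logits, temperature=1.0)
soft = softmax(teacher_logits, temperature=4.0)  # flatter target distribution
```

A temperature above 1.0 flattens the teacher distribution, so lower-ranked tokens carry more of the target probability mass; a temperature below 1.0 sharpens it.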

use_triton — expert

Type: bool or None Default: None

Enable the Triton implementation. If None (the default), Triton is used when available.
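
For orientation, the fields above can be collected into a single mapping. The sketch below is a plain Python dictionary using the field names and defaults listed on this page; the variable name `distillation_loss` is hypothetical, and where such an entry sits inside a full Fast-LLM configuration is not stated here.

```python
# Hypothetical plain-dict view of the fields documented above, with their defaults.
distillation_loss = {
    "type": "distillation",         # selects this variant of LanguageModelLossConfig
    "loss_type": "cross_entropy",   # EntropyLossType value
    "weight": 1.0,                  # weight in the total loss
    "logits_scale_factor": 1.0,     # extra per-loss logits scaling
    "reference_model": "teacher",   # name of the reference model
    "temperature": 1.0,             # temperature for the teacher softmax
    "use_triton": None,             # None: use Triton if available
}
```

Only `type` is needed to select this variant; the remaining entries restate the documented defaults.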