
LanguageModelHeadConfig

Module: fast_llm.layers.language_model.config

Inherits from: BlockConfig, ModuleConfig

Fields

losses (core)

Type: dict[str, LanguageModelLossConfig]    Default: dict()

A dictionary of loss names and their configurations. If not specified, a cross-entropy loss with respect to the targets will be used.
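The cross-entropy fallback can be sketched in plain Python. This is a minimal illustration, not Fast-LLM's implementation. The helper names `cross_entropy`, `resolve_losses` and `total_loss` are hypothetical, as is representing each loss as a plain callable and summing the losses unweighted.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one token's logits with respect to its target index."""
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def resolve_losses(losses=None):
    """Hypothetical helper: an empty or missing `losses` dict falls back to cross-entropy."""
    if not losses:
        return {"cross_entropy": cross_entropy}
    return dict(losses)


def total_loss(losses, logits, target):
    """Sum every configured loss for a single token (assumed unweighted here)."""
    return sum(fn(logits, target) for fn in resolve_losses(losses).values())


# With no losses configured, two equal logits give a loss of log(2).
print(total_loss(None, [0.0, 0.0], target=0))
```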

normalization (architecture)

Type: NormalizationConfig    Default: (sub-fields optional)

Configuration for the final normalization layer.

output_weight (architecture)

Type: ParameterConfig    Default: (sub-fields optional)

Configuration for the LM output layer (weight). Ignored when the embeddings are tied.
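A minimal sketch of what tied embeddings mean for this field: when the input embedding matrix is reused as the output projection, a separate output weight plays no part in computing the logits. The `tied` flag and the function name are hypothetical, not Fast-LLM's API.

```python
def output_logits(hidden, embedding_weight, output_weight, tied):
    """Project one hidden vector to vocabulary logits.

    Both weight matrices have shape (vocab_size, hidden_size), stored as lists of rows.
    When `tied` is True, the embedding matrix is reused and `output_weight` is ignored.
    """
    weight = embedding_weight if tied else output_weight
    return [sum(h * w for h, w in zip(hidden, row)) for row in weight]


embedding = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # vocab_size=3, hidden_size=2
separate = [[2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
print(output_logits([0.5, 1.5], embedding, separate, tied=True))
print(output_logits([0.5, 1.5], embedding, separate, tied=False))
```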

prediction_heads (architecture)

Type: int    Default: 1

Number of prediction heads.

cross_entropy_splits (feature)

Type: int    Default: 1

Split the logit and cross-entropy computation into this many fragments to reduce memory usage.
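The idea behind splitting can be sketched in plain Python: logits are computed and consumed one fragment of tokens at a time, so the full (tokens × vocabulary) logit matrix never exists at once, while the mean loss is unchanged up to floating-point rounding. Splitting along the token dimension is an assumption for illustration; Fast-LLM's actual fragmenting strategy may differ, and `split_cross_entropy` is a hypothetical name.

```python
import math


def cross_entropy(logits, target):
    """Numerically stable cross-entropy for one token."""
    m = max(logits)
    return m + math.log(sum(math.exp(x - m) for x in logits)) - logits[target]


def split_cross_entropy(hidden, weight, targets, splits):
    """Mean cross-entropy over tokens, computing logits one fragment at a time.

    hidden: (num_tokens, hidden_size); weight: (vocab_size, hidden_size).
    Only one fragment's logits exist at any moment, which is the memory saving.
    """
    num_tokens = len(hidden)
    fragment = math.ceil(num_tokens / splits)
    total = 0.0
    for start in range(0, num_tokens, fragment):
        end = start + fragment
        # Materialize logits for this fragment of tokens only.
        logits = [[sum(h * w for h, w in zip(row, w_row)) for w_row in weight]
                  for row in hidden[start:end]]
        total += sum(cross_entropy(row_logits, t)
                     for row_logits, t in zip(logits, targets[start:end]))
    return total / num_tokens


hidden = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4], [0.5, 0.5], [0.0, -0.3]]
weight = [[1.0, 0.5], [-0.5, 1.0], [0.2, -0.3]]
targets = [0, 2, 1, 0, 2]
print(split_cross_entropy(hidden, weight, targets, splits=1))
print(split_cross_entropy(hidden, weight, targets, splits=3))
```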

logits_scale_factor (feature)

Type: float    Default: 1.0

Multiply the output logits by this scale factor. This is useful in the muP setting, where the output logits must be adjusted by the width factor. Since the factor multiplies the output logits, under muP it should be less than 1.0.
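The sketch below shows the scaling itself together with one muP-style choice of factor. Taking the factor as `base_width / width` is an assumption for illustration (the description only says the logits are adjusted by the width factor), and `mup_logits_scale_factor` and its arguments are hypothetical.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mup_logits_scale_factor(width, base_width):
    """Hypothetical muP choice: shrink logits by the width multiplier width / base_width."""
    return base_width / width


def scale_logits(logits, logits_scale_factor=1.0):
    """Multiply every output logit by the configured scale factor."""
    return [x * logits_scale_factor for x in logits]


factor = mup_logits_scale_factor(width=4096, base_width=256)
print(factor)  # 0.0625, i.e. below 1.0 for a model wider than its base
logits = [4.0, 0.0, -4.0]
# A factor below 1.0 flattens the output distribution.
print(max(softmax(logits)), max(softmax(scale_logits(logits, factor))))
```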

lr_scale (feature)

Type: float or None    Default: None

Scaling factor for the layer learning rate. Combines multiplicatively with the scale set by the parent and child layers, if applicable.
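Multiplicative combination can be sketched as a product over the scales along the path from the outermost parent to the innermost child. Treating an unset (`None`) scale as a factor of 1.0 is an assumption, and `effective_lr_scale` is a hypothetical helper, not Fast-LLM's API.

```python
import math


def effective_lr_scale(scales):
    """Combine per-layer lr_scale values from outermost parent to innermost child.

    None means "not set" and contributes a factor of 1.0.
    """
    return math.prod(1.0 if s is None else s for s in scales)


base_lr = 3e-4
# e.g. a parent block scale of 0.5, an unset head scale, and a child scale of 0.2
scale = effective_lr_scale([0.5, None, 0.2])
print(scale, base_lr * scale)
```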

prediction_loss_coefficient (feature)

Type: list[float] or None    Default: None

Loss coefficient for each prediction head. If not provided, all heads are equally weighted.
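A minimal sketch of per-head loss weighting. The description does not say whether equal weighting means a coefficient of 1.0 per head (summed) or 1/`prediction_heads` (averaged); this sketch assumes 1.0 per head. `combine_head_losses` is a hypothetical name.

```python
def combine_head_losses(head_losses, prediction_loss_coefficient=None):
    """Weight and sum the loss of each prediction head.

    Assumed convention: a missing coefficient list means every head gets 1.0.
    """
    if prediction_loss_coefficient is None:
        prediction_loss_coefficient = [1.0] * len(head_losses)
    if len(prediction_loss_coefficient) != len(head_losses):
        raise ValueError("need one coefficient per prediction head")
    return sum(c * loss for c, loss in zip(prediction_loss_coefficient, head_losses))


head_losses = [2.0, 4.0]  # e.g. prediction_heads=2
print(combine_head_losses(head_losses))              # equal weighting
print(combine_head_losses(head_losses, [1.0, 0.5]))  # down-weight the second head
```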

Used in