LanguageModelHeadConfig

Module: fast_llm.layers.language_model.config

Inherits from: BlockConfig, ModuleConfig

Fields

final_logit_softcap — architecture

Type: float or None Default: None

Soft-cap applied to logits before loss: logits = tanh(logits / cap) * cap.
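The formula can be illustrated with a minimal sketch on plain Python floats. The function name `apply_final_logit_softcap` is hypothetical, and a real implementation would operate on tensors rather than lists. The sketch treats `None` (the default) as "no soft-capping".

```python
import math
from typing import Optional


def apply_final_logit_softcap(logits: list[float], cap: Optional[float]) -> list[float]:
    """Return tanh(logit / cap) * cap for each logit, or the logits unchanged if cap is None."""
    if cap is None:
        # Default (None): no soft-capping is applied.
        return list(logits)
    if cap <= 0:
        raise ValueError("cap must be positive")
    # tanh lies in [-1, 1], so every capped logit has magnitude at most cap;
    # logits much smaller than cap pass through almost unchanged.
    return [math.tanh(x / cap) * cap for x in logits]


capped = apply_final_logit_softcap([0.5, 10.0, 1000.0], cap=30.0)
```

Small logits are left nearly intact, while very large ones saturate close to the cap.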

losses — core

Type: dict[str, LanguageModelLossConfig] Default: dict()

A dictionary of loss names and their configurations. If not specified, a cross-entropy loss with respect to the targets will be used.

normalization — architecture

Type: NormalizationConfig Default: (sub-fields optional)

Configuration for the final normalization layer.

output_weight — architecture

Type: ParameterConfig Default: (sub-fields optional)

Configuration for the LM output layer weight. Ignored when the embeddings are tied.
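As an illustration of why this field is ignored for tied embeddings, the following sketch assumes a weight of shape (vocab_size x hidden_size), so each logit is the dot product of one weight row with the hidden state. The helpers `output_weight_for` and `lm_logits` and the `tied` flag are hypothetical names for this example, not Fast-LLM APIs.

```python
from typing import Optional

Matrix = list[list[float]]


def output_weight_for(embedding: Matrix, output_weight: Optional[Matrix], tied: bool) -> Matrix:
    # With tied embeddings, the embedding matrix doubles as the output projection,
    # so a separately configured output weight is ignored.
    if tied:
        return embedding
    if output_weight is None:
        raise ValueError("an untied head needs its own output weight")
    return output_weight


def lm_logits(hidden: list[float], weight: Matrix) -> list[float]:
    # One logit per vocabulary row: logits[v] = dot(weight[v], hidden).
    return [sum(w * h for w, h in zip(row, hidden)) for row in weight]


embedding = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # vocab_size=3, hidden_size=2
logits = lm_logits([2.0, 3.0], output_weight_for(embedding, None, tied=True))
```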

prediction_heads — architecture

Type: int Default: 1

Number of prediction heads.

cross_entropy_splits — feature

Type: int Default: 1

Split the logit and cross-entropy computation into this many fragments to reduce memory usage.
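The idea behind splitting can be shown with a minimal sketch, assuming the loss is the mean per-token cross-entropy: the tokens are processed in fragments, and only one fragment's logits exist at a time, while the result matches the unsplit computation up to floating-point rounding. The function names here are hypothetical, and the pure-Python loops stand in for tensor operations.

```python
import math


def token_cross_entropy(logits: list[float], target: int) -> float:
    # Numerically stable log-sum-exp: subtract the max logit before exponentiating.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def split_lm_loss(hidden_rows, weight, targets, cross_entropy_splits: int = 1) -> float:
    """Mean cross-entropy over tokens, computing logits for at most
    cross_entropy_splits fragments of tokens, one fragment at a time."""
    n = len(targets)
    if n == 0 or cross_entropy_splits < 1:
        raise ValueError("need at least one token and one split")
    chunk = math.ceil(n / cross_entropy_splits)
    total = 0.0
    for start in range(0, n, chunk):
        stop = start + chunk
        # Only this fragment's (tokens x vocab) logits exist at once, bounding peak memory.
        chunk_logits = [
            [sum(w * x for w, x in zip(row, h)) for row in weight]
            for h in hidden_rows[start:stop]
        ]
        total += sum(
            token_cross_entropy(row_logits, t)
            for row_logits, t in zip(chunk_logits, targets[start:stop])
        )
    return total / n


weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # vocab_size=3, hidden_size=2
hidden = [[0.5, -1.0], [2.0, 0.3], [-0.7, 0.1], [1.2, 1.1]]
targets = [0, 2, 1, 2]
whole = split_lm_loss(hidden, weight, targets, cross_entropy_splits=1)
split = split_lm_loss(hidden, weight, targets, cross_entropy_splits=3)
```

The trade-off is more, smaller computations in exchange for a lower memory peak, which matters most when the vocabulary is large.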

logits_scale_factor — feature

Type: float Default: 1.0

Multiply the output logits by this scale factor. This is useful under muP, where the output logits must be adjusted by the width factor. Because the logits are multiplied (not divided) by this value, the scale factor should be < 1.0 under muP.
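A small sketch of the scaling follows. It assumes the common muP convention of multiplying readout logits by 1 / width_multiplier, where the width multiplier is the hidden size divided by a base hidden size; both helper names and the `base_hidden_size` parameter are hypothetical and not part of this config.

```python
def mup_logits_scale_factor(hidden_size: int, base_hidden_size: int) -> float:
    # Assumed muP convention: scale = 1 / (hidden_size / base_hidden_size),
    # which is < 1.0 whenever the model is wider than the base model.
    if hidden_size <= 0 or base_hidden_size <= 0:
        raise ValueError("sizes must be positive")
    return base_hidden_size / hidden_size


def scale_logits(logits: list[float], logits_scale_factor: float = 1.0) -> list[float]:
    # The default of 1.0 leaves the logits unchanged.
    return [x * logits_scale_factor for x in logits]


factor = mup_logits_scale_factor(hidden_size=4096, base_hidden_size=256)
scaled = scale_logits([16.0, -8.0], factor)
```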

lr_scale — feature

Type: float or None Default: None

Scaling factor for the layer learning rate. Combines multiplicatively with the scale set by the parent and child layers, if applicable.
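The multiplicative combination can be sketched as follows, assuming that a `None` scale at any level contributes a factor of 1. The helper `effective_lr_scale` and the example hierarchy (model, head, output weight) are hypothetical.

```python
from typing import Optional, Sequence


def effective_lr_scale(scales: Sequence[Optional[float]]) -> float:
    # Multiply the scales set along the parent-to-child chain;
    # None means "no scale set at this level" and contributes a factor of 1.
    result = 1.0
    for s in scales:
        if s is not None:
            result *= s
    return result


# Hypothetical chain: model-level 0.5, this head unset (None), output weight 0.2.
scale = effective_lr_scale([0.5, None, 0.2])
layer_lr = 3e-4 * scale
```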

prediction_loss_coefficient — feature

Type: list[float] or None Default: None

Loss coefficient for each prediction head. If not provided, all heads are equally weighted.
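A minimal sketch of how per-head losses might be combined follows. It assumes that "equally weighted" means a coefficient of 1.0 per head and that one coefficient must be given for each head; the helper `combine_head_losses` is hypothetical.

```python
from typing import Optional, Sequence


def combine_head_losses(
    head_losses: Sequence[float],
    prediction_loss_coefficient: Optional[Sequence[float]] = None,
) -> float:
    if prediction_loss_coefficient is None:
        # Assumption: equal weighting means a coefficient of 1.0 for every head.
        coefficients = [1.0] * len(head_losses)
    else:
        # Assumption: exactly one coefficient per prediction head.
        if len(prediction_loss_coefficient) != len(head_losses):
            raise ValueError("need one coefficient per prediction head")
        coefficients = list(prediction_loss_coefficient)
    return sum(c * loss for c, loss in zip(coefficients, head_losses))


equal = combine_head_losses([2.0, 3.0])
weighted = combine_head_losses([2.0, 3.0], [1.0, 0.5])
```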
