LanguageModelHeadConfig
Module: fast_llm.layers.language_model.config
Inherits from: BlockConfig, ModuleConfig
Fields
losses—core-
Type: dict[str, LanguageModelLossConfig] Default: dict()
A dictionary of loss names and their configurations. If not specified, a cross-entropy loss with respect to the targets will be used.
normalization—architecture-
Type: NormalizationConfig Default: (sub-fields optional)
Configuration for the final normalization layer.
output_weight—architecture-
Type: ParameterConfig Default: (sub-fields optional)
Configuration for the LM output layer (weight). Ignored for tied embeddings.
prediction_heads—architecture-
Type: int Default: 1
Number of prediction heads.
cross_entropy_splits—feature-
Type: int Default: 1
Split the logit and cross-entropy computation into this many fragments to reduce memory usage.
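The idea behind splitting can be sketched in plain Python. This is an illustrative sketch, not the library's implementation: it assumes the split runs along the token dimension, and the names `split_cross_entropy`, `project`, and `cross_entropy` are hypothetical helpers introduced here. Only one fragment of logits is held at a time, and the resulting mean loss matches the unsplit computation.

```python
import math


def cross_entropy(logits_row, target):
    """Cross-entropy for one token: log-sum-exp of the logits minus the target logit."""
    peak = max(logits_row)  # subtract the maximum for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits_row))
    return log_sum_exp - logits_row[target]


def project(hidden_row, output_weight):
    """Logits for one token: dot product of the hidden state with each vocabulary row."""
    return [sum(h * w for h, w in zip(hidden_row, row)) for row in output_weight]


def split_cross_entropy(hidden, output_weight, targets, splits=1):
    """Mean cross-entropy, computing logits for one fragment of tokens at a time.

    Assumption: fragments are taken along the token dimension.
    """
    num_tokens = len(hidden)
    fragment_size = math.ceil(num_tokens / splits)
    total = 0.0
    for start in range(0, num_tokens, fragment_size):
        stop = start + fragment_size
        # Only this fragment's logits are materialized at any point.
        fragment_logits = [project(row, output_weight) for row in hidden[start:stop]]
        total += sum(
            cross_entropy(logits_row, target)
            for logits_row, target in zip(fragment_logits, targets[start:stop])
        )
    return total / num_tokens


# Toy data: 4 tokens, hidden size 2, vocabulary size 3.
hidden = [[0.5, -1.0], [1.5, 0.25], [-0.75, 2.0], [0.0, 1.0]]
output_weight = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
targets = [0, 2, 1, 1]

full = split_cross_entropy(hidden, output_weight, targets, splits=1)
fragmented = split_cross_entropy(hidden, output_weight, targets, splits=2)
```

Peak memory for the logits scales with the fragment size rather than the full token count, at the cost of running the projection in several smaller steps.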
logits_scale_factor—feature-
Type: float Default: 1.0
Multiply the output logits by this scale factor. Useful in the muP setting, where the output logits must be adjusted by the width factor. Because the factor multiplies the logits, under muP it should be < 1.0.
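As a rough illustration of how such a factor might be chosen, the sketch below assumes a common muP convention in which the logits are scaled by the ratio of a base (tuning) width to the actual width. The helper names and the widths 256 and 1024 are hypothetical and do not come from the library.

```python
def mup_logits_scale_factor(base_hidden_size, hidden_size):
    """Assumed muP convention: scale logits by base width over actual width."""
    return base_hidden_size / hidden_size


def scale_logits(logits, logits_scale_factor):
    """Multiply every output logit by the scale factor."""
    return [x * logits_scale_factor for x in logits]


# A model four times wider than the base model gets a factor below 1.0.
factor = mup_logits_scale_factor(base_hidden_size=256, hidden_size=1024)
scaled = scale_logits([2.0, -4.0, 8.0], factor)
```

With these widths the factor is 0.25, consistent with the guidance that the value should be below 1.0 when the model is wider than its base.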
lr_scale—feature-
Type: float or None Default: None
Scaling factor for the layer learning rate. Combines multiplicatively with the scale set by the parent and child layers, if applicable.
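The multiplicative combination can be sketched as follows. The function `combine_lr_scales` is a hypothetical helper, and treating `None` as "no scaling" (a factor of 1.0) is an assumption made for this example rather than documented behavior.

```python
def combine_lr_scales(*scales):
    """Multiply the scales together, skipping any that are None (assumed to mean no scaling)."""
    combined = 1.0
    for scale in scales:
        if scale is not None:
            combined *= scale
    return combined


base_learning_rate = 3e-4
parent_scale = 0.5   # set on an enclosing layer (hypothetical value)
head_scale = None    # lr_scale left at its default
child_scale = 0.1    # set on a nested parameter (hypothetical value)

combined_scale = combine_lr_scales(parent_scale, head_scale, child_scale)
effective_learning_rate = base_learning_rate * combined_scale
```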
prediction_loss_coefficient—feature-
Type: list[float] or None Default: None
Loss coefficient for each prediction head. If not provided, all heads are equally weighted.
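A minimal sketch of how per-head coefficients could weight the total loss is shown below. `combine_head_losses` is a hypothetical helper, and using a coefficient of 1.0 per head when none is provided is an assumption about what "equally weighted" means here.

```python
def combine_head_losses(head_losses, prediction_loss_coefficient=None):
    """Weighted sum of per-head losses, with one coefficient per prediction head."""
    if prediction_loss_coefficient is None:
        # Assumption: equal weighting means a coefficient of 1.0 for every head.
        prediction_loss_coefficient = [1.0] * len(head_losses)
    if len(prediction_loss_coefficient) != len(head_losses):
        raise ValueError("Expected one loss coefficient per prediction head.")
    return sum(
        coefficient * loss
        for coefficient, loss in zip(prediction_loss_coefficient, head_losses)
    )


# Two prediction heads: the second head contributes at half weight.
head_losses = [2.0, 3.0]
weighted_total = combine_head_losses(head_losses, [1.0, 0.5])
equal_total = combine_head_losses(head_losses)
```

The length check reflects the expectation that the list has one entry per head, that is, as many entries as `prediction_heads`.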