DecoderBlockConfig
Module: fast_llm.layers.decoder.config
Variant of: BlockConfig — select with type: decoder
Inherits from: BlockConfig, ModuleConfig
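As a rough illustration of how this variant is selected, the sketch below writes a decoder block configuration as a plain Python dictionary that mirrors the field names documented on this page. The dictionary layout and the name `decoder_block` are assumptions made for illustration; nothing here is loaded through a Fast-LLM API, and the empty sub-configurations simply stand in for "all sub-fields left at their defaults".

```python
# Illustrative only: a plain dict mirroring the documented field names.
# How Fast-LLM actually parses or nests this configuration is not shown here.
decoder_block = {
    "type": "decoder",        # selects the DecoderBlockConfig variant of BlockConfig
    "mixer": {},              # MixerConfig, sub-fields optional
    "mlp": {},                # MLPBaseConfig, sub-fields optional
    "normalization": {},      # NormalizationConfig, sub-fields optional
    "distillation_loss_weight": 1.0,  # documented default
    "distillation_model": None,       # documented default: no reference model
    "dropout": 0.0,                   # documented default
    "lr_scale": None,                 # documented default
}
```

Every value shown is the documented default, so under this assumed layout only the `type` entry would need to be present.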
Fields
mixer (architecture)
Type: MixerConfig Default: (sub-fields optional)
Configuration for the attention/mixer layer.
mlp (architecture)
Type: MLPBaseConfig Default: (sub-fields optional)
Configuration for the feedforward (MLP) layer.
normalization (architecture)
Type: NormalizationConfig Default: (sub-fields optional)
Configuration for the block normalization layers.
distillation_loss_weight (feature)
Type: float Default: 1.0
Weight used to scale the activation distillation loss.
distillation_model (feature)
Type: str or None Default: None
Name of the reference model to use for activation-level distillation.
dropout (feature)
Type: float Default: 0.0
Dropout applied to the residual connections.
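To make "dropout applied to the residual connections" concrete, here is a minimal pure-Python sketch. It assumes the common arrangement in which a sub-layer output (mixer or MLP) is randomly zeroed and rescaled before being added to the residual stream; the helper name `residual_with_dropout` and its arguments are hypothetical and are not taken from Fast-LLM.

```python
import random


def residual_with_dropout(residual, branch_output, p=0.0, training=True, rng=None):
    """Add a sub-layer output to the residual stream, dropping elements with probability p."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        # With the documented default (0.0) this is a plain residual addition.
        return [r + b for r, b in zip(residual, branch_output)]
    rng = rng or random.Random()
    keep = 1.0 - p
    # Inverted dropout: kept elements are rescaled by 1 / keep so the
    # expected value of the branch output is unchanged.
    return [
        r + (b / keep if rng.random() < keep else 0.0)
        for r, b in zip(residual, branch_output)
    ]
```

With `p=0.0` the function reduces to element-wise addition, which is why the default leaves the block deterministic.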
lr_scale (feature)
Type: float or None Default: None
Scaling factor for the layer learning rate. Combines multiplicatively with the scale set by the parent and child layers, if applicable.
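The multiplicative combination described for lr_scale can be sketched as follows. This is only an illustration under stated assumptions: the helper `combine_lr_scales` is hypothetical, and treating `None` as "no scaling at this level" is an assumption rather than documented behavior.

```python
def combine_lr_scales(*scales):
    """Multiply learning-rate scales along a parent-to-child chain.

    A value of None is treated as "no scaling at this level" (assumption).
    Returns None when no level sets a scale.
    """
    combined = None
    for scale in scales:
        if scale is None:
            continue
        combined = scale if combined is None else combined * scale
    return combined


base_lr = 1e-3
# Parent sets 0.5, this block sets nothing, a child layer sets 0.1.
effective_scale = combine_lr_scales(0.5, None, 0.1)
layer_lr = base_lr if effective_scale is None else base_lr * effective_scale
```

In this example the child layer ends up training at one twentieth of the base learning rate, because the parent and child factors multiply.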