MLPConfig¶
Module: fast_llm.layers.decoder.mlp.config
Variant of: MLPBaseConfig — select with type: mlp
Inherits from: MLPBaseConfig, BlockWithBiasConfig, BlockConfig
Fields¶
activation — core
Type: ActivationType
Default: None
The MLP intermediate activation type. Default: SiLU for gated MLP, GeLU otherwise.
add_linear_biases — architecture
Type: bool
Default: True
Add biases to linear layers. May be overridden for individual layers.
gated — architecture
Type: bool
Default: False
Enable gated MLP.
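The `gated`, `activation`, and `intermediate_size` fields interact: a gated MLP applies two parallel first-layer projections and combines them multiplicatively before the second layer, which is why the activation default switches to SiLU. A minimal NumPy sketch of the two variants, for illustration only (the function and weight names are assumptions, not Fast-LLM's actual implementation, and the non-gated path uses the common tanh approximation of GeLU):

```python
import numpy as np

def silu(x):
    # SiLU (a.k.a. swish): x * sigmoid(x), the default for gated MLPs.
    return x * (1.0 / (1.0 + np.exp(-x)))

def gelu_tanh(x):
    # GeLU, tanh approximation, the default for non-gated MLPs.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def mlp(x, w_1, w_2, gated=False, w_gate=None):
    """Illustrative MLP forward pass (biases omitted for brevity).

    Shapes, with hidden size H and `intermediate_size` I:
      w_1: (H, I), w_2: (I, H), w_gate: (H, I) when gated.
    """
    if gated:
        # Gated variant: activation of one projection gates the other.
        h = silu(x @ w_1) * (x @ w_gate)
    else:
        # Plain variant: single projection through GeLU.
        h = gelu_tanh(x @ w_1)
    return h @ w_2
```

Note that the gated variant carries an extra `(H, I)` weight, so for a fixed parameter budget a gated MLP is typically configured with a smaller `intermediate_size`.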
intermediate_size — architecture
Type: int
Default: 4096
Hidden dimension of the MLP intermediate state.
layer_1 — architecture
Type: AffineLinearConfig
Default: (sub-fields optional)
Configuration for the first MLP layer.
layer_2 — architecture
Type: AffineLinearConfig
Default: (sub-fields optional)
Configuration for the second MLP layer.
lr_scale — feature
Type: float or None
Default: None
Scaling factor for the layer learning rate. Combines multiplicatively with the scales set by the parent and child layers, if applicable.
recompute_level — performance
Type: MLPRecomputeLevel
Default: "none"
Set which of the MLP intermediate activations are recomputed during the backward pass. This provides a trade-off between memory and speed.
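Taken together, these fields might appear in a YAML configuration roughly as follows. This is a hedged sketch: the field names and defaults come from this page, but the surrounding key layout, the `silu` spelling of the activation value, and the sub-field names under `layer_1` are assumptions to be checked against the Fast-LLM configuration reference:

```yaml
# Illustrative MLP block configuration (layout is an assumption).
mlp:
  type: mlp                  # selects the MLPConfig variant of MLPBaseConfig
  intermediate_size: 4096    # default shown explicitly
  gated: true
  activation: silu           # optional; would default to SiLU since gated is true
  add_linear_biases: false   # overrides the default of true
  lr_scale: 1.0
  recompute_level: "none"    # documented default; other MLPRecomputeLevel values trade memory for speed
```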