HybridMoEMLPConfig¶
Module: fast_llm.layers.decoder.mlp.config
Variant of: MLPBaseConfig — select with type: hybrid_moe
Inherits from: MLPBaseConfig, BlockWithBiasConfig, BlockConfig
Fields¶
dense (architecture)
Type: MLPConfig | Default: (sub-fields optional)
Configuration for the always-active dense MLP.
post_norm (architecture)
Type: NormalizationConfig or None | Default: None
Optional normalization applied to the MLP output.
pre_norm (architecture)
Type: NormalizationConfig or None | Default: None
Optional normalization applied to the MLP input.
routed (architecture)
Type: MoEMLPConfig | Default: (sub-fields optional)
Configuration for the top-K routed expert MLP.
lr_scale (feature)
Type: float or None | Default: None
Scaling factor for the layer learning rate. Combines multiplicatively with the scales set by the parent and child layers, if applicable.
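To illustrate how these fields fit together, here is a hedged configuration sketch selecting this variant with `type: hybrid_moe`. It uses only the fields documented on this page; the surrounding `mlp:` key and the exact nesting are assumptions about how the block config is embedded, and the sub-fields of `dense` and `routed` are left at their (optional) defaults rather than guessed:

```yaml
# Hypothetical sketch, not a confirmed Fast-LLM config file.
mlp:
  type: hybrid_moe   # selects HybridMoEMLPConfig among MLPBaseConfig variants
  dense: {}          # always-active dense MLP (MLPConfig, defaults)
  routed: {}         # top-K routed expert MLP (MoEMLPConfig, defaults)
  pre_norm: null     # no normalization on the MLP input
  post_norm: null    # no normalization on the MLP output
  lr_scale: 0.5      # halves this layer's LR; multiplies with parent/child scales
```

Because `lr_scale` combines multiplicatively, a parent scale of 0.5 and this layer's 0.5 would yield an effective factor of 0.25 on the base learning rate.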