MLPBaseConfig

Abstract

This class cannot be instantiated directly. Use one of the variants listed below.

Module: fast_llm.layers.decoder.config

Inherits from: BlockWithBiasConfig, BlockConfig, ModuleConfig

Fields

post_norm (architecture)

Type: NormalizationConfig or None    Default: None

Optional normalization applied to the MLP output.

pre_norm (architecture)

Type: NormalizationConfig or None    Default: None

Optional normalization applied to the MLP input.

lr_scale (feature)

Type: float or None    Default: None

Scaling factor for the layer learning rate. Combines multiplicatively with the scale set by the parent and child layers, if applicable.
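The multiplicative combination of lr_scale values can be illustrated with a short sketch (the helper function and the example scale values are hypothetical, not part of the library API; None is treated as "unset", i.e. a factor of 1.0):

```python
def effective_lr_scale(*scales):
    # Combine lr_scale values from parent, current, and child layers
    # multiplicatively; None means the scale is unset at that level.
    result = 1.0
    for scale in scales:
        if scale is not None:
            result *= scale
    return result

# Example: parent scale 0.5, this layer 0.2, child unset.
print(effective_lr_scale(0.5, 0.2, None))
```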

Variants

Select a variant by setting type: to one of the following values.

| type value | Class | Description |
| --- | --- | --- |
| hybrid_moe | HybridMoEMLPConfig | Configuration for a MoE layer combining an always-active dense MLP with top-k routed experts |
| mlp | MLPConfig | Configuration for a dense feedforward (MLP) layer with optional gating and activation recomputation |
| moe | MoEMLPConfig | Configuration for a Mixture-of-Experts (MoE) feedforward layer with top-k token routing |
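As a rough sketch, selecting a variant in a config file might look like the following YAML fragment (the surrounding mlp: key and the scale value are illustrative; only type and lr_scale are field names documented on this page):

```yaml
# Hypothetical config fragment: pick the dense MLP variant
# and scale its learning rate relative to the rest of the model.
mlp:
  type: mlp        # one of: hybrid_moe, mlp, moe
  lr_scale: 0.5    # combines multiplicatively with parent/child scales
```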

Used in