
StochasticMixerConfig

Module: fast_llm.layers.decoder.config

Variant of: MixerConfig — select with type: stochastic

Inherits from: MixerConfig, BlockWithBiasConfig, BlockConfig

Fields

main_mixer_name (architecture)

Type: str or None    Default: None

Name of the main mixer. Used for inference/eval, checkpoint loading (receives pretrained weights), and checkpoint saving (only this mixer is exported). If None, uses the first mixer in the dict.
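The fallback to the first mixer can be sketched in plain Python. This is a minimal illustration rather than Fast-LLM's implementation: `resolve_main_mixer_name` is a hypothetical helper, the mixer configs are placeholder dicts, and rejecting an unknown name is an assumption.

```python
from typing import Optional


def resolve_main_mixer_name(mixers: dict, main_mixer_name: Optional[str] = None) -> str:
    """Return the name of the main mixer (hypothetical helper, illustration only)."""
    if not mixers:
        raise ValueError("mixers must contain at least one entry")
    if main_mixer_name is None:
        # Python dicts preserve insertion order, so this is the first mixer listed.
        return next(iter(mixers))
    if main_mixer_name not in mixers:
        # Assumption: a name that is not a key of `mixers` is treated as an error.
        raise ValueError(f"unknown mixer name: {main_mixer_name!r}")
    return main_mixer_name


# Placeholder mixer configs; the inner keys are assumptions, not a verified schema.
mixers = {"attention": {"type": "attention"}, "mamba": {"type": "mamba"}}
print(resolve_main_mixer_name(mixers))           # attention
print(resolve_main_mixer_name(mixers, "mamba"))  # mamba
```

Because the fallback depends on dict order, setting main_mixer_name explicitly is likely the safer choice when the order of entries may change.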

mixers (architecture)

Type: dict[str, MixerConfig] or None    Default: None

Dict of mixer options to sample from; it must contain at least one entry. Keys are mixer names, used for debugging and namespacing.

lr_scale (feature)

Type: float or None    Default: None

Scaling factor for the layer learning rate. Combines multiplicatively with the scale set by the parent and child layers, if applicable.
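As a small worked example of the multiplicative rule, the sketch below combines a parent scale, this layer's scale, and an unset child scale. `combine_lr_scales` is a hypothetical helper, and treating None as "no scaling" is an assumption.

```python
import math
from typing import Optional


def combine_lr_scales(*scales: Optional[float]) -> float:
    """Multiply all scales that are set; None is skipped (assumed to mean no scaling)."""
    return math.prod((s for s in scales if s is not None), start=1.0)


parent_scale = 0.5   # scale set on the parent layer
mixer_scale = 0.1    # lr_scale on this mixer
child_scale = None   # child leaves lr_scale unset

effective = combine_lr_scales(parent_scale, mixer_scale, child_scale)
print(effective)  # 0.5 * 0.1, i.e. roughly 0.05
```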

predefined_layout_probabilities (feature)

Type: list[float]    Default: list()

Per-layout sampling probability, parallel to predefined_layouts. Each value must be in [0, 1]; the sum must be <= 1. The residual probability (1 - sum) falls through to sampling_strategy.
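The constraints and the fall-through behaviour can be sketched as follows. Both helpers are hypothetical, the floating-point tolerance is an assumption, and the cumulative-draw scheme is only one plausible way to realise the described probabilities; the source does not say how the draw is actually made.

```python
import random
from typing import Optional


def residual_probability(probs: list[float], num_layouts: int) -> float:
    """Validate the per-layout probabilities and return 1 - sum(probs)."""
    if len(probs) != num_layouts:
        raise ValueError("need exactly one probability per predefined layout")
    for p in probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability {p} is outside [0, 1]")
    total = sum(probs)
    if total > 1.0 + 1e-9:  # small tolerance for rounding (assumption)
        raise ValueError(f"probabilities sum to {total}, which exceeds 1")
    return max(0.0, 1.0 - total)


def pick_predefined_layout(probs: list[float], rng: random.Random) -> Optional[int]:
    """Return the index of the chosen layout, or None to fall through to sampling_strategy."""
    u = rng.random()  # uniform draw in [0, 1)
    cumulative = 0.0
    for index, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return index
    return None  # the residual probability mass lands here


probs = [0.2, 0.3]
print(residual_probability(probs, num_layouts=2))  # about 0.5
choice = pick_predefined_layout(probs, random.Random(0))
```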

predefined_layouts (feature)

Type: list[list[str]]    Default: list()

List of predefined layouts to oversample. Each layout is a list of mixer names, one per layer. Mixer names must match keys in the mixers dict. All layouts must share a common length.
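The two layout constraints (known mixer names, common length) can be checked with a short sketch like the one below. `validate_layouts` is a hypothetical helper and the mixer names are made up for illustration.

```python
def validate_layouts(layouts: list[list[str]], mixer_names: set[str]) -> None:
    """Raise ValueError if a layout uses an unknown mixer or differs in length."""
    if not layouts:
        return  # the default is an empty list, so there is nothing to check
    expected_length = len(layouts[0])
    for index, layout in enumerate(layouts):
        if len(layout) != expected_length:
            raise ValueError(
                f"layout {index} has {len(layout)} entries, expected {expected_length}"
            )
        unknown = [name for name in layout if name not in mixer_names]
        if unknown:
            raise ValueError(f"layout {index} references unknown mixers: {unknown}")


# Illustrative four-layer layouts over two hypothetical mixers.
mixer_names = {"attention", "mamba"}
layouts = [
    ["attention", "attention", "attention", "attention"],
    ["attention", "mamba", "attention", "mamba"],
]
validate_layouts(layouts, mixer_names)  # passes silently
```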

sampling_strategy (feature)

Type: StochasticMixerSamplingStrategy    Default: "uniform"

Strategy for sampling mixers during training. The default, "uniform", gives every mixer equal probability; "weighted" uses sampling_weights.

sampling_weights (feature)

Type: dict[str, float] or None    Default: None

Sampling probability for each mixer, keyed by mixer name; values are normalized to sum to 1.0. Only used when sampling_strategy='weighted'. With the uniform strategy this may be left as None, and all mixers are sampled with equal probability.
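The normalization step can be illustrated with a short sketch. `normalize_weights` is a hypothetical helper, the mixer names are made up, rejecting a non-positive total is an assumption, and the `random.Random` seed here is only for a repeatable demo draw; it is unrelated to seed_shift.

```python
import random


def normalize_weights(weights: dict[str, float]) -> dict[str, float]:
    """Scale the weights so they sum to 1.0."""
    total = sum(weights.values())
    if total <= 0:
        # Assumption: weights with a non-positive total are rejected.
        raise ValueError("sampling weights must have a positive sum")
    return {name: weight / total for name, weight in weights.items()}


probabilities = normalize_weights({"attention": 3.0, "mamba": 1.0})
print(probabilities)  # {'attention': 0.75, 'mamba': 0.25}

# Draw one mixer name according to the normalized probabilities.
rng = random.Random(0)
names = list(probabilities)
sampled = rng.choices(names, weights=[probabilities[n] for n in names], k=1)[0]
```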

seed_shift (optional)

Type: int    Default: 501_974_169_931_277_706_872_159_392_843

Seed shift for mixer sampling reproducibility.