StochasticMixerConfig

Module: fast_llm.layers.decoder.config

Variant of: MixerConfig — select with type: stochastic

Inherits from: MixerConfig, BlockWithBiasConfig, BlockConfig

Fields

mixers (architecture)

Type: dict[str, MixerConfig] or None    Default: None

Dict of mixer options to sample from (must contain at least one entry). Keys are mixer names, used for debugging and namespacing.
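The shape of this field can be sketched as a plain Python dict. This is only an illustration: `type: stochastic` and the field name `mixers` come from this page, while the mixer names, the inner `type` values, and the helper `check_mixers` are placeholders assumed for the example, not part of the library.

```python
# Plain-dict sketch of a stochastic mixer configuration.
# Only "type": "stochastic" and the "mixers" field are taken from this page;
# the mixer names and inner "type" values below are assumed placeholders.
stochastic_mixer = {
    "type": "stochastic",
    "mixers": {
        "full_attention": {"type": "attention"},  # assumed inner mixer type
        "linear_mixer": {"type": "mamba"},        # assumed inner mixer type
    },
}


def check_mixers(config: dict) -> list[str]:
    """Hypothetical helper: enforce the 'at least one mixer' rule, return the names."""
    mixers = config.get("mixers")
    if not mixers:
        raise ValueError("`mixers` must contain at least one entry")
    return list(mixers)


mixer_names = check_mixers(stochastic_mixer)
```

The keys (`full_attention`, `linear_mixer`) are the names that the other fields, such as main_mixer_name and sampling_weights, refer to.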

lr_scale (feature)

Type: float or None    Default: None

Scaling factor for the layer learning rate. Combines multiplicatively with the scale set by the parent and child layers, if applicable.
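As a rough illustration of what "combines multiplicatively" means, the sketch below multiplies the scales set at each level. The helper `combine_lr_scales` and the numeric values are hypothetical, and treating None as "no scaling at this level" is an assumption made for the example.

```python
from math import prod
from typing import Optional


def combine_lr_scales(*scales: Optional[float]) -> Optional[float]:
    """Multiply the scales that are set; None is assumed to mean 'no scaling'."""
    set_scales = [s for s in scales if s is not None]
    return prod(set_scales) if set_scales else None


# Hypothetical values: parent block 0.5, this mixer 0.1, child layer unset.
effective = combine_lr_scales(0.5, 0.1, None)
```

With these values the effective scale is the product of the parent and mixer scales, 0.5 × 0.1.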

main_mixer_name (feature)

Type: str or None    Default: None

Name of the main mixer. Used for inference/eval, checkpoint loading (receives pretrained weights), and checkpoint saving (only this mixer is exported). If None, uses the first mixer in the dict.
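The fallback rule can be sketched as follows. This is a minimal illustration of the documented behavior, not the library's code: `resolve_main_mixer_name` is a hypothetical helper, the mixer names are placeholders, and the error raised for an unknown name is an assumption.

```python
from typing import Optional


def resolve_main_mixer_name(mixers: dict[str, object], main_mixer_name: Optional[str]) -> str:
    """Return the explicit main mixer name, or the first key of `mixers` if None."""
    if not mixers:
        raise ValueError("`mixers` must contain at least one entry")
    if main_mixer_name is None:
        # Python dicts preserve insertion order, so this is the first mixer listed.
        return next(iter(mixers))
    if main_mixer_name not in mixers:
        # Assumed behavior: an unknown name is rejected.
        raise ValueError(f"Unknown main mixer: {main_mixer_name!r}")
    return main_mixer_name


mixers = {"full_attention": object(), "linear_mixer": object()}  # placeholder mixers
default_main = resolve_main_mixer_name(mixers, None)
explicit_main = resolve_main_mixer_name(mixers, "linear_mixer")
```

Because the fallback depends on dict order, setting main_mixer_name explicitly avoids surprises if the mixers are later reordered.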

sampling_strategy (feature)

Type: StochasticMixerSamplingStrategy    Default: "uniform"

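Strategy for sampling mixers during training. With "uniform" (the default), all mixers have equal probability; with "weighted", probabilities are taken from sampling_weights.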

sampling_weights (feature)

Type: dict[str, float] or None    Default: None

Sampling weight for each mixer, keyed by mixer name. Weights are normalized to sum to 1.0. Only used when sampling_strategy='weighted'. With the uniform strategy this can be left as None, and all mixers have equal probability.
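The normalization and the uniform fallback can be sketched with the standard library. This is an illustration under assumptions, not the library's implementation: `normalized_probabilities` is a hypothetical helper, the mixer names are placeholders, and `random.Random` stands in for whatever generator the library actually uses.

```python
import random
from typing import Optional


def normalized_probabilities(
    mixer_names: list[str], sampling_weights: Optional[dict[str, float]] = None
) -> dict[str, float]:
    """Return per-mixer probabilities that sum to 1.0."""
    if sampling_weights is None:
        # Uniform strategy: every mixer is equally likely.
        return {name: 1.0 / len(mixer_names) for name in mixer_names}
    total = sum(sampling_weights[name] for name in mixer_names)
    return {name: sampling_weights[name] / total for name in mixer_names}


names = ["full_attention", "linear_mixer"]  # placeholder mixer names
weighted = normalized_probabilities(names, {"full_attention": 3.0, "linear_mixer": 1.0})
uniform = normalized_probabilities(names)

# Draw one mixer name according to the weighted probabilities.
rng = random.Random(0)  # fixed seed so the draw is repeatable
sampled = rng.choices(list(weighted), weights=list(weighted.values()), k=1)[0]
```

Here weights of 3.0 and 1.0 normalize to probabilities of 0.75 and 0.25, so the weights only need to express relative preference.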

seed_shift (optional)

Type: int    Default: 501_974_169_931_277_706_872_159_392_843

Shift applied to the seed used for mixer sampling, so that sampling is reproducible.