StochasticMixerConfig

Module: fast_llm.layers.decoder.config

Variant of: MixerConfig — select with type: stochastic

Inherits from: MixerConfig, BlockWithBiasConfig, BlockConfig

Fields

main_mixer_name — architecture

Type: str or None Default: None

Name of the main mixer. Used for inference/eval, checkpoint loading (receives pretrained weights), and checkpoint saving (only this mixer is exported). If None, uses the first mixer in the dict.
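The fallback rule above can be sketched in a few lines of Python. This is an illustration only, not Fast-LLM's implementation: `resolve_main_mixer` is a hypothetical helper, and rejecting a name that is missing from the dict is an assumption.

```python
from typing import Optional


def resolve_main_mixer(mixers: dict, main_mixer_name: Optional[str]) -> str:
    """Return the main mixer's name, falling back to the first key in `mixers`."""
    if not mixers:
        raise ValueError("mixers must contain at least one entry")
    if main_mixer_name is None:
        # Python dicts preserve insertion order, so "first" means first declared.
        return next(iter(mixers))
    if main_mixer_name not in mixers:
        # Assumption: an unknown name is treated as a configuration error.
        raise ValueError(f"unknown main mixer: {main_mixer_name!r}")
    return main_mixer_name
```

Because the fallback depends on declaration order, setting main_mixer_name explicitly avoids surprises when the mixers dict is reordered.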

mixers — architecture

Type: dict[str, MixerConfig] or None Default: None

Dict of mixer options to sample from (must contain at least one entry). Keys are mixer names, used for debugging and namespacing.
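As a rough sketch, a stochastic mixer configuration might look like the following when written as a Python dict mirroring the config structure. The mixer names ("attention", "mamba") and the nested `type` values are placeholders chosen for illustration, not confirmed options.

```python
# Hypothetical config fragment; only "type", "main_mixer_name" and "mixers"
# come from this page. The nested mixer types are placeholder assumptions.
stochastic_mixer_config = {
    "type": "stochastic",
    "main_mixer_name": "attention",
    "mixers": {
        "attention": {"type": "attention"},  # placeholder mixer config
        "mamba": {"type": "mamba"},          # placeholder mixer config
    },
}

# The keys of "mixers" are the names other fields refer to.
mixer_names = list(stochastic_mixer_config["mixers"])
```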

lr_scale — feature

Type: float or None Default: None

Scaling factor for the layer learning rate. Combines multiplicatively with the scale set by the parent and child layers, if applicable.
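The multiplicative combination can be illustrated with a small sketch. `effective_lr_scale` is a hypothetical helper, and treating None as "no scaling" (a factor of 1.0) is an assumption.

```python
from typing import Optional


def effective_lr_scale(*scales: Optional[float]) -> float:
    """Multiply the scales set along the parent/child chain."""
    result = 1.0
    for scale in scales:
        if scale is not None:  # assumption: None leaves the product unchanged
            result *= scale
    return result
```

For example, a parent scale of 0.5 combined with a mixer scale of 0.1 gives an effective scale of about 0.05.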

predefined_layout_probabilities — feature

Type: list[float] Default: list()

Per-layout sampling probability, parallel to predefined_layouts. Each value must be in [0, 1]; the sum must be <= 1. The residual probability (1 - sum) falls through to sampling_strategy.
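The constraints and the fall-through behavior can be sketched as follows. Both functions are hypothetical helpers written for illustration; the floating-point tolerance and the cumulative-draw selection are assumptions, not the library's confirmed logic.

```python
import math
import random
from typing import Optional


def residual_probability(probabilities: list[float]) -> float:
    """Validate the per-layout probabilities and return 1 - sum."""
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability {p} is outside [0, 1]")
    total = math.fsum(probabilities)
    if total > 1.0 + 1e-9:  # small tolerance for rounding (assumption)
        raise ValueError(f"probabilities sum to {total}, which exceeds 1")
    return max(0.0, 1.0 - total)


def pick_layout(
    layouts: list[list[str]], probabilities: list[float], rng: random.Random
) -> Optional[list[str]]:
    """Return a predefined layout, or None to fall through to sampling_strategy."""
    draw = rng.random()  # uniform in [0, 1)
    cumulative = 0.0
    for layout, p in zip(layouts, probabilities):
        cumulative += p
        if draw < cumulative:
            return layout
    return None  # the residual probability mass lands here
```

With probabilities [0.2, 0.3], the residual is 0.5, so roughly half of the draws would fall through to sampling_strategy.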

predefined_layouts — feature

Type: list[list[str]] Default: list()

List of predefined layouts to oversample. Each layout is a list of mixer names, one per layer. Mixer names must match keys in the mixers dict. All layouts must share a common length.
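A minimal validation sketch for these rules is shown below. `validate_layouts` is a hypothetical helper; it checks only what is stated above (known mixer names and a common length) and does not know the model's layer count.

```python
def validate_layouts(layouts: list[list[str]], mixer_names: set[str]) -> None:
    """Raise ValueError if layouts differ in length or use unknown mixer names."""
    if not layouts:
        return  # nothing to validate
    expected_length = len(layouts[0])
    for index, layout in enumerate(layouts):
        if len(layout) != expected_length:
            raise ValueError(
                f"layout {index} has {len(layout)} entries, expected {expected_length}"
            )
        unknown = [name for name in layout if name not in mixer_names]
        if unknown:
            raise ValueError(f"layout {index} uses unknown mixers: {unknown}")
```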

sampling_strategy — feature

Type: StochasticMixerSamplingStrategy Default: "uniform"

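Strategy for sampling mixers during training. The values referenced on this page are "uniform" (the default) and "weighted" (see sampling_weights).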

sampling_weights — feature

Type: dict[str, float] or None Default: None

Sampling probability for each mixer, keyed by mixer name (values are normalized to sum to 1.0). Only used when sampling_strategy='weighted'. When this is None and the strategy is uniform, every mixer is sampled with equal probability.
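Normalization and a weighted draw can be sketched with the standard library. `normalize_weights` and `sample_mixer` are hypothetical helpers, and rejecting a non-positive total is an assumption.

```python
import random


def normalize_weights(weights: dict[str, float]) -> dict[str, float]:
    """Rescale the weights so that they sum to 1.0."""
    total = sum(weights.values())
    if total <= 0:
        # Assumption: a non-positive total cannot be normalized.
        raise ValueError("sampling weights must have a positive sum")
    return {name: weight / total for name, weight in weights.items()}


def sample_mixer(weights: dict[str, float], rng: random.Random) -> str:
    """Draw one mixer name with probability proportional to its weight."""
    normalized = normalize_weights(weights)
    names = list(normalized)
    return rng.choices(names, weights=[normalized[n] for n in names], k=1)[0]
```

Since the values are normalized, weights of {"attention": 3.0, "mamba": 1.0} are equivalent to probabilities of 0.75 and 0.25.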

seed_shift — optional

Type: int Default: 501_974_169_931_277_706_872_159_392_843

Seed shift for mixer sampling reproducibility.