StochasticMixerConfig
Module: fast_llm.layers.decoder.config
Variant of: MixerConfig — select with type: stochastic
Inherits from: MixerConfig, BlockWithBiasConfig, BlockConfig
Fields
mixers (architecture)
Type: dict[str, MixerConfig] or None
Default: None
Dictionary of mixer options to sample from; it must contain at least one entry. Keys are mixer names, used for debugging and namespacing.
lr_scale (feature)
Type: float or None
Default: None
Scaling factor for the layer learning rate. Combines multiplicatively with the scales set by the parent and child layers, if applicable.
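To make "combines multiplicatively" concrete, here is a minimal sketch, assuming that an unset scale (None) behaves like 1.0. The helper name `effective_lr_scale` is hypothetical and not part of Fast-LLM.

```python
from typing import Optional


def effective_lr_scale(*scales: Optional[float]) -> float:
    """Multiply the learning-rate scales set at each level of the layer hierarchy.

    Assumption (hypothetical helper): None means "no scaling" and counts as 1.0.
    """
    result = 1.0
    for scale in scales:
        if scale is not None:
            result *= scale
    return result


# Hypothetical example: parent block 0.5, this mixer 0.1, child layer unset.
print(effective_lr_scale(0.5, 0.1, None))  # → 0.05
```

Because the combination is a product, setting lr_scale on this config scales every nested mixer's effective learning rate by the same factor.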
main_mixer_name (feature)
Type: str or None
Default: None
Name of the main mixer. It is used for inference and evaluation, for checkpoint loading (it receives the pretrained weights), and for checkpoint saving (only this mixer is exported). If None, the first mixer in the dict is used.
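The fallback rule can be sketched as follows. The function `resolve_main_mixer_name` is hypothetical, and rejecting a name that is not a key of mixers is an assumption about validation rather than documented behavior.

```python
from typing import Mapping, Optional


def resolve_main_mixer_name(mixers: Mapping[str, object], main_mixer_name: Optional[str]) -> str:
    """Pick the main mixer: the explicit name if given, else the first key of mixers."""
    # The field docs require at least one mixer.
    if not mixers:
        raise ValueError("mixers must contain at least one entry")
    if main_mixer_name is None:
        # Python dicts preserve insertion order, so this is the first mixer declared.
        return next(iter(mixers))
    # Assumption: an explicit name must refer to one of the configured mixers.
    if main_mixer_name not in mixers:
        raise KeyError(f"unknown main mixer {main_mixer_name!r}")
    return main_mixer_name


# Hypothetical mixer names; the values stand in for MixerConfig objects.
print(resolve_main_mixer_name({"attention": None, "linear": None}, None))  # → attention
```

Since the first mixer is the default, ordering the mixers dict so that the pretrained mixer comes first avoids having to set main_mixer_name explicitly.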
sampling_strategy (feature)
Type: StochasticMixerSamplingStrategy
Default: "uniform"
Strategy for sampling mixers during training.
sampling_weights (feature)
Type: dict[str, float] or None
Default: None
Sampling probability for each mixer, keyed by mixer name. The weights are normalized to sum to 1.0. Only used when sampling_strategy='weighted'. When this is None and the strategy is uniform, all mixers have equal probability.
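The sketch below shows how uniform and weighted sampling with normalized weights might work. It is not Fast-LLM's implementation: the function names are hypothetical, the strategies are passed as plain strings, and it assumes that the weighted strategy requires a weight for every mixer and rejects sampling_weights=None (the docs do not specify that case).

```python
import random
from typing import Mapping, Optional, Sequence


def normalized_probabilities(
    names: Sequence[str],
    strategy: str,
    weights: Optional[Mapping[str, float]] = None,
) -> dict[str, float]:
    """Turn the sampling config into per-mixer probabilities that sum to 1.0."""
    if strategy == "uniform":
        return {name: 1.0 / len(names) for name in names}
    if strategy == "weighted":
        # Assumption: weighted sampling needs an explicit weight for every mixer.
        if weights is None:
            raise ValueError("sampling_weights is required for the weighted strategy")
        total = sum(weights[name] for name in names)
        return {name: weights[name] / total for name in names}
    raise ValueError(f"unknown sampling strategy {strategy!r}")


def sample_mixer(probabilities: Mapping[str, float], rng: random.Random) -> str:
    """Draw one mixer name according to the given probabilities."""
    names = list(probabilities)
    return rng.choices(names, weights=[probabilities[n] for n in names], k=1)[0]


# Hypothetical mixers: unnormalized weights 3:1 become probabilities 0.75 and 0.25.
probs = normalized_probabilities(["attention", "linear"], "weighted", {"attention": 3.0, "linear": 1.0})
print(probs)  # → {'attention': 0.75, 'linear': 0.25}
print(sample_mixer(probs, random.Random(0)))
```

Normalizing up front means the weights only need to express relative preference. For example, {"attention": 3, "linear": 1} and {"attention": 0.75, "linear": 0.25} produce the same distribution.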
seed_shift (optional)
Type: int
Default: 501_974_169_931_277_706_872_159_392_843
Seed shift applied to mixer sampling so that it is reproducible.
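This page does not document how seed_shift is combined with the training seed. The sketch below is a hypothetical scheme that shows the general idea. The base seed is offset by seed_shift so that mixer sampling draws from a stream separate from other seeded components, and the training step is folded in so that a given (seed, step) pair always selects the same mixer. The function `mixer_rng` and the string-seed construction are assumptions.

```python
import random

# Default value of seed_shift from the field above.
SEED_SHIFT = 501_974_169_931_277_706_872_159_392_843


def mixer_rng(base_seed: int, step: int, seed_shift: int = SEED_SHIFT) -> random.Random:
    """Hypothetical per-step RNG for mixer sampling.

    Seeding with a string keeps (shifted seed, step) pairs distinct and is
    deterministic across Python processes (str seeds are hashed with SHA-512).
    """
    return random.Random(f"{base_seed + seed_shift}:{step}")


# Re-creating the RNG for the same step reproduces the same mixer choice.
first = mixer_rng(1234, step=7).choice(["attention", "linear"])
again = mixer_rng(1234, step=7).choice(["attention", "linear"])
print(first == again)  # → True
```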