StochasticMixerConfig
Module: fast_llm.layers.decoder.config
Variant of: MixerConfig (select with type: stochastic)
Inherits from: MixerConfig, BlockWithBiasConfig, BlockConfig
Fields
main_mixer_name (architecture)
Type: str or None
Default: None
Name of the main mixer. This mixer is used for inference and evaluation, receives the pretrained weights when a checkpoint is loaded, and is the only mixer exported when a checkpoint is saved. If None, the first mixer in the mixers dict is used.
mixers (architecture)
Type: dict[str, MixerConfig] or None
Default: None
Dict of mixer options to sample from; it must contain at least one entry. Keys are mixer names, used for debugging and namespacing.
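The fallback rule for main_mixer_name can be sketched in Python as below. This is a minimal illustration under stated assumptions, not Fast-LLM's code: resolve_main_mixer is a hypothetical helper, the mixer configs are replaced by placeholder objects, and rejecting a main_mixer_name that is not a key of mixers is assumed behavior.

```python
from typing import Any, Dict, Optional


def resolve_main_mixer(mixers: Dict[str, Any], main_mixer_name: Optional[str] = None) -> str:
    """Hypothetical helper: pick the mixer that receives pretrained weights and is exported."""
    if not mixers:
        # The mixers dict must contain at least one entry.
        raise ValueError("mixers must contain at least one mixer")
    if main_mixer_name is None:
        # Python dicts preserve insertion order, so this is the first mixer listed.
        return next(iter(mixers))
    if main_mixer_name not in mixers:
        # Assumed check: the main mixer must be one of the configured mixers.
        raise KeyError(f"main_mixer_name {main_mixer_name!r} is not a key of mixers")
    return main_mixer_name


# Placeholder values stand in for real MixerConfig objects; the keys are arbitrary names.
mixers = {"attention": object(), "sliding_window": object()}
print(resolve_main_mixer(mixers))                    # attention
print(resolve_main_mixer(mixers, "sliding_window"))  # sliding_window
```

Because the fallback depends on dict order, reordering the entries of mixers changes which mixer is treated as the main one whenever main_mixer_name is left as None.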
lr_scale (feature)
Type: float or None
Default: None
Scaling factor for the layer learning rate. It combines multiplicatively with the scales set by parent and child layers, if any.
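The multiplicative combination of lr_scale values can be illustrated with a short sketch. The helper effective_lr_scale is hypothetical, and treating an unset (None) scale as 1.0 is an assumption made for this example.

```python
from typing import Optional


def effective_lr_scale(*scales: Optional[float]) -> float:
    """Multiply the lr_scale values found along a chain of nested layers.

    Assumption for this sketch: a value of None means "no scaling", i.e. 1.0.
    """
    result = 1.0
    for scale in scales:
        if scale is not None:
            result *= scale
    return result


# Parent block scale 0.5, this mixer's lr_scale unset (None), child layer scale 0.2.
scale = effective_lr_scale(0.5, None, 0.2)
print(scale)  # 0.1
```

The layer would then train with its base learning rate multiplied by this combined factor, so a scale of 0.1 gives one tenth of the base rate.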
predefined_layout_probabilities (feature)
Type: list[float]
Default: list()
Sampling probability for each predefined layout, parallel to predefined_layouts. Each value must be in [0, 1], and the values must sum to at most 1. With the remaining probability (1 - sum), mixers are sampled according to sampling_strategy instead.
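The constraints on predefined_layout_probabilities and the fall-through to sampling_strategy can be sketched with a single uniform draw compared against cumulative probabilities. The helpers validate_layout_probabilities and pick_layout are hypothetical, the floating-point tolerance is a choice made for this sketch, and when Fast-LLM performs the draw (per step, per batch, or otherwise) is not stated here.

```python
import random
from typing import List, Optional, Sequence


def validate_layout_probabilities(probabilities: Sequence[float], layouts: Sequence[List[str]]) -> None:
    """Check the constraints stated for predefined_layout_probabilities."""
    if len(probabilities) != len(layouts):
        raise ValueError("predefined_layout_probabilities must be parallel to predefined_layouts")
    if any(not 0.0 <= p <= 1.0 for p in probabilities):
        raise ValueError("each probability must be in [0, 1]")
    # Small tolerance for floating-point rounding (a choice made for this sketch).
    if sum(probabilities) > 1.0 + 1e-9:
        raise ValueError("probabilities must sum to at most 1")


def pick_layout(
    probabilities: Sequence[float], layouts: Sequence[List[str]], rng: random.Random
) -> Optional[List[str]]:
    """Return a predefined layout, or None to fall through to sampling_strategy."""
    u = rng.random()  # uniform draw in [0, 1)
    cumulative = 0.0
    for p, layout in zip(probabilities, layouts):
        cumulative += p
        if u < cumulative:
            return layout
    # Reached with probability 1 - sum(probabilities).
    return None


layouts = [["attention", "sliding_window"] * 2, ["attention"] * 4]
probabilities = [0.25, 0.25]
validate_layout_probabilities(probabilities, layouts)
rng = random.Random(0)
draws = [pick_layout(probabilities, layouts, rng) for _ in range(10_000)]
print(sum(d is None for d in draws) / len(draws))  # roughly 0.5
```

With the default empty lists, pick_layout always returns None, so every sample falls through to sampling_strategy.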
predefined_layouts (feature)
Type: list[list[str]]
Default: list()
List of predefined layouts to oversample. Each layout is a list of mixer names, one per layer, and every name must be a key of the mixers dict. All layouts must have the same length.
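The two rules for predefined_layouts (names must be keys of mixers, and all layouts share one length) can be checked as in this sketch. validate_layouts is a hypothetical helper; whether the common length must also equal the number of decoder layers is not stated here, so the sketch only checks that the layouts agree with each other.

```python
from typing import Iterable, List, Sequence


def validate_layouts(layouts: Sequence[List[str]], mixer_names: Iterable[str]) -> None:
    """Check the constraints stated for predefined_layouts (hypothetical helper)."""
    known = set(mixer_names)
    lengths = sorted({len(layout) for layout in layouts})
    if len(lengths) > 1:
        raise ValueError(f"all layouts must have the same length, got lengths {lengths}")
    for index, layout in enumerate(layouts):
        unknown = [name for name in layout if name not in known]
        if unknown:
            raise ValueError(f"layout {index} uses mixer names not in mixers: {unknown}")


mixer_names = ["attention", "sliding_window"]
# Two 4-layer layouts: alternating mixers, and attention everywhere.
validate_layouts([["attention", "sliding_window"] * 2, ["attention"] * 4], mixer_names)
```

Passing one layout of length 3 and another of length 4, or a layout that uses a name absent from mixers, raises ValueError.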
sampling_strategy (feature)
Type: StochasticMixerSamplingStrategy
Default: "uniform"
Strategy for sampling mixers during training. Values referenced on this page are "uniform" and "weighted" (see sampling_weights).
sampling_weights (feature)
Type: dict[str, float] or None
Default: None
Sampling weight for each mixer, keyed by mixer name; the weights are normalized to sum to 1.0. Only used when sampling_strategy is "weighted". With the uniform strategy, all mixers have equal probability.
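How the two strategies turn into per-mixer probabilities can be sketched as follows. The helpers mixer_probabilities and sample_layer_mixers are hypothetical; giving weight 0 to mixers missing from sampling_weights, rejecting "weighted" without weights, and sampling each layer independently are assumptions of this sketch, not documented Fast-LLM behavior.

```python
import random
from typing import Dict, List, Mapping, Optional, Sequence


def mixer_probabilities(
    mixer_names: Sequence[str],
    sampling_strategy: str = "uniform",
    sampling_weights: Optional[Mapping[str, float]] = None,
) -> Dict[str, float]:
    """Per-mixer sampling probabilities for the "uniform" and "weighted" strategies."""
    if sampling_strategy == "uniform":
        # sampling_weights is ignored: every mixer gets the same probability.
        return {name: 1.0 / len(mixer_names) for name in mixer_names}
    if sampling_strategy == "weighted":
        if sampling_weights is None:
            # Assumption: this sketch simply rejects the weighted strategy without weights.
            raise ValueError("the weighted strategy needs sampling_weights")
        # Assumption: mixers missing from sampling_weights get weight 0.
        raw = {name: sampling_weights.get(name, 0.0) for name in mixer_names}
        total = sum(raw.values())
        if total <= 0:
            raise ValueError("sampling_weights must have a positive sum")
        # Normalize so that the probabilities sum to 1.0.
        return {name: weight / total for name, weight in raw.items()}
    raise ValueError(f"unsupported sampling_strategy {sampling_strategy!r}")


def sample_layer_mixers(probabilities: Mapping[str, float], num_layers: int, rng: random.Random) -> List[str]:
    """Draw one mixer name per layer, independently (an assumption of this sketch)."""
    names = list(probabilities)
    return rng.choices(names, weights=[probabilities[name] for name in names], k=num_layers)


probs = mixer_probabilities(
    ["attention", "sliding_window"], "weighted", {"attention": 3.0, "sliding_window": 1.0}
)
print(probs)  # {'attention': 0.75, 'sliding_window': 0.25}
print(sample_layer_mixers(probs, num_layers=4, rng=random.Random(0)))
```

Normalizing inside the helper means weights such as 3 and 1 behave the same as 0.75 and 0.25, so they do not need to sum to 1 in the config.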
seed_shift (optional)
Type: int
Default: 501_974_169_931_277_706_872_159_392_843
Shift applied to the random seed used for mixer sampling, so that sampling is reproducible.
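A seed shift is typically combined with a run-level seed to give one random stream its own deterministic seed. The sketch below assumes a hypothetical base_seed and simply adds the shift; how Fast-LLM actually combines seed_shift with its seeds is not stated here, so treat mixer_sampling_rng as illustrative only.

```python
import random

DEFAULT_SEED_SHIFT = 501_974_169_931_277_706_872_159_392_843  # default value of seed_shift


def mixer_sampling_rng(base_seed: int, seed_shift: int = DEFAULT_SEED_SHIFT) -> random.Random:
    """Hypothetical: derive a dedicated RNG for mixer sampling from an assumed run seed.

    Adding a large, fixed shift is meant to keep this stream apart from other random
    streams that start from the same base seed, while staying fully deterministic.
    """
    # random.Random accepts arbitrarily large Python ints as seeds.
    return random.Random(base_seed + seed_shift)


names = ["attention", "sliding_window"]
rng_a = mixer_sampling_rng(base_seed=1234)
rng_b = mixer_sampling_rng(base_seed=1234)
run_a = [rng_a.choice(names) for _ in range(8)]
run_b = [rng_b.choice(names) for _ in range(8)]
print(run_a == run_b)  # True: same seed and shift give the same mixer choices
```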