MultiStageConfig¶
Module: fast_llm.engine.multi_stage.config
Inherits from: StageConfig
Fields¶
full_precision_gradients (optional)
Type: bool, Default: True
Reduce and accumulate gradients in fp32 to improve numerical stability.
store_frozen_weights_in_optimization_precision (optional)
Type: bool, Default: True
Store frozen weights in full (optimization) precision even when not strictly needed. This preserves precision in saved checkpoints, at the cost of extra memory and compute (copy) overhead.
layers_per_stage (performance)
Type: float, Default: 1.0
Number of layers to include in each Fast LLM stage.
zero_stage (performance)
Type: int or None, Default: None
The ZeRO stage.
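As an illustration of the performance fields above, here is a minimal sketch of how they might appear in a YAML configuration. The `model.multi_stage` placement is an assumption about the config layout, and the chosen values are hypothetical tuning choices, not recommendations:

```yaml
model:
  multi_stage:
    # Hypothetical: group two layers into each Fast LLM stage.
    layers_per_stage: 2.0
    # Hypothetical: select a ZeRO stage; leaving it unset (None) keeps the default behavior.
    zero_stage: 3
```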
debug_activation_memory (logging)
Type: bool, Default: False
Log memory usage after each layer.
debug_all_param_gradients (logging)
Type: int, Default: 0
Log each parameter gradient after reduction.
debug_global_tensors (logging)
Type: bool, Default: True
Reconstruct global tensors for debug logs (slow, uses lots of memory, does not concatenate sequential micro-batches).
debug_layer_gradients (logging)
Type: int, Default: 0
Log the (input) gradients of each layer.
debug_layer_outputs (logging)
Type: int, Default: 0
Log the output of each layer.
debug_param_gradients (logging)
Type: int, Default: 0
Log the gradient shard after reduction.
debug_param_init (logging)
Type: int, Default: 0
Log the parameters after initialization.
debug_param_update (logging)
Type: int, Default: 0
Log the parameters after update.
debug_tensor_parallel (logging)
Type: bool, Default: False
Check for tensor-parallel desyncs and log an error if a desync is found. High overhead.
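The logging fields above can be combined for a debugging run. A hedged YAML sketch follows; the `model.multi_stage` placement is an assumption about the config layout, and the exact meaning of non-zero values for the int-valued debug fields is not specified here:

```yaml
model:
  multi_stage:
    # Log memory usage after each layer.
    debug_activation_memory: true
    # Non-zero enables logging of each layer's outputs
    # (assumption: the int controls the level of detail logged).
    debug_layer_outputs: 1
    # Check for tensor-parallel desyncs; high overhead, debugging only.
    debug_tensor_parallel: true
```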
num_grad_buffers (expert)
Type: int or None, Default: None
Number of stage buffers for gradients. Normally set through the ZeRO stage.
num_weight_buffers (expert)
Type: int or None, Default: None
Number of stage buffers for weights. Normally set through the ZeRO stage.
pipeline_delay (expert)
Type: float, Default: 0.0
Estimated delay (in steps) for data to go around the pipeline, used to improve pipeline-parallel network overlap. Currently unused.
stages_per_pipeline_stage (wip)
Type: int, Default: 1
Number of Fast LLM stages on each pipeline stage.