MultiStageConfig

Module: fast_llm.engine.multi_stage.config

Inherits from: StageConfig

Fields

full_precision_gradients (optional)

Type: bool    Default: True

Reduce and accumulate gradients in fp32 to improve numerical stability.
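
The numerical-stability point can be illustrated with a toy accumulation. This is not Fast-LLM code. It is a minimal sketch that rounds every partial sum to IEEE half precision (`struct` format `"e"`) or single precision (`"f"`) and adds the same small gradient 10,000 times. The helper names `round_to` and `accumulate` are hypothetical. Python's `struct` has no bfloat16 format, and bfloat16 keeps even fewer mantissa bits than fp16, so the loss would be at least as large there.

```python
import struct


def round_to(value: float, fmt: str) -> float:
    """Round a Python float (fp64) to the precision of a struct format ('e' = fp16, 'f' = fp32)."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def accumulate(grad: float, steps: int, fmt: str) -> float:
    """Sum `grad` `steps` times, storing every partial sum in the given precision."""
    g = round_to(grad, fmt)
    acc = 0.0
    for _ in range(steps):
        # Each addition is rounded back to the storage precision, like an accumulator buffer.
        acc = round_to(acc + g, fmt)
    return acc


fp16_sum = accumulate(1e-4, 10_000, "e")  # exact answer is 1.0
fp32_sum = accumulate(1e-4, 10_000, "f")
print(fp16_sum, fp32_sum)
```

Once the half-precision accumulator grows large enough, each new contribution is smaller than half a unit in the last place and is rounded away, so the sum stalls far below 1.0. The fp32 accumulator stays close to the exact value.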

store_frozen_weights_in_optimization_precision (optional)

Type: bool    Default: True

Store frozen weights in full precision even if this is not needed. This preserves their precision in saved checkpoints, at the cost of memory and compute (copy) overheads.
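
The memory side of this trade-off can be estimated with simple arithmetic. The sketch below is hypothetical: it assumes frozen weights always need a 2-byte compute-precision copy (e.g. bf16) for the forward pass, and that enabling this option adds a separate 4-byte full-precision (fp32) copy per parameter. Fast-LLM's actual storage layout and any sharding across ranks are not modelled.

```python
def frozen_weight_bytes(
    num_frozen_params: int,
    store_in_optimization_precision: bool,
    compute_bytes: int = 2,  # assumption: bf16/fp16 copy used for compute
    optimization_bytes: int = 4,  # assumption: fp32 copy kept for checkpoints
) -> int:
    """Hypothetical estimate of the bytes used by frozen weights, before any sharding."""
    total = num_frozen_params * compute_bytes
    if store_in_optimization_precision:
        # Assumption: the full-precision copy is kept in addition to the compute copy.
        total += num_frozen_params * optimization_bytes
    return total


# 7 billion frozen parameters: 14e9 bytes without the option, 42e9 bytes with it.
print(frozen_weight_bytes(7_000_000_000, False), frozen_weight_bytes(7_000_000_000, True))
```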

layers_per_stage (performance)

Type: float    Default: 1.0

Number of layers to include in each Fast LLM stage.
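
One plausible way to turn a layer count and a (fractional) `layers_per_stage` into contiguous stages is sketched below. This is an assumption, not Fast-LLM's partitioning code: it uses `ceil(num_layers / layers_per_stage)` stages, clamps that to at most one stage per layer, and splits the layers as evenly as possible with integer arithmetic. The function name `partition_layers` is hypothetical.

```python
import math


def partition_layers(num_layers: int, layers_per_stage: float) -> list[range]:
    """Hypothetical even split of layer indices into contiguous stages."""
    if num_layers < 1 or layers_per_stage <= 0:
        raise ValueError("num_layers must be >= 1 and layers_per_stage > 0")
    # Assumption: round the stage count up, and never create empty stages.
    num_stages = min(num_layers, math.ceil(num_layers / layers_per_stage))
    # Integer boundaries keep stage sizes within one layer of each other.
    bounds = [i * num_layers // num_stages for i in range(num_stages + 1)]
    return [range(bounds[i], bounds[i + 1]) for i in range(num_stages)]


print([len(stage) for stage in partition_layers(10, 3.0)])  # 4 stages of sizes 2, 3, 2, 3
```

With the default `layers_per_stage = 1.0`, this sketch puts each layer in its own stage.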

zero_stage (performance)

Type: int or None    Default: None

The ZeRO (Zero Redundancy Optimizer) stage, i.e. which training states are sharded across data-parallel ranks.
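
In the ZeRO scheme, stage 1 shards the optimizer state, stage 2 also shards the gradients, and stage 3 also shards the weights. The sketch below applies that rule to per-rank memory using the common mixed-precision Adam accounting (2 bytes/param for weights, 2 for gradients, 12 for optimizer state). Those byte counts are an assumption: Fast-LLM may differ, for example when `full_precision_gradients` keeps gradients in fp32. The function name is hypothetical, and the `None` default is not modelled.

```python
def zero_bytes_per_rank(num_params: int, zero_stage: int, dp_size: int) -> float:
    """Hypothetical per-rank model-state memory for ZeRO stages 1-3.

    Assumes 2 bytes/param (weights) + 2 bytes/param (gradients)
    + 12 bytes/param (fp32 master weights, Adam momentum and variance).
    """
    if zero_stage not in (1, 2, 3):
        raise ValueError("zero_stage must be 1, 2 or 3")
    if dp_size < 1:
        raise ValueError("dp_size must be >= 1")
    optimizer_state = 12 * num_params / dp_size  # sharded from stage 1
    gradients = 2 * num_params / (dp_size if zero_stage >= 2 else 1)  # sharded from stage 2
    weights = 2 * num_params / (dp_size if zero_stage >= 3 else 1)  # sharded at stage 3
    return weights + gradients + optimizer_state


for stage in (1, 2, 3):
    print(stage, zero_bytes_per_rank(1000, stage, dp_size=4))
```

Each higher stage trades memory for extra communication, since sharded gradients must be reduce-scattered and sharded weights must be gathered before use.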

debug_activation_memory (logging)

Type: bool    Default: False

Log memory usage after each layer.

debug_all_param_gradients (logging)

Type: int    Default: 0

Log each parameter gradient after reduction.

debug_global_tensors (logging)

Type: bool    Default: True

Reconstruct global tensors for debug logs (slow, uses lots of memory, does not concat sequential micro-batches).
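
Reconstructing a global tensor means gathering each rank's shard and concatenating the shards along the split dimension. A real implementation would typically gather the shards with `torch.distributed.all_gather` and join them with `torch.cat`. The pure-Python sketch below only simulates the concatenation step on 2D lists, and the function name is hypothetical. It also shows why the option is memory-hungry: every caller ends up holding a full copy of the tensor.

```python
Matrix = list[list[float]]


def reconstruct_global(shards: list[Matrix], dim: int) -> Matrix:
    """Hypothetical helper: concatenate per-rank 2D shards along dim 0 (rows) or dim 1 (columns)."""
    if not shards:
        raise ValueError("need at least one shard")
    if dim == 0:
        # Row-split shards: stack the rows of each rank in rank order.
        return [list(row) for shard in shards for row in shard]
    if dim == 1:
        # Column-split shards: join row i of every rank in rank order.
        num_rows = len(shards[0])
        if any(len(shard) != num_rows for shard in shards):
            raise ValueError("column-split shards must have the same number of rows")
        return [[x for shard in shards for x in shard[i]] for i in range(num_rows)]
    raise ValueError("dim must be 0 or 1")


print(reconstruct_global([[[1.0, 2.0]], [[3.0, 4.0]]], dim=1))
```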

debug_layer_gradients (logging)

Type: int    Default: 0

Log the (input) gradients of each layer.

debug_layer_outputs (logging)

Type: int    Default: 0

Log the output of each layer.

debug_param_gradients (logging)

Type: int    Default: 0

Log the gradient shard after reduction.

debug_param_init (logging)

Type: int    Default: 0

Log the parameters after initialization.

debug_param_update (logging)

Type: int    Default: 0

Log the parameters after update.

debug_tensor_parallel (logging)

Type: bool    Default: False

Check for tensor-parallel desyncs and log an error if a desync is found. This has a high overhead.
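
A desync check compares copies of a tensor that should be identical on every tensor-parallel rank. In a real run, each rank holds only its own copy, so the copies first have to be communicated (for example with `torch.distributed.all_gather`), which is where much of the overhead comes from. The sketch below is hypothetical and skips the communication: it receives every rank's copy as a flat list, compares each one with rank 0, logs an error on mismatch, and returns the largest deviation.

```python
import logging

logger = logging.getLogger("tp_desync_sketch")


def check_tensor_parallel_sync(name: str, replicas: list[list[float]], atol: float = 0.0) -> float:
    """Hypothetical check: compare every rank's copy of a replicated tensor with rank 0's copy."""
    reference = replicas[0]
    max_diff = 0.0
    for rank, replica in enumerate(replicas[1:], start=1):
        if len(replica) != len(reference):
            raise ValueError(f"{name}: rank {rank} has a different size than rank 0")
        diff = max((abs(a - b) for a, b in zip(reference, replica)), default=0.0)
        if diff > atol:
            logger.error("Tensor-parallel desync in %s: rank %d differs from rank 0 by %g", name, rank, diff)
        max_diff = max(max_diff, diff)
    return max_diff


check_tensor_parallel_sync("layer_norm.weight", [[1.0, 2.0], [1.0, 2.5]])  # logs an error, returns 0.5
```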

num_grad_buffers (expert)

Type: int or None    Default: None

Number of stage buffers for gradients. Normally set through the ZeRO stage.

num_weight_buffers (expert)

Type: int or None    Default: None

Number of stage buffers for weights. Normally set through the ZeRO stage.
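
Because both buffer counts are normally derived from `zero_stage`, an explicit value acts as an override. The sketch below shows that resolution pattern. The defaults are hypothetical and not taken from Fast-LLM: two buffers (double buffering, so one stage can be gathered or reduced while another is computed) whenever the corresponding state is sharded, and `None` otherwise. The function name is also hypothetical.

```python
from typing import Optional


def resolve_buffer_counts(
    zero_stage: Optional[int],
    num_weight_buffers: Optional[int] = None,
    num_grad_buffers: Optional[int] = None,
) -> tuple[Optional[int], Optional[int]]:
    """Hypothetical resolution: explicit values win, otherwise derive defaults from the ZeRO stage."""
    stage = zero_stage or 0
    # Assumption: sharded gradients (stage >= 2) and sharded weights (stage 3) use double buffering.
    default_grad = 2 if stage >= 2 else None
    default_weight = 2 if stage >= 3 else None
    weights = num_weight_buffers if num_weight_buffers is not None else default_weight
    grads = num_grad_buffers if num_grad_buffers is not None else default_grad
    for value in (weights, grads):
        if value is not None and value < 1:
            raise ValueError("buffer counts must be >= 1")
    return weights, grads


print(resolve_buffer_counts(3), resolve_buffer_counts(3, num_weight_buffers=1))
```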

pipeline_delay (expert)

Type: float    Default: 0.0

Estimated delay (in steps) for data to go around the pipeline, used to improve pipeline-parallel network overlap. Currently unused.

stages_per_pipeline_stage (wip)

Type: int    Default: 1

Number of Fast LLM stages on each pipeline stage.
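
To show what grouping Fast LLM stages onto pipeline stages could look like, the sketch below assumes a simple contiguous assignment that requires the stage count to divide evenly. This is an assumption: interleaved schedules, which assign non-contiguous chunks to each pipeline rank, are another common choice, and this option is marked as work in progress. The function name is hypothetical.

```python
def assign_stages_to_pipeline(num_stages: int, stages_per_pipeline_stage: int) -> list[list[int]]:
    """Hypothetical contiguous mapping from pipeline stage index to the Fast LLM stages it runs."""
    if stages_per_pipeline_stage < 1 or num_stages % stages_per_pipeline_stage != 0:
        raise ValueError("num_stages must be a positive multiple of stages_per_pipeline_stage")
    k = stages_per_pipeline_stage
    num_pipeline_stages = num_stages // k
    # Pipeline stage p runs Fast LLM stages p*k, ..., p*k + k - 1.
    return [list(range(p * k, (p + 1) * k)) for p in range(num_pipeline_stages)]


print(assign_stages_to_pipeline(6, 2))
```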

Used in