LanguageModelEmbeddingsConfig¶
Module: fast_llm.layers.language_model.config
Inherits from: BlockConfig, ModuleConfig
Fields¶
num_position_embeddings — architecture
Type: int
Default: 2048
Number of absolute position embeddings, if applicable.
position_embeddings — architecture
Type: OptionalParameterConfig
Default: (sub-fields optional)
Configuration for the position embedding (weight).
vocab_size — architecture
Type: int
Default: 49152
Size of the vocabulary, i.e., the number of vocabulary embeddings and logits.
word_embeddings — architecture
Type: ParameterConfig
Default: (sub-fields optional)
Configuration for the word embedding (weight).
dropout — feature
Type: float
Default: 0.0
Dropout applied to the embedding layer.
lr_scale — feature
Type: float or None
Default: None
Scaling factor for the layer's learning rate. Combines multiplicatively with the scales set by the parent and child layers, if applicable.
full_precision_residual — stability
Type: bool
Default: False
Store the residuals for the model in full precision (optimization_dtype).
vocab_parallel — performance
Type: bool
Default: True
Allow for tensor-parallel vocabulary embeddings and output weights. Disable to allow for sequence-tensor-parallel input tokens, logits and cross-entropy computation. The sequence-tensor-parallel version typically runs faster, but may incur a small memory cost. Affects RNG for initialization and dropout.
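As a quick illustration, the scalar defaults listed above can be collected into a plain Python dict. This is only a sketch: Fast-LLM's actual config objects are structured classes that validate their fields, and the ParameterConfig sub-objects for word_embeddings and position_embeddings are elided here.

```python
# Hypothetical flat fragment mirroring the defaults documented above.
# The real LanguageModelEmbeddingsConfig nests ParameterConfig objects,
# which are omitted from this sketch.
embeddings_defaults = {
    "num_position_embeddings": 2048,   # absolute position embeddings
    "vocab_size": 49152,               # vocabulary embeddings and logits
    "dropout": 0.0,                    # embedding-layer dropout
    "lr_scale": None,                  # no extra learning-rate scaling
    "full_precision_residual": False,  # residuals kept in training dtype
    "vocab_parallel": True,            # tensor-parallel vocab embeddings
}

def check_embeddings_config(cfg: dict) -> None:
    """Minimal sanity checks one might apply to such a fragment."""
    assert cfg["vocab_size"] > 0, "vocab_size must be positive"
    assert cfg["num_position_embeddings"] > 0
    assert 0.0 <= cfg["dropout"] < 1.0, "dropout must be in [0, 1)"
    assert cfg["lr_scale"] is None or cfg["lr_scale"] > 0

check_embeddings_config(embeddings_defaults)
```

Overriding a field (e.g. `vocab_size` for a different tokenizer) is then a matter of replacing the corresponding entry before the config is consumed.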