LanguageModelEmbeddingsConfig¶
Module: fast_llm.layers.language_model.config
Inherits from: BlockConfig, ModuleConfig
Fields¶
num_position_embeddings — architecture
Type: int
Default: 2048
Number of absolute position embeddings, if applicable.
position_embeddings — architecture
Type: OptionalParameterConfig
Default: (sub-fields optional)
Configuration for the position embedding (weight).
vocab_size — architecture
Type: int
Default: 49152
Size of the vocabulary, i.e., the number of vocabulary embeddings and logits.
word_embeddings — architecture
Type: ParameterConfig
Default: (sub-fields optional)
Configuration for the word embedding (weight).
dropout — feature
Type: float
Default: 0.0
Dropout applied to the embedding layer.
lr_scale — feature
Type: float or None
Default: None
Scaling factor for the layer's learning rate. Combines multiplicatively with the scales set by the parent and child layers, if applicable.
full_precision_residual — stability
Type: bool
Default: False
Store the residuals for the model in full precision (optimization_dtype).
vocab_parallel — performance
Type: bool
Default: True
Allow for tensor-parallel vocabulary embeddings and output weights. Disable to allow for sequence-tensor-parallel input tokens, logits and cross-entropy computation. The sequence-tensor-parallel version typically runs faster, but may incur a small memory cost. Affects RNG for initialization and dropout.
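As a quick illustration, the scalar defaults listed above can be collected into a plain Python dict. This is only a sketch: Fast-LLM's actual config objects are structured classes that validate their fields, and the ParameterConfig sub-objects for word_embeddings and position_embeddings are elided here.

```python
# Hypothetical flat fragment mirroring the defaults documented above.
# The real LanguageModelEmbeddingsConfig nests ParameterConfig objects,
# which are omitted from this sketch.
embeddings_defaults = {
    "num_position_embeddings": 2048,   # absolute position embeddings
    "vocab_size": 49152,               # vocabulary embeddings and logits
    "dropout": 0.0,                    # embedding-layer dropout
    "lr_scale": None,                  # no extra learning-rate scaling
    "full_precision_residual": False,  # residuals kept in training dtype
    "vocab_parallel": True,            # tensor-parallel vocab embeddings
}

def check_embeddings_config(cfg: dict) -> None:
    """Minimal sanity checks one might apply to such a fragment."""
    assert cfg["vocab_size"] > 0, "vocab_size must be positive"
    assert cfg["num_position_embeddings"] > 0
    assert 0.0 <= cfg["dropout"] < 1.0, "dropout must be in [0, 1)"
    assert cfg["lr_scale"] is None or cfg["lr_scale"] > 0

check_embeddings_config(embeddings_defaults)
```

Overriding a field (e.g. `vocab_size` for a different tokenizer) is then a matter of replacing the corresponding entry before the config is consumed.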