MultiModalBaseModelConfig¶
Module: fast_llm.models.multimodal.config
Inherits from: VisionMultiModalModelConfig, GPTBaseModelConfig, LanguageModelConfig
Fields¶
decoder (architecture)
Type: BlockSequenceConfig Default: (sub-fields optional)
Configuration for the language model decoder.
embeddings (architecture)
Type: LanguageModelEmbeddingsConfig Default: (sub-fields optional)
Configuration for the language model embeddings.
head (architecture)
Type: LanguageModelHeadConfig Default: (sub-fields optional)
Configuration for the language model head(s).
hidden_size (architecture)
Type: int Default: 1024
Size of the model's main hidden dimension, e.g., for its input and output layers.
peft (architecture)
Type: PeftConfig Default: (sub-fields optional)
Configuration for parameter-efficient fine-tuning.
tied_embedding_weight (architecture)
Type: bool Default: False
Tie the output weights (logits) with the vocabulary embedding.
vision_encoder (architecture)
Type: VisionEncoderConfig Default: (sub-fields optional)
Configuration for the vision encoder.
image_token_index (optional)
Type: int or None Default: None
Index of the image token. Unused, but required for Hugging Face conversion.
lr_scale (feature)
Type: float or None Default: None
Scaling factor for the layer learning rate. Combines multiplicatively with the scales set by the parent and child layers, if applicable.
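The multiplicative combination described for `lr_scale` can be sketched as follows. This is a minimal illustration of the stated semantics, not the Fast-LLM implementation: the helper name is hypothetical, and the convention that `None` means "no scaling" is an assumption drawn from the default.

```python
def combine_lr_scales(*scales):
    """Hypothetical sketch: multiply all non-None lr scales together.

    A scale of None (the field's default) is treated as "no scaling";
    if every level leaves the scale unset, the result stays None.
    """
    result = None
    for scale in scales:
        if scale is not None:
            result = scale if result is None else result * scale
    return result


# A parent scale of 0.5 combined with a layer scale of 0.1 yields an
# effective learning-rate scale of 0.05; the unset middle level is ignored.
effective = combine_lr_scales(0.5, None, 0.1)
```

Under this reading, setting `lr_scale` on a nested config never overrides the parent's scale; it only scales it further.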