FimConfig
Module: fast_llm.data.dataset.gpt.config
Fields
rate (core)
Type: float Default: 0.0
FIM rate for each sample.
fragment_rate (feature)
Type: float Default: 0.0
FIM rate for each fragment when split_sample is set.
ignore_prefix (feature)
Type: str or None Default: None
Do not apply FIM to fragments that start with this prefix.
max_middle_len (feature)
Type: int or None Default: None
Maximum length of the middle segment in FIM.
middle_token (feature)
Type: str Default: "<fim_middle>"
TODO.
pad_token (feature)
Type: str Default: "<fim_pad>"
TODO.
prefix_token (feature)
Type: str Default: "<fim_prefix>"
TODO.
split_sample (feature)
Type: str or None Default: None
Split samples on this token and permute each fragment separately.
spm_rate (feature)
Type: float Default: 0.5
TODO.
suffix_token (feature)
Type: str Default: "<fim_suffix>"
TODO.
tokenizer (feature)
Type: TokenizerConfig Default: (sub-fields optional)
Configuration for the tokenizer.
truncate_or_pad (feature)
Type: bool Default: False
TODO.
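
Example
To make the interaction between these fields concrete, here is a minimal sketch of a fill-in-the-middle (FIM) transform. It is not Fast-LLM's implementation: FimSketchConfig, permute, and apply_fim are hypothetical names, the sketch works on characters rather than token ids, and it ignores tokenizer, pad_token, and truncate_or_pad. It also assumes that spm_rate is the probability of choosing the suffix-prefix-middle (SPM) ordering over prefix-suffix-middle (PSM), and that rate gates the whole sample when split_sample is set; neither is confirmed by the field descriptions above.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class FimSketchConfig:
    """Hypothetical stand-in mirroring some FimConfig fields and their defaults."""

    rate: float = 0.0
    fragment_rate: float = 0.0
    spm_rate: float = 0.5
    max_middle_len: Optional[int] = None
    split_sample: Optional[str] = None
    ignore_prefix: Optional[str] = None
    prefix_token: str = "<fim_prefix>"
    middle_token: str = "<fim_middle>"
    suffix_token: str = "<fim_suffix>"


def permute(text: str, rate: float, cfg: FimSketchConfig, rng: random.Random) -> str:
    """Rearrange text into FIM order with probability `rate`, else return it unchanged."""
    if rng.random() >= rate:
        return text
    # Two random cut points split the text into prefix / middle / suffix.
    lo, hi = sorted(rng.randint(0, len(text)) for _ in range(2))
    # Assumed meaning of max_middle_len: cap the length of the middle span.
    if cfg.max_middle_len is not None:
        hi = min(hi, lo + cfg.max_middle_len)
    prefix, middle, suffix = text[:lo], text[lo:hi], text[hi:]
    if rng.random() < cfg.spm_rate:
        # Assumed SPM ordering: suffix first, then prefix and middle.
        return cfg.prefix_token + cfg.suffix_token + suffix + cfg.middle_token + prefix + middle
    # PSM ordering: prefix, then suffix, then middle.
    return cfg.prefix_token + prefix + cfg.suffix_token + suffix + cfg.middle_token + middle


def apply_fim(sample: str, cfg: FimSketchConfig, rng: random.Random) -> str:
    """Apply FIM to a whole sample, or per fragment when split_sample is set."""
    if cfg.split_sample is None:
        return permute(sample, cfg.rate, cfg, rng)
    # Assumption: the sample-level rate decides whether fragments are processed at all.
    if rng.random() >= cfg.rate:
        return sample
    fragments = []
    for fragment in sample.split(cfg.split_sample):
        if cfg.ignore_prefix is not None and fragment.startswith(cfg.ignore_prefix):
            # Fragments starting with ignore_prefix are left untouched.
            fragments.append(fragment)
        else:
            fragments.append(permute(fragment, cfg.fragment_rate, cfg, rng))
    return cfg.split_sample.join(fragments)
```

With the default rate of 0.0, apply_fim returns every sample unchanged, so FIM only takes effect once rate (and, for split samples, fragment_rate) is raised above zero. Passing a seeded random.Random keeps the sketch reproducible.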