FimConfig

Module: fast_llm.data.dataset.gpt.config

Fields

rate (core)

Type: float    Default: 0.0

Probability of applying fill-in-the-middle (FIM) to each sample.
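A minimal sketch of what a per-sample rate means, assuming FIM is applied to each sample independently with probability rate. The helper name should_apply_fim is hypothetical and is not part of fast_llm.

```python
import random


def should_apply_fim(rate: float, rng: random.Random) -> bool:
    """Return True with probability `rate` (hypothetical helper, for illustration only)."""
    # rng.random() is uniform on [0, 1), so rate=0.0 never fires and rate=1.0 always does.
    return rng.random() < rate


rng = random.Random(0)
never = [should_apply_fim(0.0, rng) for _ in range(1000)]
always = [should_apply_fim(1.0, rng) for _ in range(1000)]
```

With the default of 0.0, no sample would be transformed under this reading, so FIM is effectively disabled until rate is raised.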

fragment_rate (feature)

Type: float    Default: 0.0

Probability of applying FIM to each fragment when split_sample is set.

ignore_prefix (feature)

Type: str or None    Default: None

Do not apply FIM to fragments that start with this prefix.

max_middle_len (feature)

Type: int or None    Default: None

Maximum length of the middle segment in FIM.

middle_token (feature)

Type: str    Default: "<fim_middle>"

Special token that marks the start of the middle segment in a FIM-formatted sample.

pad_token (feature)

Type: str    Default: "<fim_pad>"

Special token used to pad FIM-formatted samples.

prefix_token (feature)

Type: str    Default: "<fim_prefix>"

Special token that marks the start of the prefix segment in a FIM-formatted sample.

split_sample (feature)

Type: str or None    Default: None

Split samples on this token and permute each fragment separately.

spm_rate (feature)

Type: float    Default: 0.5

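Probability of using the suffix-prefix-middle (SPM) ordering instead of prefix-suffix-middle (PSM) when FIM is applied.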
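FIM training data is commonly laid out in one of two orders: prefix-suffix-middle (PSM) or suffix-prefix-middle (SPM). The sketch below assumes spm_rate is the probability of choosing SPM, and uses one common SPM variant; both points are assumptions rather than confirmed fast_llm behaviour. The function format_fim is hypothetical, and the token strings are the defaults listed on this page.

```python
import random


def format_fim(
    prefix: str,
    middle: str,
    suffix: str,
    spm_rate: float,
    rng: random.Random,
    prefix_token: str = "<fim_prefix>",
    middle_token: str = "<fim_middle>",
    suffix_token: str = "<fim_suffix>",
) -> str:
    """Assemble a FIM sample in PSM or SPM order (hypothetical helper)."""
    if rng.random() < spm_rate:
        # SPM variant: the suffix comes first, then the prefix runs straight into the middle.
        return prefix_token + suffix_token + suffix + middle_token + prefix + middle
    # PSM: prefix, then suffix, then the middle the model has to produce.
    return prefix_token + prefix + suffix_token + suffix + middle_token + middle


rng = random.Random(0)
psm = format_fim("def f():\n", "    return 1", "\n", spm_rate=0.0, rng=rng)
spm = format_fim("def f():\n", "    return 1", "\n", spm_rate=1.0, rng=rng)
```

In both layouts the middle segment ends the sequence, so a left-to-right model learns to generate it from the surrounding context.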

suffix_token (feature)

Type: str    Default: "<fim_suffix>"

Special token that marks the start of the suffix segment in a FIM-formatted sample.

tokenizer (feature)

Type: TokenizerConfig    Default: (sub-fields optional)

Configuration for the tokenizer.

truncate_or_pad (feature)

Type: bool    Default: False

Whether to truncate or pad each sample after FIM so that it keeps its original length.
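Inserting sentinel tokens makes a sample longer, so a truncate-or-pad step is one way to keep sequence lengths fixed. The sketch below assumes that is the purpose of this flag and that pad_token is the filler. The real implementation may trim a different part of the sample, such as the suffix. The function truncate_or_pad_sketch is hypothetical.

```python
def truncate_or_pad_sketch(tokens: list[str], target_len: int, pad_token: str = "<fim_pad>") -> list[str]:
    """Force `tokens` to exactly `target_len` items (hypothetical helper)."""
    if len(tokens) > target_len:
        # Too long: drop the trailing tokens.
        return tokens[:target_len]
    # Too short (or exact): append as many pad tokens as needed, possibly zero.
    return tokens + [pad_token] * (target_len - len(tokens))


shortened = truncate_or_pad_sketch(["a", "b", "c"], 2)
padded = truncate_or_pad_sketch(["a", "b", "c"], 5)
```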