Dataset Class Distribution Analysis Config
Default value: DatasetWarningsOptions()
These thresholds can be adjusted to change how many warnings the Dataset Class Distribution Analysis raises.
from pydantic import BaseModel


class DatasetWarningsOptions(BaseModel):
    min_num_per_class: int = 20  # (1)
    max_delta_class_imbalance: float = 0.5  # (2)
    max_delta_representation: float = 0.05  # (3)
    max_delta_mean_tokens: float = 3.0  # (4)
    max_delta_std_tokens: float = 3.0  # (5)
- (1) Threshold for the first set of warnings (missing samples).
- (2) Threshold for the second set of warnings (class imbalance).
- (3) Threshold for the third set of warnings (representation mismatch).
- (4) Threshold for the fourth set of warnings (length mismatch), applied to the mean number of tokens.
- (5) Threshold for the fourth set of warnings (length mismatch), applied to the standard deviation of the number of tokens.
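To make the first threshold more concrete, here is a minimal sketch of how a "missing samples" check could work. It assumes the warning applies to any class with fewer than min_num_per_class samples. The helper classes_below_minimum and the toy labels are hypothetical and are not part of the library, whose actual logic may differ.

```python
from collections import Counter
from typing import Dict, Iterable


def classes_below_minimum(
    labels: Iterable[str], min_num_per_class: int = 20
) -> Dict[str, int]:
    """Hypothetical helper: map each under-represented class to its sample count.

    Assumption: a class is flagged when it has fewer than
    min_num_per_class samples. Classes with zero samples never appear in
    `labels`, so this sketch cannot report them.
    """
    counts = Counter(labels)
    return {name: n for name, n in counts.items() if n < min_num_per_class}


# Toy label list: 25 samples of "greeting" and 5 samples of "refund".
toy_labels = ["greeting"] * 25 + ["refund"] * 5

# With the default threshold of 20, only "refund" is flagged.
flagged = classes_below_minimum(toy_labels)
print(flagged)  # {'refund': 5}
```

Under this assumption, raising min_num_per_class flags more classes and lowering it flags fewer, which is how the threshold changes the number of warnings.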