| LanguageModelDPOLossConfig | Direct Preference Optimization (DPO) loss for alignment |
| LanguageModelDistillationLossConfig | |
| LanguageModelGRPOLossConfig | Group-Relative Policy Optimization: per-token IS-ratio clipping |
| LanguageModelGSPOLossConfig | Group Sequence Policy Optimization: sequence-level geometric-mean IS-ratio clipping |
| LanguageModelLabelEntropyLossConfig | |
| LanguageModelLossConfig (abstract) | |
| LanguageModelPolicyGradientLossConfig (abstract) | |
| LanguageModelZLossConfig | Z-loss regularization to prevent overconfidence |
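
The GRPO and GSPO entries differ only in where the importance-sampling (IS) ratio is clipped. The sketch below illustrates that difference for a single sampled sequence, assuming the usual PPO-style clipped surrogate. The function names, the `eps` default, and the flat list inputs are hypothetical and chosen for illustration; this is not the code behind these config classes.

```python
import math


def clip(x, eps):
    """Clamp x to the interval [1 - eps, 1 + eps]."""
    return max(1.0 - eps, min(1.0 + eps, x))


def grpo_objective(new_logprobs, old_logprobs, advantage, eps=0.2):
    """Per-token clipping: each token has its own IS ratio (assumed form)."""
    terms = []
    for new_lp, old_lp in zip(new_logprobs, old_logprobs):
        ratio = math.exp(new_lp - old_lp)  # pi_new / pi_old for this token
        terms.append(min(ratio * advantage, clip(ratio, eps) * advantage))
    return sum(terms) / len(terms)


def gspo_objective(new_logprobs, old_logprobs, advantage, eps=0.2):
    """Sequence-level clipping: one ratio, the geometric mean of token ratios."""
    log_ratios = [n - o for n, o in zip(new_logprobs, old_logprobs)]
    # exp(mean of log-ratios) is the geometric mean of the per-token ratios.
    seq_ratio = math.exp(sum(log_ratios) / len(log_ratios))
    return min(seq_ratio * advantage, clip(seq_ratio, eps) * advantage)


# Two tokens whose log-ratios cancel: +0.5 and -0.5.
new_lp = [-1.0, -2.0]
old_lp = [-1.5, -1.5]
grpo_value = grpo_objective(new_lp, old_lp, advantage=1.0)
gspo_value = gspo_objective(new_lp, old_lp, advantage=1.0)
```

In this example the per-token variant clips the first token's ratio at `1 + eps`, while the sequence-level variant sees a geometric-mean ratio of 1 and leaves it unclipped, so the two objectives give different values for the same sequence.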