Utils
Functions:
- calculate_advantage – Calculate advantage values for a row of data.
- calculate_reward_with_implicit_kl – Calculate reward with implicit KL penalty.
- masked_mean – Compute the mean of a tensor with masked values.
- masked_sum – Compute the sum of a tensor with masked values.
- replace_dataset_column – Replace a column in the dataset with a new column.
calculate_advantage(row)
Calculate advantage values for a row of data.
Parameters:
- row (dict) – Dictionary containing rewards and statistics with keys:
  - rewards: List of reward values
  - reward_mean: Mean reward value
  - reward_std: Standard deviation of rewards
Returns:
- list[float] – List of advantage values calculated as (reward - mean) / (std + eps), where eps = 1e-4 is added for numerical stability.
Source code in tapeagents/finetune/rl/utils.py
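The formula in the Returns entry above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the name calculate_advantage_sketch and the sample rewards are hypothetical, and using the population standard deviation to fill reward_std is an assumption, since the page does not say how that statistic is computed.

```python
import statistics


def calculate_advantage_sketch(row, eps=1e-4):
    """Return (reward - mean) / (std + eps) for each reward in the row."""
    mean = row["reward_mean"]
    std = row["reward_std"]
    return [(reward - mean) / (std + eps) for reward in row["rewards"]]


rewards = [0.0, 1.0, 1.0, 0.0]
row = {
    "rewards": rewards,
    "reward_mean": statistics.fmean(rewards),
    # Assumption: population standard deviation; the estimator is not documented here.
    "reward_std": statistics.pstdev(rewards),
}
advantages = calculate_advantage_sketch(row)

# When every reward is identical the std is 0, and eps keeps the division defined.
flat_row = {"rewards": [1.0, 1.0], "reward_mean": 1.0, "reward_std": 0.0}
flat_advantages = calculate_advantage_sketch(flat_row)
```

The eps term matters mostly in the second case: without it, a group of identical rewards would divide by zero instead of yielding zero advantages.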
calculate_reward_with_implicit_kl(row, reward_minus_kl_coef)
Calculate reward with implicit KL penalty.
Parameters:
- row (dict) – Dictionary containing reward and log probability data with keys:
  - reward: Base reward value
  - old_logprobs: Log probabilities from old policy
  - ref_logprobs: Reference log probabilities
- reward_minus_kl_coef (float) – Coefficient for implicit KL penalty term
Returns:
- float – Reward value adjusted by an implicit KL penalty, calculated as:
  reward - reward_minus_kl_coef * KL(ref||old)
  The KL divergence is approximated using the Schulman approximation:
  KL ≈ exp(log_ratio) - log_ratio - 1, where log_ratio = ref_logprobs - old_logprobs
Source code in tapeagents/finetune/rl/utils.py
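The Schulman approximation described above can be worked through with a small standard-library sketch. The name reward_with_implicit_kl_sketch and the sample numbers are hypothetical, and two points are assumptions rather than documented behavior: that old_logprobs and ref_logprobs are per-token sequences of equal length, and that the per-token terms are summed into a single KL value.

```python
import math


def reward_with_implicit_kl_sketch(row, reward_minus_kl_coef):
    """Subtract a Schulman-style KL estimate from the base reward."""
    kl = 0.0
    for ref_lp, old_lp in zip(row["ref_logprobs"], row["old_logprobs"]):
        log_ratio = ref_lp - old_lp
        # exp(x) - x - 1 is non-negative and is zero only when x == 0.
        kl += math.exp(log_ratio) - log_ratio - 1.0
    # Assumption: per-token terms are summed; the reduction is not stated on this page.
    return row["reward"] - reward_minus_kl_coef * kl


same_policy = {"reward": 1.0, "old_logprobs": [-1.5, -0.2], "ref_logprobs": [-1.5, -0.2]}
drifted = {"reward": 1.0, "old_logprobs": [-1.5, -0.2], "ref_logprobs": [-1.0, -0.2]}

unchanged = reward_with_implicit_kl_sketch(same_policy, reward_minus_kl_coef=0.1)
penalized = reward_with_implicit_kl_sketch(drifted, reward_minus_kl_coef=0.1)
```

Because each term exp(log_ratio) - log_ratio - 1 is non-negative, the adjusted reward never exceeds the base reward when the coefficient is positive, and it equals the base reward when the two sets of log probabilities match.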
masked_mean(values, mask, axis=None)
Compute the mean of a tensor with masked values.
Source code in tapeagents/finetune/rl/utils.py
masked_sum(values, mask, axis=None)
Compute the sum of a tensor with masked values.
Source code in tapeagents/finetune/rl/utils.py
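The arithmetic behind masked_sum and masked_mean can be illustrated on flat Python lists. This is only a sketch under stated assumptions: the documented functions take tensors and an axis argument, which this version omits, and it assumes a mask entry of 1 means "include this value" and 0 means "ignore it". The names masked_sum_sketch and masked_mean_sketch are hypothetical.

```python
def masked_sum_sketch(values, mask):
    """Sum only the entries whose mask is 1 (assumed convention)."""
    return sum(value * flag for value, flag in zip(values, mask))


def masked_mean_sketch(values, mask):
    """Average only the unmasked entries; raises ZeroDivisionError if none are kept."""
    return masked_sum_sketch(values, mask) / sum(mask)


values = [1.0, 2.0, 3.0, 100.0]
mask = [1, 1, 1, 0]  # the last position (e.g. padding) is ignored

total = masked_sum_sketch(values, mask)
average = masked_mean_sketch(values, mask)
```

Dividing by the number of kept entries rather than the full length is the point of the masked mean: the ignored value 100.0 affects neither the numerator nor the denominator.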
replace_dataset_column(dataset, column_name, new_column)
Replace a column in the dataset with a new column.
Source code in tapeagents/finetune/rl/utils.py