INDEX
Explanations
instances where regret is expressed
expressions of regret or remorse
Negative Logits
ĪĴ        -0.73
rigs      -0.71
place     -0.70
uana      -0.69
icles     -0.69
icle      -0.67
indo      -0.66
dotted    -0.61
emonic    -0.61
paio      -0.61

Positive Logits
fully      1.15
ful        1.04
fulness    0.98
FUL        0.90
faced      0.86
regrets    0.85
imaru      0.84
regret     0.83
rence      0.80
vier       0.80
Activations Density: 0.014%