Explanations
expressions of empathy or sympathy
expressions of apology or regret
Negative Logits
minecraft     -0.79
sports        -0.77
ouver         -0.73
tein          -0.71
rouse         -0.70
hack          -0.68
tnc           -0.68
authorized    -0.67
craft         -0.66
impro         -0.64
Positive Logits
sorry         1.37
Sorry         1.00
Sorry         0.94
sorry         0.94
excuse        0.88
pardon        0.84
apologies     0.78
soever        0.77
THANK         0.76
ा             0.76
Activations Density 0.007%