INDEX
Explanations
apologies or instances of someone publicly expressing regret for their actions
instances of public apologies
Negative Logits
Downloadha    -0.84
corn          -0.71
arnaev        -0.70
aida          -0.69
uana          -0.69
weeney        -0.68
production    -0.66
tails         -0.66
adj           -0.66
eu            -0.65
Positive Logits
unres          1.28
apologized     1.07
apologize      1.03
sincerely      1.02
prof           0.99
apology        0.97
giving         0.91
apologise      0.90
apologizing    0.90
apologised     0.90
Activations Density 0.048%