Explanations
instances of apologies or demands for apologies
instances of apologies or expressions of regret
Negative Logits
Token      Logit
jun        -0.87
weeney     -0.85
spot       -0.80
marked     -0.74
corn       -0.74
tail       -0.71
arnaev     -0.68
tails      -0.68
picking    -0.66
cop        -0.65
Positive Logits
Token        Logit
apologize    1.28
apologized   1.19
apologise    1.15
apologised   1.13
apologizing  1.09
apology      1.08
apologies    1.06
sorry        0.90
apolog       0.87
pardon       0.83
Activations Density 0.010%
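
As a rough illustration of how a density figure like this could be read, the sketch below assumes "activations density" means the fraction of sampled token positions on which the feature fires above zero. That definition, the function name `activation_density`, and the sample data are assumptions for illustration only; the page does not state how the number was computed.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: density counts positions with activation > 0, which is one
    common convention for sparse features, not something this page confirms.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 1 of 10,000 token positions.
sample = [0.0] * 9999 + [3.2]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.010%
```

Under this reading, a density of 0.010% corresponds to roughly one active token position per 10,000 sampled.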