INDEX
Explanations
apologies
expressions of apology
Negative Logits
Sutton         -0.69
stead          -0.69
leaf           -0.67
Led            -0.67
Gad            -0.65
Ele            -0.64
ston           -0.63
favoured       -0.62
ed             -0.62
combination    -0.61
Positive Logits
apologize      4.03
apologise      3.16
apologized     2.69
apologizing    2.53
apologies      2.37
apology        2.12
apologised     1.94
apolog         1.67
apolog         1.40
repent         1.31
Activations Density: 0.014%