Explanations
the word "apologize" or its variations.
terms related to apologies and expressions of remorse
Negative Logits
Token        Logit
weeney       -0.82
arnaev       -0.81
marked       -0.80
picking      -0.74
umbers       -0.71
population   -0.68
aic          -0.67
rooms        -0.66
ulhu         -0.66
markets      -0.66
Positive Logits
Token        Logit
unres        1.02
giving       0.93
apologized   0.93
apology      0.91
apologize    0.88
apologised   0.88
forgiveness  0.86
apologizing  0.85
apologies    0.82
acknowled    0.81
Activations Density: 0.021%
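
The density figure can be read as the share of token positions on which the feature fires. The source does not define it, so the sketch below assumes the common convention of counting activations above zero; the function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical and only illustrate the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means (positions where the feature fires) / (all positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy data (hypothetical): 10,000 token positions, the feature fires on 2 of them.
toy_activations = [0.0] * 10_000
toy_activations[17] = 3.2
toy_activations[4242] = 0.7

density = activation_density(toy_activations)
print(f"{density:.3%}")  # prints 0.020%
```

A density near 0.02% means the feature is active on roughly 2 of every 10,000 token positions, which fits a feature that responds to a narrow family of words such as "apologize" and its variants.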