Explanations
instances of apologies or requests for apologies in text
words related to apologies and expressions of remorse
Negative Logits
weeney        -0.88
arnaev        -0.80
picking       -0.76
marked        -0.75
population    -0.73
markets       -0.72
::::::::      -0.71
growth        -0.70
aic           -0.66
ulhu          -0.65
Positive Logits
apologized    0.97
apology       0.94
unres         0.93
apologize     0.91
apologised    0.89
giving        0.88
apologizing   0.85
apologies     0.84
forgiveness   0.81
apologise     0.81
Activations Density: 0.027%