INDEX
Explanations
instances of public statements or actions
instances of the word "publicly."
Negative Logits
nesota      -0.99
nian        -0.85
ners        -0.72
IER         -0.71
NER         -0.70
Upper       -0.68
Guy         -0.68
Chaser      -0.67
ľ           -0.66
Trem        -0.66
Positive Logits
shaming     0.93
humiliated  0.92
reprim      0.90
humili      0.87
isable      0.86
ised        0.86
traded      0.84
denounced   0.83
apologized  0.82
repud       0.81
Activations Density: 0.028%