INDEX
Explanations
expressions of shame or feelings of guilt
Negative Logits
Yorker          -0.45
Dix             -0.44
UCB             -0.42
relationship    -0.40
във             -0.40
Dodo            -0.40
Dory            -0.39
Yorkers         -0.39
politico        -0.39
ftu             -0.38
Positive Logits
Shame           1.19
Shame           1.09
shame           1.05
shame           1.00
honte           0.82
shameful        0.81
shaming         0.79
ashamed         0.73
hame            0.65
autorytatywna   0.60
Activations Density 0.006%