INDEX
Explanations
adjectives or phrases expressing surprise or disappointment
expressions of regret or concern about societal issues
Negative Logits
exting     -0.75
qus        -0.74
iHUD       -0.71
jong       -0.70
semble     -0.70
pione      -0.69
ivalent    -0.69
pleting    -0.69
ignty      -0.68
edom       -0.68
Positive Logits
they       0.89
nobody     0.86
we         0.82
why        0.81
THEY       0.77
that       0.77
everyone   0.73
adays      0.70
people     0.70
he         0.69
Activations Density 0.144%