INDEX
Explanations
phrases related to things or people being unpopular
references to unpopularity
Negative Logits
chn             -0.90
urers           -0.85
ramid           -0.83
uther           -0.75
CONCLUS         -0.74
ult             -0.73
EStreamFrame    -0.71
hens            -0.71
tein            -0.71
anwhile         -0.70
Positive Logits
unpopular        1.23
ity              1.20
incumbent        0.88
ities            0.87
liest            0.83
burdens          0.83
disadvant        0.78
lihood           0.73
partisan         0.73
plag             0.72
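Ranked token lists like the two above are commonly produced by projecting a feature's decoder direction onto the model's unembedding matrix and sorting the resulting per-token logit contributions. The sketch below shows that idea under stated assumptions: the names decoder_direction, unembed, logit_effects and top_and_bottom are hypothetical, the two-dimensional toy vectors are invented for illustration, and the exact normalization (for example, folding in a final layer norm) used by the dashboard that produced these numbers is not stated here.

```python
from typing import Dict, List, Sequence, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(
    decoder_direction: Sequence[float],
    unembed: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Dot product of the (hypothetical) feature decoder direction with each
    token's unembedding vector, i.e. how much the feature pushes that token's logit."""
    return {
        token: sum(d * u for d, u in zip(decoder_direction, column))
        for token, column in unembed.items()
    }


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most negative and the k most positive tokens."""
    ranked = sorted(effects.items(), key=lambda item: item[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


# Toy data (invented): a 2-d residual stream and a four-token vocabulary.
decoder_direction = [1.0, -0.5]
unembed = {
    "unpopular": [1.0, -0.2],
    "ity": [0.9, -0.3],
    "chn": [-0.8, 0.2],
    "tein": [-0.5, 0.4],
}

negative, positive = top_and_bottom(logit_effects(decoder_direction, unembed), k=2)
for label, rows in (("Negative", negative), ("Positive", positive)):
    print(label, [(token, round(value, 2)) for token, value in rows])
```

Sorting once and slicing both ends keeps the two lists consistent with each other; a real vocabulary of tens of thousands of tokens would normally use a vectorized matrix product instead of this pure-Python loop.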
Activations Density 0.014%
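Assuming activation density means the share of sampled tokens on which the feature's activation is strictly positive, 0.014% corresponds to roughly 14 firing tokens per 100,000. The sketch below computes that fraction; the function name activation_density, the zero threshold, and the toy activations are assumptions for illustration, not the dashboard's own code.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation is strictly above `threshold`."""
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total


# Toy data (invented): 14 nonzero activations spread over 100,000 tokens.
acts = [0.0] * 100_000
for i in range(0, 14 * 7000, 7000):
    acts[i] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.014%
```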