Explanations
statements of opinion and agreement regarding social issues
Negative Logits
elp         -0.16
přitom      -0.15
ippers      -0.15
uez         -0.15
citiz       -0.14
aco         -0.14
ills        -0.14
-divider    -0.13
ĵn          -0.13
uits        -0.13

Positive Logits
there       0.31
There       0.23
THERE       0.20
There       0.20
there       0.19
nobody      0.18
avic        0.16
anes        0.15
no          0.15
凡          0.15
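Logit lists like these are usually produced by projecting the feature's decoder direction through the model's unembedding matrix and keeping the k most negative and k most positive tokens. The sketch below shows that idea under stated assumptions: `w_dec_row` (the feature's decoder vector), `W_U` (the unembedding, shape d_model x d_vocab) and `vocab` are hypothetical inputs, and it ignores any final layer norm or scaling a particular dashboard might apply. It is an illustration, not the exact pipeline behind the numbers above.

```python
import numpy as np

def top_bottom_logits(w_dec_row, W_U, vocab, k=10):
    """Rank vocabulary tokens by their direct logit effect from one feature.

    Assumptions (hypothetical inputs):
      w_dec_row: shape (d_model,), the feature's decoder direction
      W_U:       shape (d_model, d_vocab), the unembedding matrix
      vocab:     list of d_vocab token strings
    """
    logits = np.asarray(w_dec_row) @ np.asarray(W_U)  # shape (d_vocab,)
    order = np.argsort(logits)                         # ascending
    bottom = [(vocab[i], float(logits[i])) for i in order[:k]]
    top = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return bottom, top

# Toy example with a 2-dimensional residual stream and 4 tokens.
vocab = ["there", "no", "elp", "uez"]
W_U = np.array([[0.3, 0.1, -0.2, 0.0],
                [0.0, 0.0, 0.0, 0.0]])
bottom, top = top_bottom_logits(np.array([1.0, 0.0]), W_U, vocab, k=2)
print(bottom)  # most negative first
print(top)     # most positive first
```

Sorting once with `argsort` and reading both ends avoids two separate top-k passes; for a full vocabulary, `np.argpartition` would be a cheaper alternative when only k entries are needed.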
Activation Density: 0.497%
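Activation density is typically the fraction of token positions on which the feature fires at all; 0.497% works out to roughly one token in every 201 (1 / 0.00497 ≈ 201.2). A minimal sketch, assuming "fires" means an activation strictly above a threshold of 0 (the `threshold` parameter is a hypothetical choice, not taken from the dashboard):

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions whose feature activation exceeds `threshold`."""
    acts = np.asarray(acts, dtype=float)
    return float((acts > threshold).mean())

# Toy example: 1000 token positions, the feature fires on 5 of them.
acts = np.zeros(1000)
acts[:5] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")
```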