Explanations
references to conservative political ideology
Negative Logits
eger      -0.08
soever    -0.08
lé        -0.07
scribe    -0.07
lights    -0.07
vez       -0.07
lify      -0.07
lum       -0.07
illery    -0.07
儿        -0.07
Positive Logits
-leaning   0.10
allee      0.08
irts       0.07
/right     0.07
ardin      0.07
ischer     0.06
undle      0.06
/lib       0.06
緒         0.06
andler     0.06
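The two lists above rank vocabulary tokens by how strongly this feature pushes their output logits down (negative) or up (positive). One common way to produce such lists, and the assumption behind this sketch (the page does not state its method), is to take the dot product of the feature's decoder direction with each token's unembedding vector and keep the extremes. The names `logit_effects`, `decoder_dir`, and `unembed` are hypothetical, and the toy vectors stand in for real model weights.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[Ranked, Ranked]:
    """Return the top-k positive and top-k negative tokens.

    Assumption: each token's score is the dot product of the feature's
    decoder direction with that token's unembedding vector.
    """
    scores = {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }
    # Largest scores first for the positive list.
    positive = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
    # Smallest (most negative) scores first for the negative list.
    negative = sorted(scores.items(), key=lambda kv: kv[1])[:k]
    return positive, negative


if __name__ == "__main__":
    # Toy 2-d "model": hypothetical tokens and vectors for illustration only.
    toy_unembed = {
        "a": [0.10, 0.5],
        "b": [-0.08, 0.2],
        "c": [0.07, 0.0],
        "d": [-0.05, 1.0],
    }
    pos, neg = logit_effects([1.0, 0.0], toy_unembed, k=2)
    print(pos, neg)
```

Real dashboards may also fold in normalization or bias terms before ranking, so treat the raw dot product as a simplification.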
Activation density: 0.011%
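Activation density is the share of tokens in a sample on which the feature fires. At 0.011%, that is roughly 1 in every 9,100 tokens (1 / 0.00011 ≈ 9,091), so this is a sparse feature. The sketch below assumes "fires" means an activation strictly above zero; the threshold, the helper name `activation_density`, and the 100,000-token toy sample are assumptions for illustration.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed 0)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: the feature fires on 11 of 100,000 tokens.
    sample = [0.0] * 99_989 + [1.0] * 11
    density = activation_density(sample)
    print(f"{density * 100:.3f}%")
```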