INDEX
Explanations
references to conservative ideologies and political movements
Negative Logits
aday      -0.15
622       -0.15
jee       -0.15
ose       -0.14
ee        -0.14
olum      -0.14
-less     -0.14
ral       -0.14
ings      -0.14
verage    -0.14
Positive Logits
-leaning      0.28
/lib          0.23
/left         0.18
credentials   0.16
ischer        0.16
wing          0.16
lero          0.15
-minded       0.15
jerne         0.15
assin         0.15
Activations Density 0.066%