Explanations
mentions of conservative ideologies and associated terms
Negative logits
Token          Logit
ings           -0.16
мер            -0.15
ollar          -0.15
ONSE           -0.14
itoris         -0.14
нами           -0.14
нам            -0.14
pear           -0.13
idar           -0.13
PartialView    -0.13

Positive logits
Token          Logit
-leaning        0.19
/lib            0.16
aggio           0.16
/social         0.16
unker           0.14
451             0.14
friendly        0.14
princ           0.14
-social         0.14
èģĶ             0.14
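Feature dashboards commonly produce logit lists like these by projecting the feature's decoder direction through the model's unembedding matrix and reading off the most negative and most positive entries. The sketch below assumes that approach. It uses a NumPy decoder row `w_dec_row` and unembedding `W_U`, and the helper name `top_logit_effects` is hypothetical. Some tools may also fold in a final layer norm or other scaling, so treat this as an illustration rather than the exact method behind the numbers above.

```python
import numpy as np

def top_logit_effects(w_dec_row, W_U, vocab, k=10):
    """Hypothetical helper: project one feature's decoder direction
    (shape d_model) through the unembedding W_U (shape d_model x d_vocab)
    and return the k most negative and k most positive (token, logit) pairs."""
    logits = w_dec_row @ W_U          # direct effect on each vocabulary logit
    order = np.argsort(logits)        # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of 4 made-up tokens.
    w = np.array([1.0, 0.0])
    W_U = np.array([[0.5, -0.2, 0.1, -0.4],
                    [9.0, 9.0, 9.0, 9.0]])
    neg, pos = top_logit_effects(w, W_U, ["a", "b", "c", "d"], k=2)
    print("negative:", neg)
    print("positive:", pos)
```

Because the second coordinate of `w` is zero, only the first row of `W_U` contributes, which makes the ranking easy to check by hand.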
Activation density: 0.028% (the share of tokens on which the feature activates)
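A density of 0.028% means the feature fires on roughly 28 of every 100,000 tokens. The minimal sketch below assumes density is computed as the percentage of token positions whose activation exceeds a threshold of zero. The function name `activation_density` and the threshold parameter are illustrative assumptions, not a documented API.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Hypothetical helper: percentage of token positions where the
    feature's activation is strictly greater than `threshold`."""
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

if __name__ == "__main__":
    # 28 active positions out of 100,000 tokens.
    acts = np.zeros(100_000)
    acts[:28] = 1.0
    print(f"{activation_density(acts):.3f}%")
```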