INDEX
Explanations
mentions of particular political parties
Negative Logits
loor        -0.15
ook         -0.14
PerPixel    -0.14
ypo         -0.14
+xml        -0.14
piè         -0.14
portun      -0.13
lamaz       -0.13
ollar       -0.13
ient        -0.13
Positive Logits
eguard      0.17
arih        0.15
ascus       0.15
erta        0.14
arded       0.14
hir         0.14
umer        0.14
iki         0.14
ottom       0.13
QA          0.13
Activations Density 0.008%