INDEX
Explanations
terms related to political topics and entities
Negative Logits
ertas     -0.16
iyet      -0.16
enting    -0.16
uteur     -0.15
anja      -0.15
zej       -0.14
ónico     -0.14
511       -0.14
otland    -0.14
äll       -0.14
Positive Logits
icians     0.28
ician      0.23
correct    0.23
correct    0.21
ical       0.21
Correct    0.21
ically     0.20
incorrect  0.20
ICS        0.20
ifact      0.19
Activations Density 0.007%