Explanations
references to specific political districts or designations
Negative Logits
loy         -0.17
aten        -0.17
ark         -0.16
jid         -0.14
ouble       -0.14
erno        -0.14
otional     -0.14
ominator    -0.14
اÙħ         -0.14
rick        -0.13
Positive Logits
illard      0.21
ade         0.21
eded        0.20
ocket       0.19
rex         0.19
elsea       0.18
illin       0.18
resher      0.18
ign         0.18
enny        0.18
Activations Density 0.031%