INDEX
Explanations
references to political figures and their actions or claims
NEGATIVE LOGITS
Token      Logit
illez      -0.14
andaÅŁ     -0.14
Äįin       -0.14
aza        -0.14
edula      -0.14
ocoa       -0.13
utex       -0.13
inalg      -0.13
ÄĻk        -0.13
çĴĥ        -0.13
POSITIVE LOGITS
Token      Logit
equally     0.24
contested   0.18
sever       0.17
itmap       0.16
sight       0.16
clock       0.16
narr        0.15
relude      0.15
troop       0.15
parade      0.15
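The two logit lists rank tokens by how strongly this feature pushes their output logits up or down. The minimal sketch below assumes, as is common for sparse-autoencoder dashboards, that each value is the dot product of the feature's decoder direction with a token's unembedding vector; the dashboard's actual computation is not stated here. The function names and the toy vectors are hypothetical and are not taken from the feature above.

```python
# Hypothetical sketch: rank tokens by one feature's direct logit contribution.
# Assumes contribution(token) = decoder_dir . unembed[token].
from typing import Dict, List, Sequence, Tuple

Ranked = List[Tuple[str, float]]


def logit_contributions(decoder_dir: Sequence[float],
                        unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Dot the feature's decoder direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_k(contribs: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return (k most positive, k most negative) tokens, strongest first."""
    ranked = sorted(contribs.items(), key=lambda kv: kv[1])  # ascending
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional example with made-up tokens and vectors.
    direction = [1.0, 0.5]
    vocab = {"a": [0.2, 0.1], "b": [0.0, 0.2], "c": [-0.1, -0.2], "d": [-0.3, 0.0]}
    pos, neg = top_k(logit_contributions(direction, vocab), 2)
    print(pos, neg)
```

With a real model the same ranking would run over the full vocabulary, which is why byte-level token fragments such as "ÄĻk" can appear in the lists.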
Activation density: 0.059%
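Activation density is read here as the share of sampled tokens on which the feature fires. The minimal sketch below assumes "fires" means an activation strictly above a threshold of 0; the dashboard's exact threshold and sampling corpus are not given. The function name is hypothetical.

```python
# Hypothetical sketch: percentage of tokens on which a feature is active.
# Assumes a token counts as active when its activation exceeds `threshold`.
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds threshold."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    active = sum(1 for a in acts if a > threshold)
    return 100.0 * active / len(acts)


if __name__ == "__main__":
    # 59 active tokens out of 100,000 sampled gives about 0.059%.
    sample = [0.0] * (100_000 - 59) + [1.0] * 59
    print(f"{activation_density(sample):.3f}%")
```

Under this reading, a density of 0.059% means the feature is active on roughly 59 of every 100,000 tokens, so it is a sparse feature.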