INDEX
Explanations
references to political or government-related entities and their influence on society
Negative Logits
Token      Logit
626        -0.16
anza       -0.14
ioni       -0.14
oupon      -0.14
tar        -0.14
itas       -0.13
Mae        -0.13
Mov        -0.13
elsinki    -0.13
obao       -0.13
Positive Logits
Token           Logit
alike           0.28
respectively    0.17
Holt            0.17
odzi            0.15
ROL             0.15
ascar           0.15
ÑĤоже           0.15
ç̬              0.14
/root           0.14
뢰              0.14
Activations Density: 0.240%