Explanations
references to political leaders and their actions
Negative Logits
宾	-0.17
benh	-0.17
Osman	-0.17
ello	-0.16
Lans	-0.16
åĿª	-0.15
/epl	-0.15
기ìŀIJ	-0.14
anga	-0.14
oslo	-0.14
Positive Logits
Ay	0.28
Exped	0.28
ay	0.28
Revolutionary	0.24
Ay	0.23
cler	0.23
Guards	0.22
Supreme	0.22
Guard	0.22
Guardians	0.22
Activations Density: 0.044%