Explanations
references to political administrations and leadership
Negative Logits
Core      -0.14
Surv      -0.14
inya      -0.14
apon      -0.14
core      -0.14
ampo      -0.14
ÃŃch      -0.13
اÙĪØ±      -0.13
efd       -0.13
kå        -0.13
Positive Logits
tenure    0.14
åŀ        0.14
spender   0.14
ä¼ı       0.14
-times    0.14
ylum      0.14
#ae       0.14
梯        0.14
าà¸ĵ      0.14
ç·Ĵ       0.14
Activations Density 0.087%