Explanations
prominent individuals and their political affiliations or actions
Negative Logits
Token       Logit
isp         -0.16
strup       -0.15
aft         -0.15
imp         -0.14
reuse       -0.14
-toggler    -0.14
uy          -0.14
662         -0.13
bons        -0.13
GOR         -0.13
Positive Logits
Token         Logit
รม            0.15
r             0.15
ordion        0.15
Ø´Ùħ          0.14
.getClient    0.14
нед           0.14
rond          0.14
eum           0.13
oningen       0.13
:r            0.13
Activations Density 0.035%