Explanations
words related to political systems and government activities
Negative Logits
Token     Logit
gur       -0.76
yz        -0.72
arta      -0.70
lain      -0.68
ingham    -0.64
ZA        -0.61
pelling   -0.60
tek       -0.59
LV        -0.59
ammy      -0.58
Positive Logits
Token     Logit
rils      1.22
entious   1.14
ril       0.97
toward    0.96
entimes   0.91
towards   0.88
grav      0.80
erest     0.76
erer      0.75
favour    0.75
Activations Density: 0.018%