INDEX
Explanations
political and authoritarian-themed terms and actions
NEGATIVE LOGITS
éĹĺ            -0.73
kHz            -0.69
hower          -0.67
terday         -0.66
ignty          -0.66
apprehension   -0.62
cob            -0.61
kHz            -0.60
Ae             -0.60
Siem           -0.59
POSITIVE LOGITS
gers           1.39
glers          1.31
rett           1.21
mented         1.13
rant           1.12
ging           1.11
gy             1.08
mentation      1.08
hett           1.05
herer          1.04
Activations Density: 4.489%