Explanations
words related to political activities or positions
Negative Logits
esville      -0.72
jon          -0.62
rooms        -0.62
adem         -0.61
given        -0.61
bows         -0.60
accompanied  -0.59
lucky        -0.58
lore         -0.57
lift         -0.57
Positive Logits
ACA   1.07
AR    1.07
ETS   1.03
ALT   1.02
EN    1.01
OPS   1.00
CO    1.00
ELS   0.99
IK    0.99
KA    0.99
Activations Density: 0.088%