INDEX
Explanations
words related to leadership positions or titles
Negative Logits
phis     -0.88
nikov    -0.88
lished   -0.80
lishes   -0.79
ajor     -0.74
aughs    -0.73
etimes   -0.71
Sov      -0.70
ppo      -0.70
lihood   -0.69
Positive Logits
executive    1.11
doms         1.00
executives   0.86
Executive    0.85
iary         0.84
IAL          0.78
negotiator   0.77
culprit      0.74
rabbi        0.74
editor       0.73
Activations Density: 0.051%