INDEX
Explanations
high-ranking political and leadership positions
Head Attr Weights
0:0.06
1:0.01
2:0.08
3:0.07
4:0.20
5:0.03
6:0.21
7:0.07
8:0.05
9:0.04
10:0.06
11:0.07
Negative Logits
Dip        -1.49
aan        -1.43
amy        -1.41
fingert    -1.39
osal       -1.38
MPH        -1.36
ocent      -1.35
palm       -1.35
romy       -1.34
zers       -1.34
Positive Logits
publishes  1.42
refres     1.41
unlocks    1.35
promptly   1.34
の魔       1.32
rified     1.30
]),        1.30
ibly       1.30
holiday    1.30
crashed    1.29
Activations Density: 0.001%