INDEX
Explanations
the presence of references to leadership roles or titles
Head Attr Weights
0:0.02
1:0.09
2:0.15
3:0.10
4:0.04
5:0.04
6:0.07
7:0.06
8:0.12
9:0.15
10:0.06
11:0.04
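
The weights above can be read into a small table to see which heads carry the most attribution. The sketch below is illustrative only. It assumes each line has the form `head_index:weight` and that the twelve entries are per-attention-head weights for heads 0 to 11. The names `parse_weights` and `top_heads` are hypothetical helpers, not part of any tool.

```python
# Head attribution weights copied verbatim from the list above.
# Assumption: each line is "head_index:weight".
raw = """0:0.02
1:0.09
2:0.15
3:0.10
4:0.04
5:0.04
6:0.07
7:0.06
8:0.12
9:0.15
10:0.06
11:0.04"""


def parse_weights(text):
    """Parse 'index:weight' lines into a dict {head_index: weight}."""
    weights = {}
    for line in text.splitlines():
        head, value = line.split(":")
        weights[int(head)] = float(value)
    return weights


def top_heads(weights, k=3):
    """Return the k heads with the largest weight; ties break by lower index."""
    return sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


weights = parse_weights(raw)
total = round(sum(weights.values()), 2)

print(top_heads(weights))  # heads 2 and 9 tie at 0.15, followed by head 8
print(total)               # listed weights sum to 0.94
```

The listed values sum to 0.94 rather than 1.00, possibly because each weight is rounded to two decimals for display.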
Negative Logits
nings        -1.34
krit         -1.33
vertisement  -1.31
mys          -1.30
eeper        -1.24
former       -1.21
fixes        -1.21
ignition     -1.16
inition      -1.15
regular      -1.14
Positive Logits
nih          1.33
istg         1.28
Vet          1.19
Agg          1.18
ENT          1.17
️            1.14
Unch         1.14
(%)          1.12
-----        1.06
Nan          1.06
Activations Density 0.075%