Explanations
References to leadership and authority figures in various contexts
Negative Logits
Token        Logit
agne         -0.17
deaux        -0.16
Margins      -0.16
eways        -0.15
irsch        -0.15
PEND         -0.14
ifo          -0.14
bens         -0.14
ording       -0.14
utterstock   -0.14
Positive Logits
Token        Logit
senior        0.36
leadership    0.34
highest       0.31
officials     0.29
higher        0.28
leaders       0.28
high          0.27
top           0.27
leaders       0.25
higher        0.25
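The dashboard does not say how the logit lists are produced. A common convention for sparse-autoencoder feature dashboards is to multiply the latent's decoder direction by the model's unembedding matrix and report the tokens with the largest and smallest results. The minimal pure-Python sketch below assumes that convention. `top_logits`, its arguments, and the toy matrices are hypothetical, and the sketch ignores any normalization a real dashboard might apply.

```python
from typing import List, Tuple


def top_logits(
    decoder_row: List[float],
    unembed: List[List[float]],
    vocab: List[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative direct logit effects.

    Hypothetical helper (assumed convention, not the dashboard's confirmed method):
    decoder_row: the latent's decoder direction, length d_model.
    unembed: d_model x vocab_size unembedding matrix, as nested lists.
    vocab: token strings, one per unembedding column.
    """
    d_model = len(decoder_row)
    if len(unembed) != d_model:
        raise ValueError("unembed must have d_model rows")
    # Direct effect on token t: dot product of the decoder row with column t of the unembedding.
    scores = [
        (vocab[t], sum(decoder_row[j] * unembed[j][t] for j in range(d_model)))
        for t in range(len(vocab))
    ]
    # Largest effects first for the positive list, smallest first for the negative list.
    positive = sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]
    negative = sorted(scores, key=lambda pair: pair[1])[:k]
    return positive, negative


# Toy usage: a 2-dimensional residual stream and a 3-token vocabulary.
pos, neg = top_logits(
    decoder_row=[1.0, 0.5],
    unembed=[[0.3, -0.1, 0.0], [0.2, -0.2, 0.1]],
    vocab=["senior", "agne", "the"],
    k=1,
)
print(pos, neg)
```

Because the ranking keeps one entry per vocabulary index rather than per display string, two distinct tokens that render identically (for example, a word with and without a leading space) would both appear. That is one possible reason `leaders` and `higher` each show up twice in the positive list.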
Activation density: 0.288%
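Activation density is commonly defined as the share of sampled token positions on which the latent fires, meaning its activation is above zero. Under that assumption, 0.288% means the latent is active on roughly 288 of every 100,000 tokens. The following minimal sketch computes that percentage. `activation_density` and its `threshold` parameter are hypothetical, and a real dashboard may use a different threshold or sampling scheme.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Hypothetical helper: assumes "active" means strictly greater than the threshold.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 3 active positions out of 1,000 gives a density of 0.3%.
sample = [0.0] * 997 + [1.2, 0.5, 3.0]
print(activation_density(sample))
```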