INDEX
Explanations
words related to political control and power
references to political control and authority
Negative Logits
Token        Logit
idered       -0.68
Recommend    -0.67
Tale         -0.67
asca         -0.60
Adventures   -0.59
uni          -0.59
pheus        -0.59
enegger      -0.58
Honour       -0.58
rehend       -0.57
Positive Logits
Token        Logit
eering       0.89
orship       0.89
ership       0.85
control      0.82
lessness     0.79
taker        0.78
ignty        0.77
ANCE         0.77
iveness      0.77
levers       0.75
Activations Density 0.054%