Explanations
words related to control and authority
instances of the word "control" and its context, indicating concepts related to power and authority
Negative Logits
Token     Logit
uum       -0.74
tom       -0.68
Mich      -0.67
osp       -0.67
hiba      -0.67
ople      -0.65
ithing    -0.64
istle     -0.64
hire      -0.64
yond      -0.63
Positive Logits
Token        Logit
levers       0.84
controlled   0.82
ively        0.79
steering     0.78
eering       0.76
contro       0.75
controlling  0.71
control      0.68
Controlled   0.68
captive      0.66
Activations Density: 0.037%