INDEX
Explanations
words related to power and influence, particularly in political and strategic contexts
Neuron Alignment
Index   Value   % of L₁
341     +0.14   0.6%
1870    +0.12   0.5%
1464    +0.12   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
341     +0.14      0.06
1464    +0.12      0.04
1616    +0.12      0.04
Negative Logits
geograf         -0.51
Hund            -0.48
Lewes           -0.46
リエステル      -0.46
kritis          -0.45
minimalis       -0.45
Obrázky         -0.43
Hiller          -0.43
Croydon         -0.43
Middlesbrough   -0.42
Positive Logits
power      1.26
power      1.24
POWER      1.22
Power      1.22
POWER      1.18
Power      1.17
powers     1.06
powers     1.06
setPower   0.97
Powers     0.93
Activations Density: 0.101%