INDEX
Explanations
phrases related to power and authority
Neuron Alignment
Index   Value   % of L₁
950     +0.13   0.5%
281     +0.13   0.5%
1839    +0.12   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
281     +0.13      0.03
950     +0.13      0.03
650     +0.12      0.02
Negative Logits
<bos>          -0.69
adel           -0.50
незавершена    -0.50
avent          -0.48
mir            -0.48
glBind         -0.45
espar          -0.43
gado           -0.43
comod          -0.43
المناصب        -0.43
Positive Logits
force     1.15
force     1.06
Force     1.05
forces    1.00
FORCE     1.00
Force     0.98
forces    0.96
Forces    0.96
Forces    0.94
FORCE     0.92
Activations Density: 0.087%