INDEX
Explanations
numeric values and measurements
Neuron Alignment
Index   Value   % of L₁
186     +0.18   1.0%
307     +0.15   0.9%
237     +0.12   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
307     +0.18      0.04
82      +0.15      0.04
136     +0.12      -0.00
Negative Logits
compelling   -1.55
ço           -1.35
messenger    -1.34
á̏            -1.33
authority    -1.29
oj           -1.27
City         -1.27
mandate      -1.25
”:           -1.25
vm           -1.24
Positive Logits
Īĺ        3.52
ĥ½        2.69
«         2.60
ª         2.48
ĥ         2.46
↵         2.44
č↵č↵      2.44
(blank)   2.44
↵↵        2.44
(blank)   2.44
Activations Density: 0.545%