Explanations
References to products or services in a corporate context
Neuron Alignment
Index   Value   % of L₁
1438    +0.10   0.3%
227     +0.10   0.3%
1385    +0.10   0.3%
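A minimal sketch of how a table like this could be produced. It assumes the Value column is the feature's decoder weight on each MLP neuron, ranked by magnitude, and that "% of L₁" is that weight's magnitude divided by the L1 norm of the feature's whole decoder row. The function name `neuron_alignment` and the toy weights are hypothetical and are not taken from the dashboard's own code.

```python
def neuron_alignment(decoder_row, k=3):
    """Return the k neurons with the largest-magnitude decoder weights.

    Each entry is (neuron index, signed weight, share of the row's L1 norm).
    The "% of L1" definition used here is an assumption.
    """
    l1_norm = sum(abs(w) for w in decoder_row)
    # Rank neuron indices by absolute weight, largest first.
    ranked = sorted(range(len(decoder_row)), key=lambda i: abs(decoder_row[i]), reverse=True)
    return [(i, decoder_row[i], abs(decoder_row[i]) / l1_norm) for i in ranked[:k]]


# Hypothetical four-neuron decoder row whose L1 norm is exactly 1.0.
print(neuron_alignment([0.125, -0.5, 0.25, 0.125], k=2))
# [(1, -0.5, 0.5), (2, 0.25, 0.25)]
```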
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
284     +0.10           0.08
1438    +0.10           0.06
908     +0.10           0.04
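One plausible reading of these two columns is that both compare the feature's activations with a neuron's activations over the same tokens: Pearson correlation centers each series on its mean first, while cosine similarity does not. The sketch below assumes that reading; the function names and the four-token traces are hypothetical.

```python
import math


def cosine(xs, ys):
    """Cosine similarity of two equal-length activation traces (no centering)."""
    dot = sum(x * y for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum(x * x for x in xs))
    norm_y = math.sqrt(sum(y * y for y in ys))
    return dot / (norm_x * norm_y)


def pearson(xs, ys):
    """Pearson correlation: cosine similarity after subtracting each trace's mean."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return cosine([x - mean_x for x in xs], [y - mean_y for y in ys])


# Hypothetical traces over four tokens: the neuron equals the feature plus a constant.
feature_acts = [0.0, 1.0, 0.0, 2.0]
neuron_acts = [1.0, 2.0, 1.0, 3.0]
print(round(pearson(feature_acts, neuron_acts), 3))  # 1.0
print(round(cosine(feature_acts, neuron_acts), 3))   # 0.924
```

The toy case shows why the two columns can disagree: a constant offset leaves the Pearson correlation at 1 but pulls the cosine similarity below 1.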
Negative Logits
"inev"      -1.79
"unlaw"     -1.70
"desir"     -1.69
"volunte"   -1.69
"fep"       -1.67
"depic"     -1.66
"reluct"    -1.66
"ftu"       -1.64
"accla"     -1.64
"thut"      -1.64
Positive Logits
"."          0.94
";"          0.82
"<bos>"      0.76
":"          0.75
"。"         0.74
","          0.73
"properly"   0.70
"without"    0.70
";"          0.69
"↵↵"         0.69
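The two logit lists read like a logit-lens style projection of the feature: its decoder direction multiplied by the model's unembedding matrix, keeping the tokens with the most negative and most positive scores. The sketch below assumes exactly that and ignores any final LayerNorm scaling a real dashboard might apply; `logit_effect`, the toy matrix, and the three-token vocabulary are hypothetical.

```python
def logit_effect(decoder_dir, W_U, vocab, k=3):
    """Project a feature's decoder direction through the unembedding.

    decoder_dir: list of d_model floats.
    W_U: d_model rows, each holding one score per vocabulary token.
    Returns (top-k positive, top-k negative) lists of (token, score).
    """
    d_model = len(decoder_dir)
    scores = [
        sum(decoder_dir[d] * W_U[d][t] for d in range(d_model))
        for t in range(len(vocab))
    ]
    order = sorted(range(len(vocab)), key=lambda t: scores[t])  # ascending
    negative = [(vocab[t], scores[t]) for t in order[:k]]
    positive = [(vocab[t], scores[t]) for t in reversed(order[-k:])]
    return positive, negative


# Hypothetical 2-dimensional model with a three-token vocabulary.
pos, neg = logit_effect([1.0, 0.0], [[0.5, -1.0, 2.0], [3.0, 3.0, 3.0]], ["a", "b", "c"], k=1)
print(pos, neg)  # [('c', 2.0)] [('b', -1.0)]
```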
Activation Density: 0.706%
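Activation density is most likely the fraction of sampled tokens on which the feature is active. A minimal sketch, assuming "active" means an activation strictly above zero; the function name, the threshold parameter, and the eight-token sample are hypothetical.

```python
def activation_density(acts, threshold=0.0):
    """Fraction of tokens on which the feature's activation exceeds the threshold."""
    return sum(1 for a in acts if a > threshold) / len(acts)


# Hypothetical activations over eight tokens: the feature fires on two of them.
density = activation_density([0.0, 0.3, 0.0, 0.0, 1.2, 0.0, 0.0, 0.0])
print(f"{density:.3%}")  # 25.000%
```

Under this reading, a density of 0.706% means the feature fires on roughly 7 of every 1,000 tokens in the sample.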