INDEX
Explanations
references to academic societies
Neuron Alignment
Index    Value    % of L₁
50       +0.09    0.4%
893      +0.07    0.3%
1565     +0.06    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1456     +0.09       0.04
1574     +0.07       0.04
1738     +0.06       0.03
Negative Logits
<bos>        -1.71
var          -0.76
hline        -0.76
continue     -0.76
colspan      -0.74
.            -0.73
export       -0.73
also         -0.73
remain       -0.72
             -0.72
Positive Logits
socie        2.49
increa       2.19
maneu        2.16
affor        2.15
stockholm    2.08
impra        2.06
strick       2.05
aen          2.04
wien         2.02
embodi       2.00
Activations Density: 0.111%