INDEX
Explanations
references to global issues or phenomena
Neuron Alignment
Index   Value   % of L₁
156     +0.20   1.1%
29      +0.14   0.8%
266     +0.11   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
29      +0.20      0.02
429     +0.14      0.02
266     +0.11      0.02
Negative Logits
Token       Logit
ository     -1.91
rir         -1.70
arbitrary   -1.59
sacrifice   -1.58
fficient    -1.53
lla         -1.52
died        -1.50
apest       -1.49
heets       -1.48
lessly      -1.48
Positive Logits
Token       Logit
ķ           2.49
Ĺ           2.31
Ł           2.25
ĵ           2.17
ĺ           2.15
Ħ           2.13
£           2.03
°           2.03
µ           1.94
isation     1.93
Activations Density: 0.067%