INDEX
Explanations
multiple references to academic or official documents and reports
Neuron Alignment
Index   Value   % of L₁
156     +0.24   1.4%
287     +0.11   0.7%
376     +0.10   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
287     +0.24      0.01
315     +0.11      0.01
198     +0.10      0.01
Negative Logits
ĥ½               -3.34
ı                -2.64
§                -2.63
↵                -2.37
↵↵               -2.37
(missing)        -2.37
<|outofrange|>   -2.37
(missing)        -2.37
(missing)        -2.37
(missing)        -2.37
Positive Logits
matter     1.71
sense      1.71
iary       1.60
area       1.58
flash      1.57
ious       1.50
lot        1.49
mortem     1.46
plate      1.42
boarding   1.42
Activations Density 0.016%