INDEX
Explanations
numerical values and specific dates mentioned in a text
Neuron Alignment
Index   Value   % of L₁
50      +0.14   0.8%
2019    +0.04   0.2%
98      +0.04   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
783     +0.14      0.17
805     +0.04      0.18
381     +0.04      -0.10
Negative Logits
Token       Logit
<bos>       -1.46
ⓧ           -1.07
public      -0.98
</tbody>    -0.94
enumerate   -0.94
BUYER       -0.94
protected   -0.91
<eos>       -0.90
immer       -0.90
ുറ          -0.90
Positive Logits
Token       Logit
maneu       3.76
affor       3.66
increa      3.62
impra       3.40
accla       3.38
strick      3.32
reluct      3.32
depic       3.26
shenan      3.25
guarante    3.25
Activations Density: 10.222%