INDEX
Explanations
proper names and specific terms mentioned in the text passages
Neuron Alignment
Index   Value   % of L₁
50      +0.10   0.4%
822     +0.05   0.2%
1575    +0.05   0.2%
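A minimal sketch of how the Neuron Alignment columns could be computed. It assumes (the dashboard does not say so) that Value is the feature's decoder weight on each neuron and that % of L₁ is that weight's absolute value divided by the L₁ norm of the whole decoder vector. Under that reading the rows agree with each other: 0.10 / 0.004 = 25 and 0.05 / 0.002 = 25, so both imply a decoder L₁ norm of roughly 25 (the percentages are rounded). The helper name neuron_alignment and the toy vector are hypothetical.

```python
def neuron_alignment(decoder_vec, k=3):
    """Hypothetical helper: the k neurons with the largest |decoder weight|,
    each paired with its signed weight and its share of the vector's L1 norm."""
    l1 = sum(abs(w) for w in decoder_vec)
    # Rank neuron indices by weight magnitude, largest first (ties keep index order).
    ranked = sorted(range(len(decoder_vec)), key=lambda i: abs(decoder_vec[i]), reverse=True)
    return [(i, decoder_vec[i], abs(decoder_vec[i]) / l1) for i in ranked[:k]]


# Toy 4-neuron decoder vector whose L1 norm is 0.5.
for index, value, share in neuron_alignment([0.1, -0.3, 0.05, 0.05]):
    print(f"{index:<6}{value:+.2f}   {share:.1%}")
```

Ranking by magnitude rather than signed value means a strongly negative weight can also appear in this list.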
Correlated Neurons
Index   P. Corr.   Cos Sim.
822     +0.10      0.06
132     +0.05      0.05
179     +0.05      0.04
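A sketch of the two Correlated Neurons statistics, under assumptions the dashboard does not spell out: P. Corr. is taken to be the Pearson correlation between the feature's activation and a neuron's activation over the same sample of tokens, and Cos Sim. is taken to be the cosine between the feature's decoder vector and that neuron's unit basis vector. The helper names and toy data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length activation series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def cosine_with_neuron(decoder_vec, neuron_index):
    """Cosine between the decoder vector and one neuron's unit basis vector,
    which reduces to that neuron's weight divided by the vector's L2 norm."""
    norm = math.sqrt(sum(w * w for w in decoder_vec))
    return decoder_vec[neuron_index] / norm


feature_acts = [0.0, 1.2, 0.0, 3.1, 0.4]   # toy feature activations, one per token
neuron_acts = [0.2, 0.9, -0.1, 2.5, 0.3]   # toy activations of one neuron on the same tokens
print(round(pearson(feature_acts, neuron_acts), 3))
print(cosine_with_neuron([3.0, 4.0], 1))   # 4 / 5 = 0.8
```

The two numbers measure different things: correlation compares behaviour over data, while the cosine compares weights only, so a neuron can score on one and not the other.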
Negative Logits
<bos>                  -1.61
(token not rendered)   -0.71
<?                     -0.68
//                     -0.65
/***                   -0.65
(token not rendered)   -0.65
ുറ                     -0.64
(token not rendered)   -0.64
public                 -0.63
(token not rendered)   -0.63
Positive Logits
maneu       1.86
stockholm   1.80
affor       1.79
Known       1.69
increa      1.67
accla       1.63
milf        1.63
sappi       1.59
peppa       1.58
known       1.58
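Logit lists like the negative and positive ones above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive entries. The sketch below assumes exactly that, treats the direction as living in the residual stream, and ignores any final normalization layer; the names logit_effects and w_u and the toy vocabulary are hypothetical.

```python
def logit_effects(decoder_vec, w_u, vocab, k=10):
    """Return the k most negative and k most positive direct logit effects.

    w_u[i][t] is the unembedding weight from residual dimension i to token t
    (shape d_model x vocab_size); decoder_vec has length d_model.
    """
    # Direct effect on token t: dot product of the decoder vector with column t of w_u.
    scores = [sum(d * row[t] for d, row in zip(decoder_vec, w_u)) for t in range(len(vocab))]
    order = sorted(range(len(vocab)), key=lambda t: scores[t])  # ascending by score
    negative = [(vocab[t], scores[t]) for t in order[:k]]
    positive = [(vocab[t], scores[t]) for t in reversed(order[-k:])]
    return negative, positive


vocab = ["a", "b", "c", "d"]
w_u = [[-1.5, -0.6, 1.7, 1.4],   # residual dimension 0
       [0.2, 0.1, -0.3, 0.5]]    # residual dimension 1
neg, pos = logit_effects([1.0, 0.5], w_u, vocab, k=2)
print(neg)
print(pos)
```

Both lists are sorted from the most extreme value inward, matching the order in which the dashboard prints them.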
Activation Density: 0.084%
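Activation density is usually reported as the fraction of sampled tokens on which the feature fires. The sketch below assumes that definition with a threshold of zero; the helper name activation_density is hypothetical. Under this reading, 0.084% corresponds to roughly 84 firing tokens per 100,000.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of sampled tokens on which the feature activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy sample: 84 firing tokens out of 100,000.
acts = [0.0] * 100_000
for i in range(0, 84 * 1000, 1000):
    acts[i] = 1.0
print(f"{activation_density(acts):.3%}")   # 0.084%
```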