INDEX
Explanations
phrases indicating deserving recognition or credit
Neuron Alignment
Index   Value   % of L₁
50      +0.16   0.7%
1339    +0.10   0.4%
812     +0.09   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
260     +0.16      0.03
996     +0.10      0.03
1505    +0.09      0.03
Negative Logits
<bos>             -2.67
ⓧ                 -0.80
public            -0.66
AppCompatTheme    -0.65
via               -0.65
(no token text)   -0.64
CreateIndex       -0.62
(no token text)   -0.62
implement         -0.62
newnode           -0.61
Positive Logits
affor        1.70
maneu        1.65
wien         1.64
stockholm    1.62
squa         1.61
increa       1.60
lele         1.60
Juf          1.58
mcdonald     1.56
unden        1.54
Activations Density 0.227%