INDEX
Explanations
the word "one" in various contexts
Neuron Alignment
Index   Value   % of L₁
50      +0.16   0.7%
1622    +0.05   0.2%
382     +0.04   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
776     +0.16      0.04
1622    +0.05      0.04
1874    +0.04      0.03
Negative Logits
<bos>              -2.14
serve              -0.73
/*                 -0.72
continue           -0.71
public             -0.71
(token not shown)  -0.70
struct             -0.70
/**                -0.70
}{                 -0.69
,                  -0.69
Positive Logits
maneu       2.15
increa      2.12
affor       2.11
fta         2.09
guarante    2.08
stockholm   2.08
aen         2.07
lidl        2.06
squa        2.04
secon       2.03
Activations Density 0.171%