Explanations
Instances where a document contains a title or heading.
Neuron Alignment
Index   Value   % of L₁
50      +0.26   1.0%
1150    +0.10   0.4%
16      +0.09   0.4%
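The sketch below shows one way the "% of L₁" column could be derived. It assumes that each Value is one entry of the feature's decoder direction, and that the percentage is that entry's magnitude divided by the L₁ norm of the whole direction. The function name `neuron_alignment` and the six-entry vector are hypothetical. A real decoder row has one entry per model dimension.

```python
def neuron_alignment(decoder_row, k=3):
    """Return the k entries with the largest magnitude, each paired with
    its signed value and its share of the row's L1 norm (assumed definition)."""
    l1_norm = sum(abs(w) for w in decoder_row)
    # Rank indices by absolute weight, largest first.
    ranked = sorted(range(len(decoder_row)),
                    key=lambda i: abs(decoder_row[i]), reverse=True)
    return [(i, decoder_row[i], abs(decoder_row[i]) / l1_norm) for i in ranked[:k]]


# Hypothetical toy decoder row whose L1 norm is 1.0.
row = [0.05, -0.10, 0.40, 0.25, -0.05, 0.15]
for index, value, share in neuron_alignment(row):
    print(f"{index:>4}  {value:+.2f}  {share:.1%}")
```

A 1.0% share for a value of +0.26 therefore implies an L₁ norm of roughly 26, spread across many dimensions.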
Correlated Neurons
Index   Pearson Corr.   Cos. Sim.
1150    +0.26           0.03
16      +0.10           0.06
227     +0.09           0.07
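Both columns compare two series. This sketch assumes, as in common SAE dashboards, that they compare the feature's activations with a neuron's activations over the same tokens. The two metrics differ in one step. Pearson correlation centers each series on its mean before taking the cosine, while cosine similarity does not. The function names and sample values are hypothetical. A series that is constant after centering would divide by zero, so the sample avoids one.

```python
import math


def cosine_similarity(x, y):
    """Uncentered: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


def pearson_correlation(x, y):
    """Cosine similarity of the mean-centered series."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine_similarity([a - mean_x for a in x], [b - mean_y for b in y])


# Hypothetical activations over six token positions.
feature = [0.0, 0.0, 2.0, 0.0, 3.0, 0.0]    # sparse SAE feature
neuron = [0.4, -0.2, 1.1, 0.3, 1.5, -0.1]   # dense MLP neuron
print(f"Pearson {pearson_correlation(feature, neuron):+.2f}  "
      f"Cos Sim {cosine_similarity(feature, neuron):.2f}")
```

Because the centering step is the only difference, the two numbers can diverge sharply, as they do in the table above.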
Negative Logits
Token        Logit
<bos>        -3.02
ⓧ            -0.81
EndProject   -0.77
fristi       -0.73
enderror     -0.73
else         -0.72
__).         -0.71
delwed       -0.70
have         -0.69
</table>     -0.69
Positive Logits
Token        Logit
accla        1.88
Juf          1.85
affor        1.73
Minang       1.72
increa       1.67
hcm          1.66
stockholm    1.66
inev         1.66
reluct       1.66
lidl         1.66
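The two logit lists appear to show a direct projection of the feature onto the vocabulary. This sketch assumes the values come from multiplying the feature's decoder direction by the model's unembedding matrix W_U, then keeping the k largest and k smallest entries. It ignores any LayerNorm scaling or bias the dashboard may apply. The function name and the 2 × 5 toy matrix are hypothetical.

```python
def top_and_bottom_logits(direction, W_U, vocab, k=3):
    """Project `direction` (length d_model) through W_U (d_model rows,
    one column per vocab token) and return the k highest and k lowest logits."""
    logits = [sum(direction[d] * W_U[d][t] for d in range(len(direction)))
              for t in range(len(vocab))]
    order = sorted(range(len(vocab)), key=lambda t: logits[t])  # ascending
    bottom = [(vocab[t], logits[t]) for t in order[:k]]
    top = [(vocab[t], logits[t]) for t in reversed(order[-k:])]
    return top, bottom


# Hypothetical toy model: d_model = 2, five-token vocabulary.
direction = [1.0, -0.5]
W_U = [[1.0, 0.0, -1.0, 0.5, 2.0],
       [0.0, 2.0, 1.0, -3.0, 1.0]]
vocab = ["a", "b", "c", "d", "e"]
top, bottom = top_and_bottom_logits(direction, W_U, vocab, k=2)
print("Positive:", top)
print("Negative:", bottom)
```

A token whose logit is strongly boosted or suppressed under this projection is not necessarily one the feature fires on. That is why the activation-based explanation and the logit lists can look unrelated.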
Activation Density: 0.801%
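Activation density is typically the share of tokens on which the feature is active. This sketch works under that assumption and treats any activation above a threshold of 0 as "active". The function name, threshold and sample are hypothetical. At 0.801%, the feature fires on roughly 8 of every 1,000 tokens.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    return sum(1 for a in activations if a > threshold) / len(activations)


# Hypothetical sample: the feature fires on 8 of 1,000 tokens.
acts = [0.0] * 992 + [1.3] * 8
print(f"Activation density {activation_density(acts):.3%}")
```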