INDEX
Explanations
titles or headings within a text
Neuron Alignment
Index   Value   % of L₁
168     +0.19   0.7%
1034    +0.14   0.5%
1482    +0.13   0.5%
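
The following is a minimal sketch of how the Value and % of L₁ columns might relate. It assumes, which the source does not state, that Value is a neuron's component of the feature's direction vector and that % of L₁ is that component's absolute share of the vector's L₁ norm. The function name `top_aligned_neurons` and the toy vector are hypothetical and are not taken from the dashboard.

```python
def top_aligned_neurons(direction, k=3):
    """Return (index, value, share_of_l1) for the k largest components.

    Assumption: share_of_l1 = |value| / sum(|direction|).
    """
    l1 = sum(abs(v) for v in direction)
    if l1 == 0:
        raise ValueError("direction has zero L1 norm")
    # Rank neuron indices by signed component value, largest first.
    ranked = sorted(enumerate(direction), key=lambda iv: iv[1], reverse=True)
    return [(i, v, abs(v) / l1) for i, v in ranked[:k]]


# Toy 4-neuron direction for illustration only (not dashboard data).
rows = top_aligned_neurons([0.5, -0.25, 1.0, 0.25], k=2)
for index, value, share in rows:
    print(f"{index}\t{value:+.2f}\t{share:.1%}")
```

Under this assumption, small percentages like those in the table would indicate that the feature's direction is spread across many neurons rather than concentrated on a few.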
Correlated Neurons
Index   P. Corr.   Cos Sim.
168     +0.19      0.04
442     +0.14      0.03
1034    +0.13      0.03
Negative Logits
ftu            -0.91
tranf          -0.90
ftre           -0.90
frastructure   -0.89
confider       -0.87
perfon         -0.87
nece           -0.86
beft           -0.84
santiago       -0.84
fign           -0.81
Positive Logits
title    1.53
title    1.44
Title    1.38
titles   1.37
TITLE    1.29
Title    1.28
TITLE    1.26
titles   1.22
Titles   1.19
Titles   1.17
Activations Density 0.072%
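
As a rough illustration of what an activation density figure measures, the sketch below computes the fraction of tokens on which a feature fires. It assumes, which the source does not state, that "fires" means an activation strictly above zero. The function name `activation_density` and the sample values are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds the threshold."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy activations over 8 tokens for illustration only (not dashboard data).
density = activation_density([0.0, 0.0, 1.25, 0.0, 0.0, 0.0, 0.5, 0.0])
print(f"{density:.3%}")
```

On this reading, a density of 0.072% would mean the feature is active on roughly 7 of every 10,000 tokens.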