INDEX
Explanations
references to scientific studies and research findings
Neuron Alignment
Index   Value   % of L₁
1343    +0.12   0.4%
198     +0.10   0.3%
964     +0.09   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
73      +0.12      0.02
184     +0.10      0.01
1397    +0.09      0.01
Negative Logits
<^              -0.58
uestions        -0.55
oO              -0.53
|]              -0.53
Corresponding   -0.52
Faktor          -0.52
Sot             -0.51
:,,             -0.50
«<              -0.48
sii             -0.47
Positive Logits
relenting   1.00
sightly     0.94
mistak      0.83
wavering    0.80
thinkable   0.79
vestiti     0.73
affez       0.73
<bos>       0.72
fidanz      0.70
sentito     0.69
Activations Density 0.327%
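
The density figure above can be read as the share of token positions on which the feature fires. The sketch below illustrates that reading only; it assumes "fires" means an activation strictly greater than zero, which the dashboard does not state. The function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical and are not taken from the tool that produced this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: a feature counts as active when its activation is strictly
    greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 1000 token positions, 3 of them active.
toy = [0.0] * 997 + [0.5, 1.2, 0.1]
density = activation_density(toy)
print(f"{density:.3%}")
```

Under this reading, a density of 0.327% would correspond to roughly 3 active positions per 1000 tokens.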