INDEX
Explanations
HTML and CSS class attributes
Neuron Alignment
Index   Value   % of L₁
1343    +0.13   0.4%
919     +0.09   0.3%
453     +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1989    +0.13      0.02
1478    +0.09      0.02
724     +0.08      0.02
Negative Logits
<bos>         -0.86
,             -0.78
(not shown)   -0.77
…             -0.76
.             -0.75
(             -0.74
since         -0.74
(not shown)   -0.74
between       -0.73
to            -0.72
Positive Logits
unce       2.02
squa       2.00
unden      2.00
scrat      1.98
increa     1.97
affor      1.92
michelin   1.92
quoique    1.91
guarante   1.91
?...       1.88
Activations Density: 0.043%