Explanations
error messages in a specific format
Neuron Alignment
Index   Value   % of L₁
876     +0.18   0.5%
468     +0.10   0.3%
906     +0.09   0.3%
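The "% of L₁" column is not defined on the page. A plausible reading, and only an assumption here, is that it reports the magnitude of each neuron's value as a share of the L1 norm (sum of absolute values) of the full vector. The sketch below shows that calculation on a toy vector; the function name `l1_share` and the toy values are hypothetical and are not taken from the feature itself.

```python
def l1_share(weights, index):
    """Return |weights[index]| as a percentage of the vector's L1 norm.

    Assumption: "% of L1" means abs(value) / sum(abs(all values)) * 100.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; share is undefined")
    return 100.0 * abs(weights[index]) / l1_norm


# Toy vector with hypothetical values (not the feature's real weights).
toy = [0.5, -0.25, 0.25, 0.0]
share = l1_share(toy, 0)  # 0.5 out of an L1 norm of 1.0
print(f"{share:.1f}%")
```

Under this reading, small percentages like those in the table would mean the feature's weight is spread across many neurons rather than concentrated in a few.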
Correlated Neurons
Index   P. Corr.   Cos Sim.
876     +0.18      -0.00
1583    +0.10      0.02
1005    +0.09      0.03
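The two columns can disagree because they are different measures. Assuming "P. Corr." stands for Pearson correlation, it centers each vector on its mean before comparing, while cosine similarity compares the raw vectors. The page does not say which vectors the dashboard feeds into each measure, so the sketch below only illustrates the two definitions on hypothetical toy data.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


# Hypothetical toy vectors: b is a shifted by a constant offset.
a = [1.0, 2.0, 3.0]
b = [2.0, 3.0, 4.0]
print(pearson_correlation(a, b))  # centering removes the offset
print(cosine_similarity(a, b))    # the offset changes the angle
```

On these toy vectors the Pearson correlation is at its maximum while the cosine similarity is slightly lower, which shows how the same pair can score differently in the two columns.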
Negative Logits
Token      Logit
<bos>      -0.78
by         -0.62
so         -0.61
(blank)    -0.61
,          -0.59
随着       -0.56
and        -0.56
via        -0.56
победы     -0.56
through    -0.55
Positive Logits
Token        Logit
napoli       1.74
milano       1.72
nutr         1.71
sappi        1.64
thermomix    1.60
wien         1.59
dispen       1.58
mef          1.57
stockholm    1.57
tanga        1.54
Activations Density: 0.180%