INDEX
Explanations
references to choices or options
Neuron Alignment
Index   Value   % of L₁
300     +0.12   0.7%
478     +0.12   0.6%
447     +0.11   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
35      +0.12      0.02
511     +0.12      0.03
447     +0.11      0.03
Negative Logits
Ħ       -3.55
ľĵ      -3.44
ĺ       -3.40
¢       -3.38
ĻĤ      -3.37
Īĺ      -3.36
£       -3.33
º       -3.33
¡       -3.29
Ń       -3.27
Positive Logits
deal    1.63
chid    1.52
else    1.51
`.      1.51
acity   1.49
inda    1.48
.'"     1.46
ações   1.45
acles   1.45
leans   1.45
Activations Density 0.511%