INDEX
Explanations
explanations or descriptions
Neuron Alignment
Index   Value   % of L₁
699     +0.17   0.6%
397     +0.15   0.6%
31      +0.14   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
699     +0.17      0.06
397     +0.15      0.04
31      +0.14      0.04
Negative Logits
maksi     -0.82
seksi     -0.80
silikon   -0.80
kado      -0.79
keramik   -0.76
akut      -0.76
kafe      -0.74
lele      -0.73
krim      -0.73
tomat     -0.72
Positive Logits
explain        1.17
explanations   1.05
explanation    1.04
explaining     1.02
Explain        1.01
Explain        1.01
explains       1.00
explain        0.98
explained      0.97
why            0.83
Activations Density: 0.109%