INDEX
Explanations
references to education and examination-related context
Neuron Alignment
Index    Value    % of L₁
50       +0.43    1.8%
674      +0.27    1.1%
1177     +0.11    0.5%
Correlated Neurons
Index    Pearson Corr.    Cos Sim.
981      +0.43            0.08
1177     +0.27            0.04
227      +0.11            0.08
Negative Logits
Ant     -0.71
em      -0.68
Tras    -0.67
"       -0.67
Pat     -0.67
Por     -0.66
Em      -0.66
"       -0.66
Rub     -0.66
Mas     -0.66
Positive Logits
inev        2.77
increa      2.76
inappro     2.74
impra       2.71
encomp      2.70
desir       2.68
indestru    2.67
effe        2.67
depic       2.66
ftu         2.66
Activations Density: 0.754%