Explanations
phrases indicating uncertainty or possibility
Neuron Alignment
Index   Value   % of L₁
453     +0.15   0.4%
1343    +0.13   0.4%
856     +0.09   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1473    +0.15      0.03
1801    +0.13      0.03
838     +0.09      0.04
Negative Logits
Juf           -1.48
Rine          -1.38
Intere        -1.36
unlaw         -1.34
impractica    -1.29
McLaugh       -1.28
disagre       -1.24
Illus         -1.23
reluct        -1.23
Shakspeare    -1.22
Positive Logits
Bourgoin       0.75
tré            0.74
Baillargeon    0.72
famí           0.70
utop           0.69
FetchType      0.68
Kohlen         0.67
lade           0.67
PicClick       0.66
Paglinawan     0.66
Activations Density: 0.320%