INDEX
Explanations
words related to mistakes or errors
Neuron Alignment
Index   Value   % of L₁
1350    +0.13   0.4%
468     +0.10   0.4%
1548    +0.10   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
36      +0.13      0.03
2000    +0.10      0.03
468     +0.10      0.03
Negative Logits
guarante        -0.75
depic           -0.75
alre            -0.71
attemp          -0.68
endeavouring    -0.68
intersper       -0.68
unspeak         -0.67
?...            -0.67
ineffec         -0.65
impractica      -0.64
Positive Logits
wrong              1.01
wrong              0.98
Wrong              0.97
Wrong              0.91
WRONG              0.79
WRONG              0.77
OMITBAD            0.65
Geplaatst          0.64
AssemblyCulture    0.63
wrongs             0.63
Activations Density: 0.067%