INDEX
Explanations
references to detailed information, investigations, and historical research
Neuron Alignment
Index   Value   % of L₁
1150    +0.07   0.2%
623     +0.07   0.2%
924     +0.07   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
193     +0.07      0.04
1293    +0.07      0.04
16      +0.07      0.05
Negative Logits
Token         Logit
mépris        -0.70
(not shown)   -0.70
ingrat        -0.69
getAge        -0.68
unspeak       -0.65
quitted       -0.63
monstre       -0.62
marchand      -0.60
malheur       -0.60
pamph         -0.60
Positive Logits
Token             Logit
@[+][             0.69
<<<<<<<<<<<<<<    0.63
oneofs            0.53
vald              0.52
IntoConstraints   0.51
Initializable     0.51
IANGLES           0.51
дописавши         0.50
Viitteet          0.49
mtrl              0.48
Activations Density 0.402%