Explanations
fluctuations in rankings or scores
Neuron Alignment
Index   Value   % of L₁
612     +0.09   0.3%
453     +0.09   0.3%
906     +0.08   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
81      +0.09      0.02
184     +0.09      0.02
861     +0.08      0.04
Negative Logits
jectures          -0.60
letteratura       -0.60
religione         -0.59
RuntimeError      -0.58
IOError           -0.58
STRUCTIONS        -0.58
AttributeError    -0.57
scienza           -0.56
defaultProps      -0.56
zucca             -0.56
Positive Logits
minimalis         0.64
Whoosh            0.63
intrigu           0.62
shenan            0.62
utaf              0.60
Punj              0.60
Mhm               0.60
flexível          0.59
ikiwa             0.59
apprehen          0.58
Activations Density: 0.385%