INDEX
Explanations
mysteries or unidentified phenomena
Neuron Alignment
Index   Value   % of L₁
1482    +0.12   0.4%
1103    +0.11   0.4%
468     +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1443    +0.12      0.02
1103    +0.11      0.02
1482    +0.10      0.02
Negative Logits
Token     Logit
handels   -0.68
durs      -0.62
vern      -0.61
bunda     -0.60
donat     -0.60
treff     -0.59
erk       -0.59
folle     -0.58
koc       -0.58
hek       -0.57
Positive Logits
Token        Logit
mystery      1.18
mysteries    1.05
mystery      1.04
Mystery      1.03
Mystery      1.02
mysterious   0.85
Mysteries    0.79
Mysterious   0.76
Mysterious   0.71
rencont      0.68
Activations Density 0.087%