INDEX
Explanations
references or mentions of hints or clues
Neuron Alignment
Index    Value    % of L₁
1983     +0.16    0.5%
395      +0.13    0.5%
204      +0.13    0.5%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1983     +0.16       0.02
395      +0.13       0.03
204      +0.13       0.02
Negative Logits
Modèle       -0.69
kona         -0.62
Diction      -0.61
Equipe       -0.59
Ouv          -0.59
Équipe       -0.58
médic        -0.57
Chambres     -0.57
Dimen        -0.56
Imaginary    -0.56
Positive Logits
hint       1.40
hints      1.29
hint       1.16
clue       1.15
clues      1.09
hinting    1.08
hinted     1.02
Hint       0.99
hints      0.95
Hint       0.95
Activations Density 0.086%