Explanations
references to nerdy culture or activities
Neuron Alignment
Index   Value   % of L₁
964     +0.18   0.5%
1013    +0.16   0.5%
198     +0.11   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
964     +0.18      0.04
1013    +0.16      0.06
401     +0.11      0.03
Negative Logits
Token          Logit
(%)            -0.59
strategia      -0.57
résult         -0.53
valutazione    -0.53
kef            -0.53
Bx             -0.52
Atm            -0.52
umo            -0.52
arxiv          -0.51
Adj            -0.51
Positive Logits
Token          Logit
<bos>          0.73
Bárbara        0.61
getAge         0.61
childhood      0.58
poulet         0.58
curé           0.57
damals         0.57
capitaine      0.56
Mère           0.53
getVersion     0.53
Activations Density 0.609%