Explanations
aes with a strong emotional or physical impact
New Auto-Interp
Neuron Alignment
Index   Value   % of L₁
581     +0.09   0.3%
1984    +0.08   0.2%
468     +0.08   0.2%
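This is a minimal sketch of how the Neuron Alignment columns could be derived. It is not stated on the page and is only an assumption: "Value" is taken to be the feature's decoder weight on a given neuron, and "% of L₁" is that weight's absolute value divided by the L₁ norm of the whole decoder vector. The function name `neuron_alignment` and the toy vector are hypothetical.

```python
def neuron_alignment(decoder_vec, k=3):
    """Return (index, weight, percent_of_l1) for the k largest weights.

    Assumption (not from the dashboard): `decoder_vec` is the feature's
    decoder direction in neuron space, neurons are ranked by signed weight,
    and "% of L1" = |weight| / sum(|w|) * 100.
    """
    l1 = sum(abs(w) for w in decoder_vec)
    # Sort neuron indices by signed weight, largest first.
    # Python's sort is stable, so tied weights keep their index order.
    ranked = sorted(range(len(decoder_vec)), key=lambda i: decoder_vec[i], reverse=True)
    return [(i, decoder_vec[i], 100.0 * abs(decoder_vec[i]) / l1) for i in ranked[:k]]


# Hypothetical toy decoder vector over four neurons (L1 norm = 2.0).
print(neuron_alignment([0.25, -0.5, 1.0, 0.25]))
```

With this toy vector, neuron 2 carries half of the L₁ norm. Neurons 0 and 3 each carry 12.5%. The negative weight on neuron 1 is ranked last.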
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
1941    +0.09           0.02
1696    +0.08           0.02
1424    +0.08           0.04
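The two Correlated Neurons columns are standard statistics. The page does not say which vectors each one compares. A plausible assumption is that the Pearson correlation is taken between the feature's and the neuron's activations over a set of tokens, and the cosine similarity between their weight directions. The sketch below only shows the two formulas. The helper names `pearson` and `cosine` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Covariance and standard deviations share the 1/n factor, so it cancels.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of L2 norms."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Hypothetical activation traces and weight vectors.
print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
print(cosine([1.0, 0.0], [0.0, 1.0]))
```

The two measures can disagree. A neuron can fire alongside a feature (nonzero correlation) while pointing in a nearly orthogonal direction (cosine near zero), which matches the small cosine values in the table.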
Negative Logits
Token     Value
shenan    -1.42
increa    -1.35
reluct    -1.33
depic     -1.32
affor     -1.29
maneu     -1.29
?...      -1.29
inev      -1.28
unspeak   -1.27
intrigu   -1.25
Positive Logits
Token        Value
anymore      0.85
nor          0.76
enderror     0.70
too          0.63
sondern      0.62
too          0.61
anything     0.57
integridad   0.56
antwortete   0.55
<bos>        0.55
Activation Density: 0.474%
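Activation density is most naturally read as the percentage of tokens on which the feature fires. The dashboard does not state its threshold. The sketch below assumes a token counts as active when the feature's activation is strictly greater than zero. The function name `activation_density` and the sample activations are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: a feature "fires" on a token when its activation is > 0,
    which is the usual convention for ReLU-style sparse features.
    """
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical activations over four tokens: only one is nonzero.
print(activation_density([0.0, 0.0, 0.5, 0.0]))
```

Under this reading, a density of 0.474% means the feature is active on roughly one token in 211.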