INDEX
Explanations
dialogue and interactions between characters in a storyline
Neuron Alignment
Index   Value   % of L₁
674     +0.24   0.7%
2015    +0.11   0.3%
906     +0.11   0.3%
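The "% of L₁" column is not defined on this page. One plausible reading, stated here as an assumption, is that it gives each neuron's absolute value as a share of the L₁ norm (sum of absolute values) of the whole vector. Under that reading the first row would imply an L₁ norm of roughly 34 (0.24 / 0.007), though the rounded figures make this approximate. A minimal sketch of the calculation, using a hypothetical toy vector rather than this feature's real weights (the name `l1_share` is illustrative only):

```python
def l1_share(weights, index):
    """Return |weights[index]| as a fraction of the vector's L1 norm.

    Assumption: "% of L1" means this ratio; the page does not define it.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; the share is undefined")
    return abs(weights[index]) / l1_norm


# Hypothetical toy vector, NOT the real weights behind the table above.
toy_weights = [0.5, -0.25, 0.25]
shares = [l1_share(toy_weights, i) for i in range(len(toy_weights))]

for i, share in enumerate(shares):
    print(f"index {i}: {share:.1%} of L1")
```

Small shares like the 0.7% and 0.3% in the table would, on this reading, mean that no single neuron dominates the vector.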
Correlated Neurons
Index   P. Corr.   Cos Sim.
319     +0.24      0.03
1415    +0.11      0.02
1470    +0.11      0.03
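Assuming "P. Corr." abbreviates Pearson correlation and "Cos Sim." cosine similarity (the page does not expand either label, nor say which vectors each is computed over), the two measures differ only in that Pearson correlation subtracts each vector's mean first. The sketch below illustrates that relationship on hypothetical toy data; the function names are illustrative and none of the numbers come from this feature.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation, computed as the cosine similarity of the
    mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


# Hypothetical toy data: toy_b is toy_a shifted by a constant offset.
toy_a = [1.0, 2.0, 3.0, 4.0]
toy_b = [11.0, 12.0, 13.0, 14.0]

toy_pearson = pearson_correlation(toy_a, toy_b)
toy_cosine = cosine_similarity(toy_a, toy_b)
print(f"Pearson: {toy_pearson:.3f}, cosine: {toy_cosine:.3f}")
```

Because the offset is removed by centering, the toy pair is perfectly correlated while its cosine similarity stays below 1, which shows how the two columns can diverge for the same pair.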
Negative Logits
optik     -1.08
akut      -0.93
silikon   -0.91
alkoh     -0.91
kosme     -0.91
antik     -0.89
makro     -0.88
lele      -0.84
kask      -0.83
gend      -0.83
Positive Logits
<bos>      1.79
Cześć      0.53
scoper     0.53
welkom     0.52
Dziękuję   0.52
zprávy     0.51
mwenye     0.51
úplně      0.51
sphase     0.50
fazia      0.49
Activations Density: 0.232%