INDEX
Explanations
sentences with a combination of specific verbs and pronouns
Neuron Alignment
Index  Value  % of L₁
1892   +0.11  0.5%
1334   +0.10  0.4%
994    +0.10  0.4%
Correlated Neurons
Index  P. Corr.  Cos Sim.
24     +0.11     0.11
1776   +0.10     0.09
1334   +0.10     0.08
Negative Logits
<bos>        -2.47
.            -0.66
,            -0.60
кло          -0.54
?            -0.53
(            -0.53
launched     -0.52
:            -0.52
established  -0.52
revealed     -0.51
Positive Logits
maroc      1.26
bandung    1.24
affez      1.23
ananas     1.18
cioc       1.16
sentra     1.11
venuto     1.09
ristor     1.07
kokos      1.06
swarovski  1.05
Activations Density: 6.050%