Explanations
pronouns, possessive pronouns, and personal relationships in a text
Neuron Alignment
Index   Value   % of L₁
1978    +0.10   0.3%
1842    +0.10   0.3%
394     +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1919    +0.10      0.08
343     +0.10      0.06
422     +0.08      0.04
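The dashboard does not state how the P. Corr. and Cos Sim. columns are computed. As a rough sketch only, assuming P. Corr. is the Pearson correlation between two units' activations over the same tokens and Cos Sim. is the cosine similarity between their weight vectors, the two quantities could be computed as below. The function names and the toy inputs are hypothetical and are not taken from the dashboard.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation between two equal-length activation series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the mean (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the summed squared deviations.
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy data (hypothetical): activations over four tokens, and two weight vectors.
feature_acts = [0.0, 1.0, 0.0, 2.0]
neuron_acts = [0.1, 0.9, 0.2, 1.8]
print(pearson_correlation(feature_acts, neuron_acts))
print(cosine_similarity([1.0, 0.0, 1.0], [0.5, 0.5, 0.0]))
```

Under these assumptions the two columns measure different things: Pearson correlation subtracts each series' mean first, while cosine similarity does not, so the two values need not agree.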
Negative Logits
lele      -1.46
Keny      -1.35
makro     -1.35
pixabay   -1.34
hcm       -1.31
Confe     -1.29
kram      -1.29
Augu      -1.28
mef       -1.27
Telex     -1.27
Positive Logits
whom         0.71
<bos>        0.63
träglich     0.59
mnie         0.58
Vielleicht   0.57
whom         0.56
сюда         0.53
obicei       0.53
come         0.53
Faites       0.52
Activations Density 0.599%