INDEX
Explanations
pronouns indicating ownership or possession
Neuron Alignment
Index   Value   % of L₁
1961    +0.14   0.5%
545     +0.10   0.3%
1256    +0.09   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1256    +0.14      0.02
545     +0.10      0.02
904     +0.09      0.01
Negative Logits
solidar      -0.59
Mó           -0.55
CÓ           -0.53
interessa    -0.52
talle        -0.51
sensibili    -0.51
cresce       -0.50
Referencoj   -0.50
sembler      -0.49
Cá           -0.48
Positive Logits
MINE     0.95
Ours     0.94
Yours    0.94
ours     0.90
Mine     0.88
mine     0.86
hers     0.85
theirs   0.84
yours    0.82
Mine     0.81
Activations Density: 0.076%