INDEX
Explanations
personal pronouns and possessive pronouns
Neuron Alignment
Index   Value   % of L₁
1553    +0.09   0.3%
576     +0.07   0.2%
1460    +0.07   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1553    +0.09      0.05
284     +0.07      0.05
234     +0.07      0.03
Negative Logits
attes      -0.67
relativi   -0.63
hoj        -0.59
Février    -0.59
liev       -0.57
incess     -0.57
quí        -0.56
Novembre   -0.55
suscit     -0.55
inverte    -0.55
Positive Logits
into       0.84
away       0.76
onto       0.72
back       0.69
toward     0.62
forward    0.62
towards    0.61
astray     0.59
ashore     0.57
down       0.57
Activations Density 0.286%