Explanations
possessive pronouns and their associations
Neuron Alignment
Index   Value   % of L₁
478     +0.13   0.8%
391     +0.13   0.8%
51      +0.12   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
362     +0.13      0.09
51      +0.13      0.08
191     +0.12      0.08
Negative Logits
aire      -1.66
their     -1.62
arium     -1.61
each      -1.58
him       -1.54
where     -1.54
break     -1.50
between   -1.49
these     -1.48
wise      -1.47
Positive Logits
own          3.09
Majesty      2.32
panic        2.13
wife         2.11
sing         1.99
Own          1.93
Excell       1.92
hometown     1.88
elf          1.84
girlfriend   1.81
Activations Density 0.414%