Explanations
This neuron activates on the relative possessive pronoun “whose.”
Negative Logits
Token     Logit
ail       -0.08
it        -0.07
they      -0.07
I         -0.07
It        -0.07
میدان     -0.06
ia        -0.06
']}}</    -0.06
_apply    -0.06
it        -0.06
Positive Logits
Token     Logit
whose     0.17
whose     0.15
Who       0.07
자의      0.07
(named    0.07
Your      0.07
DOE       0.07
,把       0.07
jehož     0.07
who       0.07
Activations Density 0.008%