Explanations
The neuron specifically fires on the standalone word “Our,” especially when it appears as the first token of a segment.
Negative Logits
tgl                  -0.07
exampleInputEmail    -0.06
locale               -0.06
двор                 -0.06
.raise               -0.06
셔                   -0.06
usercontent          -0.06
�                    -0.06
.argmax              -0.06
.Parcelable          -0.06
Positive Logits
Our                  0.08
Our                  0.07
inconsistent         0.06
µ                    0.06
mother               0.06
names                0.06
My                   0.06
침                   0.06
sponsor              0.06
아서                 0.06
Activations Density 0.007%
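
A density figure like the one above is commonly read as the share of sampled tokens on which the neuron is active. The sketch below illustrates that reading under stated assumptions: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and are not taken from the dashboard, which may compute the value differently.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "active" means strictly greater than the threshold.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 7 active tokens out of 100,000 sampled tokens.
sample = [0.0] * 99_993 + [1.2] * 7
print(f"{activation_density(sample):.3f}%")
```

With this made-up sample, 7 active tokens in 100,000 corresponds to a density of 0.007%, which shows how sparse a neuron with such a figure would be.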