INDEX
Explanations
The neuron activates on occurrences of the second‐person pronoun “you.”
Negative Logits
qtt           -0.08
imulation     -0.07
ertainment    -0.07
foss          -0.07
If            -0.06
ewart         -0.06
lay           -0.06
켜            -0.06
labels        -0.06
catalogs      -0.06
Positive Logits
uy�           0.08
iế            0.07
vote          0.06
UNICODE       0.06
we            0.06
Expand        0.06
Ě             0.06
ندي           0.06
STE           0.06
VERBOSE       0.06
Activations Density 0.081%
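
The density figure can be read as the share of sampled token positions on which the neuron fires. The source does not state the exact definition, so the following is only a minimal sketch, assuming density means the fraction of tokens whose activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and used for illustration only.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the neuron fires;
    the dashboard itself does not spell out the definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: 10,000 token positions, 8 of which have a nonzero activation.
acts = [0.0] * 9992 + [1.5] * 8
density = activation_density(acts)
print(f"{density:.3%}")
```

Under this reading, a density of 0.081% would mean the neuron fires on roughly 8 out of every 10,000 tokens, which is sparse for a neuron described as tracking a common word like “you.”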