Explanations
The neuron selectively activates on personal pronouns (especially “I” and “you”) in conversational or dialogue‐style text.
Negative Logits
GO            -0.07
ころ          -0.06
Man           -0.06
Negro         -0.06
Applicants    -0.06
section       -0.06
taxonomy      -0.06
มาร           -0.06
Livingston    -0.06
_IDENTIFIER   -0.06
Positive Logits
เหม           0.07
points        0.06
rookies       0.06
prázd         0.06
잡담          0.06
pb            0.06
pública       0.06
المج          0.06
geçti         0.06
olmam         0.06
Activations Density 0.007%