INDEX
Explanations
This neuron detects direct second-person address, activating on tokens like “you,” “your,” and references to the human interlocutor.
Negative Logits
报道  -0.07
airline  -0.07
uraa  -0.06
اتاق  -0.06
cow  -0.06
fon  -0.06
جاد  -0.06
contacto  -0.06
agascar  -0.06
dio  -0.06
Positive Logits
wrestlers  0.06
буд  0.06
Purch  0.06
Sovere  0.06
خصوص  0.06
Emoji  0.06
.cc  0.06
.ADMIN  0.06
Mui  0.06
้ง  0.06
Activations Density 0.002%
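
This page does not define the density figure. A common reading, assumed here, is the fraction of sampled tokens on which the neuron has a nonzero activation. Under that assumption, a minimal sketch of the calculation follows; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from this page or from any particular tool.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means active tokens / total tokens sampled.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 2 active tokens out of 100,000.
sample = [0.0] * 99_998 + [1.3, 0.7]
density = activation_density(sample)
print(f"{density:.3%}")
```

On this reading, a density of 0.002% would correspond to roughly 2 active tokens per 100,000 sampled, which would make this a very sparse neuron.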