Explanations
The neuron detects self-referential (reflexive) language: tokens and phrases that refer back to the subject itself, such as "self", "itself", "talking to itself", and "self-" compounds.
Negative Logits
mirrors    -0.07
Since      -0.07
コ         -0.06
Since      -0.06
variants   -0.06
customers  -0.06
itation    -0.06
"There     -0.06
do         -0.06
۱۴         -0.06
Positive Logits
ungkin      0.07
َح          0.07
�           0.06
#error      0.06
bitmap      0.06
unsett      0.06
Pathfinder  0.06
Tot         0.06
Jimmy       0.06
olduğ       0.06
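The page does not say how the logit scores above were computed. A common approach, assumed here, is a direct-path projection: multiply the neuron's output weight vector by the model's unembedding matrix and rank the vocabulary by the result. The minimal sketch below follows that assumption. The names `top_logit_effects`, `w_out`, `w_u`, and `vocab` are hypothetical. Real tooling may also apply layer-norm scaling or other normalization that this sketch omits.

```python
import numpy as np


def top_logit_effects(w_out, w_u, vocab, k=10):
    """Rank tokens by the direct effect of one neuron on the output logits.

    Assumed inputs (hypothetical names, not from the source page):
      w_out: (d_model,) output weight vector of the neuron
      w_u:   (d_model, d_vocab) unembedding matrix
      vocab: list of d_vocab token strings
    """
    w_out = np.asarray(w_out, dtype=float)
    w_u = np.asarray(w_u, dtype=float)
    # Direct contribution of the neuron's output direction to each token's logit.
    logits = w_out @ w_u
    order = np.argsort(logits)  # ascending: most suppressed tokens first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
neg, pos = top_logit_effects(
    [1.0, 2.0],
    [[1.0, 0.0, -1.0, 0.5],
     [0.0, 1.0, 0.0, -1.0]],
    ["a", "b", "c", "d"],
    k=2,
)
print("negative:", neg)
print("positive:", pos)
```

The small magnitudes on this page (about ±0.07) suggest that no single token is strongly promoted or suppressed by the neuron's direct path.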
Activation Density: 0.234%
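Activation density is most likely the fraction of dataset tokens on which the neuron fires. The minimal sketch below assumes that "fires" means an activation strictly above a threshold of 0. The function name `activation_density` and the default threshold are hypothetical; the page does not state the threshold it uses.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    acts = np.asarray(acts, dtype=float)
    return float(np.mean(acts > threshold))


# 3 active tokens out of 1,000 gives a density of 0.3%.
acts = [1.0] * 3 + [0.0] * 997
print(f"{activation_density(acts):.3%}")
```

Under this definition, 0.234% means the neuron is active on roughly 2 to 3 tokens per 1,000, so it is a sparse feature.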