Explanations
This neuron primarily activates on placeholder entity tokens (e.g., “NAME,” “NAME_2”) rather than on ordinary words.
Negative Logits
outings     -0.07
PRS         -0.07
Pink        -0.06
بس          -0.06
Twenty      -0.06
�           -0.06
Gavin       -0.06
%).↵↵       -0.06
matchups    -0.06
peanuts     -0.06
Positive Logits
converter     0.06
_agent        0.06
.dll          0.06
_DATA         0.06
Undo          0.06
forgiveness   0.06
uvol          0.06
overrides     0.06
Cmd           0.06
uat           0.06
Activations Density 0.011%
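
As a rough illustration of what a density figure like this usually means, the sketch below assumes that density is the fraction of sampled tokens on which the neuron's activation is above zero. That definition, the function name `activation_density`, and the sample data are assumptions made for illustration; the page itself does not state how the value was computed.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means (active tokens) / (all sampled tokens).
    """
    if len(activations) == 0:
        raise ValueError("activations must not be empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic data, not taken from the dashboard: 11 active tokens out of 100,000.
sample = [0.0] * 99_989 + [1.5] * 11
density = activation_density(sample)
print(f"Activations Density {density:.3%}")
```

Under this reading, a density of 0.011% corresponds to the neuron firing on roughly 11 of every 100,000 tokens, which fits a neuron that responds mainly to rare placeholder tokens.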