INDEX
Explanations
This neuron responds to the appearance of the verb “features” (and its close variants) in descriptive sentences.
Negative Logits
clamp        -0.07
ANSW         -0.07
global       -0.07
(zip         -0.07
yyn          -0.06
bind         -0.06
takes        -0.06
suppress     -0.06
giveaways    -0.06
escape       -0.06
Positive Logits
featuring    0.10
features     0.08
Featuring    0.07
aysia        0.07
porto        0.07
uest         0.07
�            0.07
stein        0.06
perfection   0.06
Features     0.06
Activations Density: 0.012%