Explanations
The neuron activates on instances of the word “help,” i.e. when the text is requesting or referring to assistance.
Negative Logits
-0.07
groupName    -0.07
плани    -0.06
/met    -0.06
dosy    -0.06
%s    -0.06
Honor    -0.06
beurette    -0.06
_choose    -0.06
ของ    -0.06
Positive Logits
Aeros    0.08
131    0.07
(bounds    0.07
疫    0.07
orea    0.06
_bid    0.06
ông    0.06
(topic    0.06
TRACT    0.06
ydro    0.06
Activations Density 0.007%