Explanations
This neuron activates on polite request phrasing (e.g. “we request,” “we would like to,” “please,” “to …”).
Negative Logits
Token        Logit
Gun          -0.07
Bush         -0.07
Bush         -0.07
rightfully   -0.07
unas         -0.07
anza         -0.07
AWS          -0.07
Pu           -0.06
Brennan      -0.06
igans        -0.06
Positive Logits
Token        Logit
<::          0.08
실           0.07
leş          0.07
möchten      0.07
:            0.07
Calgary      0.07
らい         0.07
された       0.07
같           0.07
가지         0.07
Activation Density: 0.033%
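
As a rough illustration of how a density figure like this could be computed, the sketch below assumes that density means the fraction of sampled tokens on which the neuron's activation exceeds a threshold. That definition, the function name `activation_density`, the `threshold` parameter, and the sample values are all hypothetical and are not taken from this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    neuron fires at all. The tool that produced this page may define it
    differently.
    """
    if len(activations) == 0:
        raise ValueError("activations must not be empty")
    # Count tokens on which the neuron is active.
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up sample: 10,000 tokens, 3 of which activate the neuron.
sample = [0.0] * 9997 + [0.4, 1.2, 0.7]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this assumed definition, a density of 0.033% would correspond to roughly one active token in every 3,000 sampled.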