INDEX
Explanations
Offering help
The neuron detects the assistant’s self-referential help-offer phrasing (e.g. “I’ll do my best to help”).
New Auto-Interp
Negative Logits
bz
-0.08
/files
-0.08
경제
-0.06
ोर
-0.06
gar
-0.06
сф
-0.06
ασ
-0.06
function
-0.06
.and
-0.06
astr
-0.06
POSITIVE LOGITS
기도
0.07
Flavor
0.07
bruk
0.06
ания
0.06
_leg
0.06
ерк
0.06
vým
0.06
<Player
0.06
TexParameter
0.06
FromFile
0.06
Activations Density 0.033%