Explanations
The neuron detects role-playing directives supplied by the system or the user: phrases instructing the assistant how to stay "in character," speak, or behave.
Negative Logits
plastics      -0.07
-covered      -0.06
trusting      -0.06
rampant       -0.06
lse           -0.06
Part          -0.06
hunts         -0.06
_partition    -0.06
agedList      -0.06
Notice        -0.06
Positive Logits
,right        0.07
waterproof    0.06
معت           0.06
Пер           0.06
setScale      0.06
süt           0.06
EF            0.06
:'+           0.06
نين           0.06
_af           0.06
Activations Density: 0.027%