Explanations
Avoiding typical, ordinary
This neuron detects requests instructing the model to avoid “generic” or “standard” answers.
Negative Logits
Token         Logit
Distinct      -0.07
.springboot   -0.07
fruit         -0.07
project       -0.06
腦            -0.06
Beautiful     -0.06
llama         -0.06
/power        -0.06
Niger         -0.06
favorite      -0.06
Positive Logits
Token         Logit
(%            0.07
.awtextra     0.06
ویژ           0.06
BAT           0.06
Đại           0.06
استان         0.06
frm           0.06
usz           0.06
ohio          0.06
";"           0.06
Activations Density: 0.022%