INDEX
Explanations
The neuron activates on imperative action words (e.g. “use,” “add”) in instruction-style prompts.
Negative Logits
baskets   -0.07
Jones     -0.07
-fold     -0.07
fold      -0.07
_pf       -0.07
beck      -0.07
_REPORT   -0.06
Deep      -0.06
ewolf     -0.06
#create   -0.06
Positive Logits
تجه           0.07
("!           0.07
fecha         0.06
_EC           0.06
perso         0.06
�             0.06
alleen        0.06
Türkçe        0.06
subsidiaries  0.06
اصيل          0.06
Activations Density 0.021%
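For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes that density means the fraction of tokens on which the neuron's activation exceeds a threshold (zero here); the page does not state its exact definition. The function name `activation_density` and the sample data are hypothetical, and the synthetic counts were chosen only to reproduce a value of the same size as the one reported.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the neuron
    fires at all. The dashboard's own definition may differ.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic illustration (not data from the page): 100,000 tokens,
# of which 21 have a nonzero activation.
acts = [0.0] * 100_000
for i in range(21):
    acts[i * 1000] = 1.0

density = activation_density(acts)
print(f"{density:.3%}")  # prints "0.021%"
```

A density this low means the neuron is silent on almost all tokens, which fits a feature that responds only to a narrow class of inputs.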