Explanations
directions/instructions
The neuron is specifically activated by occurrences of the word “prompt.”
Negative Logits
Token        Logit
ар           -0.07
Preston      -0.07
iên          -0.07
arie         -0.07
Facilities   -0.07
Vancouver    -0.07
Has          -0.07
facilities   -0.07
obligation   -0.06
Tom          -0.06
Positive Logits
Token               Logit
\xd                 0.06
}));↵               0.06
_TRY                0.06
_UNS                0.06
=[↵                 0.06
ju                  0.06
(unrendered token)  0.06
]=$                 0.06
بسي                 0.06
@Resource           0.06
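The two logit lists rank vocabulary tokens by how strongly the neuron pushes their logits down or up. A minimal sketch of how such a ranking is commonly computed, assuming the values are the neuron's output weights projected directly through the unembedding matrix (ignoring any final layer norm). The names `w_out`, `w_u`, `direct_logit_effects`, and `top_logits` are hypothetical, and this is not necessarily how this dashboard derives its numbers.

```python
from typing import List, Sequence, Tuple


def direct_logit_effects(
    w_out: Sequence[float],          # hypothetical: neuron's output weights, length d_model
    w_u: Sequence[Sequence[float]],  # hypothetical: unembedding matrix, d_model x vocab_size
) -> List[float]:
    """Return w_out @ W_U: the neuron's direct contribution to every vocabulary logit."""
    if len(w_u) != len(w_out):
        raise ValueError("w_u must have one row per entry of w_out")
    vocab_size = len(w_u[0])
    return [
        sum(w_out[i] * w_u[i][j] for i in range(len(w_out)))
        for j in range(vocab_size)
    ]


def top_logits(
    effects: Sequence[float], vocab: Sequence[str], k: int = 10
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Split tokens into the k most-suppressed and the k most-promoted."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative effect first
    positive = ranked[::-1][:k]  # most positive effect first
    return negative, positive


# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
vocab = ["a", "b", "c"]
effects = direct_logit_effects([1.0, -1.0], [[0.5, 0.0, -0.25], [0.25, 0.5, 0.0]])
negative, positive = top_logits(effects, vocab, k=1)
print(negative, positive)  # [('b', -0.5)] [('a', 0.25)]
```

Under this assumption, the small magnitudes above (about ±0.07) would mean the neuron has only a weak direct effect on any single output token.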
Activation Density: 0.002%
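Activation density is presumably the share of evaluated tokens on which the neuron fires. The page does not state the threshold or the dataset used. A minimal sketch, assuming "fires" means an activation strictly greater than a threshold of 0 (the `activation_density` helper, its `threshold` parameter, and the sample data are hypothetical):

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical sample: the neuron fires on 2 of 100,000 tokens.
acts = [0.0] * 99_998 + [1.3, 0.7]
print(f"Activation density: {activation_density(acts) * 100:.3f}%")  # Activation density: 0.002%
```

A density of 0.002% corresponds to roughly 2 active tokens per 100,000, which fits a neuron that responds to a single specific word.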