Explanations
questions and answers
This neuron detects tokens associated with yes/no questions and answers (e.g., the words “yes,” “no,” and related instruction cues).
Negative Logits
/ms               -0.06
/config           -0.06
swim              -0.06
Russo             -0.06
.authorization    -0.06
.exam             -0.06
xi                -0.06
Г                 -0.06
eval              -0.06
spouse            -0.06
Positive Logits
只                0.07
جدا               0.07
σκ                0.07
komen             0.07
_lazy             0.06
landlords         0.06
การจ              0.06
พน                0.06
*:                0.06
($_               0.06
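The page does not say how the logit values above are computed. One common convention, assumed here, is to project the neuron's output weight vector through the model's unembedding matrix and list the tokens with the most negative and most positive results. The sketch below illustrates that convention on toy data; `top_logit_effects`, `w_out`, `W_U`, and `vocab` are hypothetical names, not taken from this page.

```python
import numpy as np

def top_logit_effects(w_out, W_U, vocab, k=3):
    """Return the k most negative and k most positive direct logit effects.

    Assumption: effect on token t = w_out . W_U[:, t], where w_out is the
    neuron's output direction (d_model,) and W_U is the unembedding matrix
    (d_model, vocab_size). This is an illustrative convention only.
    """
    effects = np.asarray(w_out, dtype=float) @ np.asarray(W_U, dtype=float)
    order = np.argsort(effects)  # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: 2-dimensional residual stream, 4-token vocabulary (made-up values).
vocab = ["a", "b", "c", "d"]
w_out = np.array([1.0, 0.0])
W_U = np.array([
    [0.5, -0.2, 0.1, -0.7],
    [0.3,  0.3, 0.3,  0.3],
])
negative, positive = top_logit_effects(w_out, W_U, vocab, k=2)
print("negative:", negative)
print("positive:", positive)
```

With values as small as the ones listed here (about ±0.06 to ±0.07), the top tokens on either side may be only weakly related to the neuron's described behavior.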
Activations Density 0.073%
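The page does not define activation density. A minimal sketch, assuming it means the fraction of sampled token positions on which the neuron's activation is above zero; the function name, the threshold, and the sample data are hypothetical.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" counts a position as active when its activation
    is strictly greater than the threshold (zero by default).
    """
    acts = np.asarray(activations, dtype=float)
    if acts.size == 0:
        raise ValueError("activations must not be empty")
    return float(np.mean(acts > threshold))

# Hypothetical sample: 10,000 token positions, 7 of which fire.
acts = np.zeros(10_000)
acts[:7] = 1.5
density = activation_density(acts)
print(f"{density:.3%}")  # 0.070%
```

Under this reading, a density of 0.073% would mean the neuron is active on roughly 7 of every 10,000 tokens.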