INDEX
Explanations
The neuron detects the instruction words “question” and “answers” in prompts that ask whether a sentence answers a given question.
Negative Logits
oste      -0.06
ková      -0.06
Robot     -0.06
müdür     -0.06
Notice    -0.06
Cel       -0.06
감독      -0.06
원을      -0.06
Sour      -0.06
Notice    -0.06
Positive Logits
udo                 0.07
eternity            0.06
background          0.06
เพลง                0.06
convoy              0.06
systemFontOfSize    0.06
patial              0.06
acje                0.06
一起                0.06
_ajax               0.05
Activations Density 0.001%