INDEX
Explanations
questions
This feature flags the assistant's stock clarification prompt: phrases such as “Do you have any specific questions…?” that ask the user for more details.
Negative Logits
807        -0.07
(['/       -0.06
Numbers    -0.06
,next      -0.06
atak       -0.06
̣          -0.06
leftist    -0.06
edef       -0.06
chiefly    -0.06
REUTERS    -0.06
Positive Logits
grupos     0.07
connect    0.06
Ne         0.06
among      0.06
invert     0.06
Spr        0.06
keit       0.06
düz        0.06
darkness   0.06
두         0.06
Activations Density 0.076%