INDEX
Explanations
Utterances in which the assistant offers help or says it can or will assist (first-person offers of assistance).
Negative Logits
و    0.75
其他    0.71
ن    0.70
\    0.70
de    0.68
'    0.66
ال    0.64
รายละเอียด    0.61
閪    0.61
on    0.60
Positive Logits
ในการ    0.87
assisting    0.86
assist    0.80
EE    0.75
ীয়তে    0.73
Assist    0.72
قیق    0.71
Assist    0.71
asistente    0.69
assistance    0.68
Activations Density 0.039%
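
The density figure above can be read as the share of tokens on which the feature fires. A minimal sketch of that calculation follows, assuming density means the percentage of tokens with activation above a threshold of zero; the page does not state its exact definition. The function name `activation_density` and the sample data are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the fraction of tokens on which the feature
    is active; the dashboard itself does not spell out the definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 39 active tokens out of 100,000.
sample = [1.0] * 39 + [0.0] * (100_000 - 39)
print(f"{activation_density(sample):.3f}%")  # prints 0.039%
```

Under this reading, a density of 0.039% means the feature is active on roughly 39 of every 100,000 tokens, which fits a feature tied to one specific kind of utterance.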