Explanations
emotional responses and reactions in dialogue
Negative Logits
Token                    Logit
_ASSUME                  -0.15
dit                      -0.15
ModelIndex               -0.14
儀 (raw: åĦĢ)            -0.14
hint                     -0.14
iring                    -0.14
Vick                     -0.14
Cutting                  -0.14
очка (raw: оÑĩка)        -0.14
ento                     -0.13
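Some token strings on this page, such as `åĦĢ` in the negative list, are not corrupted text but the byte-level BPE form in which GPT-2-style tokenizers store non-ASCII characters: every UTF-8 byte is mapped to a printable stand-in character. The page does not name the model or its tokenizer, so the following is only a sketch that assumes a GPT-2-style byte-to-unicode table; it reverses that table to recover readable text. The helper names are hypothetical.

```python
def bytes_to_unicode() -> dict[int, str]:
    # Assumed GPT-2-style table: printable bytes map to themselves,
    # the remaining bytes are shifted to code points 256, 257, ...
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    n = 0
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: stand-in character -> original byte.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_bpe_token(token: str) -> str:
    # Raises KeyError for characters outside the table, e.g. text that
    # has already been partially decoded (like the Cyrillic "о" above).
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


print(decode_bpe_token("åĦĢ"))
print(decode_bpe_token("ç«ĭ"))
```

Under this assumption `åĦĢ` decodes to 儀 and `ç«ĭ` decodes to 立, the character in 立即 ("immediately"), which fits the feature's positive logits. A leading `Ġ` decodes to a space, which is how word-initial tokens are usually marked.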
Positive Logits
Token                    Logit
immediately               0.22
immediate                 0.21
instantly                 0.21
reaction                  0.19
立 (raw: ç«ĭ)              0.17
instantaneous             0.17
instant                   0.17
immedi                    0.17
mediately                 0.16
reactions                 0.16
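On feature dashboards like this one, the positive and negative logit lists are commonly obtained by projecting the feature's decoder direction through the model's unembedding matrix and reading off the largest and smallest entries. This page does not say how its lists were computed, so the pure-Python sketch below only illustrates that approach; `feature_logit_effects`, `top_logits`, and the toy vectors are hypothetical, and real pipelines may also account for the final layer norm, which is omitted here.

```python
def feature_logit_effects(decoder_vec, unembed, vocab):
    # decoder_vec: the feature's decoder direction, length d_model (hypothetical input).
    # unembed: d_model x vocab_size matrix W_U as nested lists (hypothetical input).
    # Returns {token: direct contribution of the feature direction to that token's logit}.
    d_model = len(decoder_vec)
    return {
        token: sum(decoder_vec[i] * unembed[i][t] for i in range(d_model))
        for t, token in enumerate(vocab)
    }


def top_logits(effects, k):
    # Sort tokens by contribution: the top k are the "positive logits",
    # the bottom k (most negative first) are the "negative logits".
    ranked = sorted(effects.items(), key=lambda item: item[1])
    positive = list(reversed(ranked))[:k]
    negative = ranked[:k]
    return positive, negative


# Toy example with invented numbers (d_model = 2, vocabulary of 4 tokens).
VOCAB = ["immediately", "reaction", "hint", "dit"]
DECODER = [1.0, 0.5]
UNEMBED = [
    [0.20, 0.10, -0.10, -0.05],
    [0.04, 0.18, -0.08, -0.20],
]

effects = feature_logit_effects(DECODER, UNEMBED, VOCAB)
positive, negative = top_logits(effects, k=2)
for token, value in positive:
    print(f"{token:>12} {value:+.2f}")
```

The toy numbers are made up for illustration: they give `immediately` a contribution of +0.22 and `dit` one of -0.15, so the sketch ranks tokens the same way the lists above are ordered, largest magnitude first.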
Activation density: 0.256%
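Activation density is usually the share of token positions in a text sample on which the feature's activation is above zero. Assuming that definition (the page does not state the dataset or the threshold), 0.256% means the feature fires on roughly 256 of every 100,000 tokens, which makes it a fairly sparse feature. A minimal sketch of the calculation, using a hypothetical `activation_density` helper:

```python
def activation_density(activations, threshold=0.0):
    # Percentage of token positions where the feature is active,
    # i.e. its activation is strictly greater than `threshold` (assumed 0).
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 256 active positions out of 100,000 tokens.
sample = [0.0] * 99_744 + [0.5] * 256
print(f"{activation_density(sample):.3f}%")
```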