Explanations
dialogues and exchanges that reveal emotions and interpersonal dynamics
Negative Logits
untime    -0.20
ilan      -0.18
astery    -0.14
ÑģÑİ      -0.14
__,__     -0.14
cried     -0.14
oop       -0.14
celik     -0.14
imi       -0.13
ÃŃrk      -0.13
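Two entries in the negative-logit list, `ÑģÑİ` and `ÃŃrk`, look like raw vocabulary strings from a byte-level BPE tokenizer rather than readable text. The page does not name the tokenizer, so the sketch below assumes the GPT-2-style byte-to-character table, in which every byte is shown as a printable character; the helper names `bytes_to_unicode` and `decode_vocab_string` are chosen for illustration. If the model uses a different tokenizer, the decoded text will not be meaningful.

```python
import unicodedata


def bytes_to_unicode():
    """Byte -> printable character table (assumed: GPT-2-style byte-level BPE)."""
    # Bytes that are already printable keep their own code point.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    mapping = {}
    shifted = 0
    for b in range(256):
        if b in printable:
            mapping[b] = chr(b)
        else:
            # Remaining bytes are shifted into code points 256 and above.
            mapping[b] = chr(256 + shifted)
            shifted += 1
    return mapping


def decode_vocab_string(token):
    """Map a raw vocabulary string back to the UTF-8 text it stands for."""
    token = unicodedata.normalize("NFC", token)  # guard against decomposed input
    char_to_byte = {c: b for b, c in bytes_to_unicode().items()}
    raw = bytes(char_to_byte[ch] for ch in token)
    # Token fragments can end mid-character, so replace instead of raising.
    return raw.decode("utf-8", errors="replace")


for tok in ["ÑģÑİ", "ÃŃrk", "cried"]:
    print(repr(tok), "->", repr(decode_vocab_string(tok)))
```

Under this assumption the two strings decode to the Cyrillic fragment `сю` and to `írk`, while plain ASCII tokens such as `cried` pass through unchanged.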
Positive Logits
reply     0.40
replied   0.36
replies   0.34
Reply     0.32
reply     0.31
Replies   0.30
Reply     0.30
answer    0.24
(reply    0.24
response  0.23
Activations Density 0.770%
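The page does not define how the density figure is computed. One common reading, assumed here, is the fraction of sampled token positions on which the feature's activation is above zero. The sketch below illustrates that reading on made-up data; the function name `activation_density` and the toy values are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (made up for illustration): 1000 token positions, 8 of them active.
toy = [0.0] * 992 + [0.3, 1.2, 0.7, 2.1, 0.05, 0.9, 1.5, 0.4]
print(f"{activation_density(toy):.3%}")
```

On this reading, a density of 0.770% would mean the feature fires on roughly 77 of every 10,000 sampled tokens.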