INDEX
    Explanations

    dialogues and exchanges that reveal emotions and interpersonal dynamics

    Negative Logits
    Token       Logit
    "untime"    -0.20
    "ilan"      -0.18
    "astery"    -0.14
    "ÑģÑİ"      -0.14
    "__,__"     -0.14
    " cried"    -0.14
    "oop"       -0.14
    "celik"     -0.14
    "imi"       -0.13
    "ÃŃrk"      -0.13
    Positive Logits
    Token         Logit
    " reply"      0.40
    " replied"    0.36
    " replies"    0.34
    " Reply"      0.32
    "reply"       0.31
    " Replies"    0.30
    "Reply"       0.30
    " answer"     0.24
    "(reply"      0.24
    " response"   0.23
    Act Density 0.770%

    No Known Activations