INDEX
    Explanations

    words related to responses or replies in the context of conversations

    instances of dialogue or spoken responses

    Negative Logits
    ctors       -0.77
    olin        -0.73
    cipled      -0.71
    bons        -0.71
    icipated    -0.69
    dar         -0.68
    wed         -0.68
    prus        -0.67
    gone        -0.65
    ciples      -0.64
    Positive Logits
     angrily    0.88
     sarcast    0.88
     thereto    0.83
     favorably  0.82
     affirm     0.80
    reply       0.77
     indign     0.73
     harshly    0.72
     whine      0.72
    replies     0.71
    Act Density 0.028%

    No Known Activations