INDEX
    Explanations

    instances of agreement or confirmation in dialogue

    Negative Logits
    rie           -0.16
    orie          -0.16
    oto           -0.15
     McCart       -0.14
    оÑĤе          -0.14
    ot            -0.14
     sed          -0.13
    jie           -0.13
     cascade      -0.13
    /cache        -0.13
    Positive Logits
    gebn          0.16
    lied          0.15
    steder        0.15
    endi          0.15
    ÙİÙī          0.14
    URLException  0.14
    éĨ            0.14
    enty          0.14
    âĸį           0.14
    729           0.14
    Activation Density: 0.028%

    No Known Activations