    Explanations

    affirmative conversational responses

    Negative Logits
    "тернет"     0.75
    "ującego"    0.69
    " hendak"    0.66
    "イス"       0.65
    " ప్ర"        0.65
    "enes"       0.64
    "ಾರ"         0.64
    "ující"      0.64
    "es"         0.63
    "ogram"      0.63
    Positive Logits
    " haha"              1.12
    " true"              1.00
    " thats"             1.00
    " noticed"           0.94
    " but"               0.94
    " apt"               0.93
    " understandable"    0.93
    " heard"             0.92
    " sorry"             0.91
    " guilt"             0.90
    Activation Density: 0.015%

    No Known Activations