INDEX
    Explanations

    phrases or sentences containing greetings

    punctuation or conversational interjections that indicate dialogue or interaction

    Negative Logits
    "İĭ"         -0.76
    "ourse"      -0.74
    "inction"    -0.74
    "ilater"     -0.66
    "arez"       -0.65
    "%:"         -0.64
    "arov"       -0.62
    "/ "         -0.60
    " lab"       -0.59
    "arus"       -0.59
    Positive Logits
    " yeah"      0.89
    " Wait"      0.77
    " dunno"     0.74
    " dear"      0.72
    " yes"       0.72
    " maybe"     0.71
    " sorry"     0.70
    " Weird"     0.70
    " Butt"      0.69
    " Sue"       0.69
    Act Density 0.077%

    No Known Activations