INDEX
    Explanations

    conversational exchanges and expressions of emotion

    Negative Logits
    ilde
    -0.18
    obao
    -0.15
     Fucking
    -0.15
    shall
    -0.14
    ihad
    -0.14
    adÃŃ
    -0.14
    imas
    -0.14
    usty
    -0.14
    £o
    -0.14
     freaking
    -0.13
    Positive Logits
    0.22
     dat
    0.19
    akin
    0.19
     kin
    0.19
     git
    0.19
    izin
    0.19
     dere
    0.19
     sich
    0.18
     kinda
    0.18
    è¾°
    0.18
    Activation Density 0.353%

    No Known Activations