INDEX
    Explanations

    affirmations or agreements in conversations

    Negative Logits
    deaux          -0.19
    ntag           -0.14
    Ø©             -0.14
     Anc           -0.14
     Middleton     -0.14
    *>*            -0.13
     minimum       -0.13
     Bilim         -0.13
     subs          -0.13
    hra            -0.13
    Positive Logits
    ãĥ©ãĥĥãĤ¯      0.16
    ÏĥÏĦ           0.15
    ej             0.15
    buz            0.14
    anja           0.14
    ixe            0.14
    voices         0.14
    thrown         0.13
    ansa           0.13
    iyim           0.13
    Activation Density: 0.051%

    No Known Activations