INDEX
    Explanations

    phrases that reflect conflicting views or hypocrisy in discussions

    Negative Logits
    Token          Logit
    ÎŃÏģγ           -0.16
    ayar           -0.15
    lik            -0.15
    ensibly        -0.15
    MeasureSpec    -0.14
    flower         -0.14
     dro           -0.14
    ãĥªãĥ¼ãĤº       -0.14
    ):?>↵          -0.14
     ><?           -0.14
    Positive Logits
    Token          Logit
    isz            0.15
    mac            0.15
    antes          0.15
    ascar          0.14
    udic           0.14
     Deutsch       0.14
    DOT            0.14
    adow           0.14
     Äijỡ          0.14
     Zimmer        0.14
    Activation Density: 0.114%

    No Known Activations