INDEX
    Explanations

    phrases expressing denial or rejection

    Negative Logits
    Token        Logit
    "Ú¯ÛĮ"       -0.16
    "еÑģÑı"      -0.15
    "hod"        -0.15
    "arat"       -0.15
    "iban"       -0.14
    "fact"       -0.14
    "enuine"     -0.14
    "айд"        -0.14
    "ós"         -0.14
    "ador"       -0.14
    Positive Logits
    Token           Logit
    " understand"   0.17
    " deserve"      0.17
    " rightly"      0.15
    " know"         0.15
    " belong"       0.15
    " care"         0.14
    "itsu"          0.14
    " suppose"      0.14
    "cha"           0.14
    "176"           0.14
    Activation Density: 0.039%

    No Known Activations