INDEX
    Explanations

    expressions of negativity or dissatisfaction

    NEGATIVE LOGITS
    unden
    -0.16
     Really
    -0.16
     действительно
    -0.16
    Really
    -0.15
    ullo
    -0.15
     نسب
    -0.15
    almost
    -0.15
    ninger
    -0.15
     almost
    -0.14
    undan
    -0.14
    POSITIVE LOGITS
     flattering
    0.22
     ideal
    0.20
     conducive
    0.20
     pleasant
    0.19
     thrilled
    0.19
     appet
    0.19
     kosher
    0.19
     glamorous
    0.19
     optimal
    0.19
     savory
    0.19
    Act Density 0.114%

    No Known Activations
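
    The "Act Density" figure above is commonly read as the fraction of token positions on which the feature fires. The dump does not state how it was computed, so the following is only a minimal sketch under that assumption. The function name, the zero threshold, and the toy data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means (number of active tokens) / (total tokens).
    """
    total = len(activations)
    if total == 0:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / total


# Hypothetical toy data: one active position out of 1000 tokens.
sample = [0.0] * 999 + [2.5]
density = activation_density(sample)
print(f"Act Density {density:.3%}")
```

    Under this reading, a density of 0.114% would correspond to roughly 1 active token in every 877.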