    Explanations

    expressions of gratitude and recognition

    Negative Logits
    " purse"       -0.14
    "elor"         -0.14
    "oulder"       -0.14
    "ordon"        -0.14
    " sympathy"    -0.13
    "ìĶ"           -0.13
    "tanggal"      -0.13
    "zym"          -0.13
    "enas"         -0.13
    "зу"           -0.13
    Positive Logits
    " privilege"   0.17
    "uppe"         0.15
    "ably"         0.15
    "ось"          0.15
    " privileged"  0.15
    "یدی"          0.15
    " opportunity" 0.14
    "ευ"           0.14
    " Priv"        0.14
    " priv"        0.14
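
    Non-ASCII tokens in logit lists like these are often exported in a
    byte-level form, for instance `Ñĥ` standing for `у`. The sketch below
    shows one way such strings might be turned back into readable text. It
    assumes the vocabulary uses the GPT-2-style byte-to-character table,
    which this page does not confirm. The names `byte_level_table` and
    `decode_token` are hypothetical helpers, not part of any library.

```python
def byte_level_table():
    """Build the char -> byte map used by GPT-2-style byte-level BPE (assumed here)."""
    # Bytes that are printable keep their own code point.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(0xA1, 0xAD))
          + list(range(0xAE, 0x100)))
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {chr(c): b for b, c in zip(bs, cs)}


CHAR_TO_BYTE = byte_level_table()


def decode_token(tok: str) -> str:
    """Decode a byte-level token string; characters outside the table pass through."""
    raw = bytearray()
    for ch in tok:
        if ch in CHAR_TO_BYTE:
            raw.append(CHAR_TO_BYTE[ch])
        else:
            # Already-decoded characters (or a literal space) are kept as-is.
            raw.extend(ch.encode("utf-8"))
    # Tokens that end mid-character yield a replacement character.
    return raw.decode("utf-8", errors="replace")


print(decode_token("Ñĥ"))  # → у
```

    A token that holds only part of a multi-byte character, such as `ìĶ`,
    cannot be decoded on its own and is best left in its raw form.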
    Act Density 0.034%

    No Known Activations