    Explanations

    occurrences of specific keyword phrases in a foreign language, predominantly related to identity or existential states

    Negative Logits
    ्यवस
    -0.15
    bee
    -0.14
    ode
    -0.14
    /ts
    -0.14
    áže
    -0.14
     κον
    -0.14
     Ebony
    -0.13
    urr
    -0.13
     seedu
    -0.13
    obo
    -0.13
    Positive Logits
    사항
    0.19
    ательно
    0.18
    eting
    0.17
     사항
    0.17
     remark
    0.15
    ья
    0.15
    rames
    0.15
    amo
    0.15
    atrix
    0.15
    GINE
    0.14
    Act Density 0.005%

    No Known Activations