INDEX
    Explanations

    expressions of humor or sarcasm

    Negative Logits
    contr
    -0.17
    oken
    -0.15
    apult
    -0.15
     Slo
    -0.15
    odka
    -0.14
    clr
    -0.14
    ĮĢ
    -0.14
    anela
    -0.14
    нок
    -0.14
     Jeg
    -0.13
    Positive Logits
    æ£
    0.17
     here
    0.17
    otics
    0.17
    bitset
    0.17
    璃
    0.16
    _HERE
    0.15
    udder
    0.15
    *this
    0.15
     здесь
    0.15
    іль
    0.14
    Act Density 0.205%

    No Known Activations