    Explanations

    expressions indicating decisions and actions involving people or groups

    Negative Logits
    Token         Logit
    "ụ"           -0.14
    "それは"      -0.13
    "lena"        -0.13
    "eniable"     -0.13
    "940"         -0.13
    "öl"          -0.12
    "iat"         -0.12
    "日の"        -0.12
    "_echo"       -0.12
    "ol"          -0.12
    Positive Logits
    Token         Logit
    " to"         0.78
    "\tto"        0.40
    " να"         0.39
    "to"          0.36
    " để"         0.33
    "を"          0.30
    " zu"         0.29
    "sto"         0.28
    "ToUpdate"    0.28
    "_to"         0.28
    Activation Density: 1.315%

    No Known Activations