INDEX
    Explanations

    negative expressions or rejections of ideas

    Negative Logits
    Token             Logit
    " hel"            -0.16
    "ricks"           -0.16
    "noxious"         -0.15
    " Contribution"   -0.14
    " off"            -0.14
    "IELDS"           -0.14
    " fair"           -0.13
    " maximal"        -0.13
    " Hel"            -0.13
    "bung"            -0.13
    Positive Logits
    Token             Logit
    "adan"            0.16
    "ze"              0.15
    "_mE"             0.15
    "سته"             0.15
    "HeaderCode"      0.15
    "ازه"             0.14
    "ổ"               0.14
    "ッカー"          0.14
    " MyBase"         0.14
    "柱"              0.14
    Activation Density: 0.105%

    No Known Activations