INDEX
    Explanations

    Expressing surprise/disbelief

    Negative Logits
    Token            Logit
    ",加强"          -0.08
    "加强"           -0.08
    "entrale"        -0.08
    "iate"           -0.07
    "ormány"         -0.07
    " satisfying"    -0.07
    "appid"          -0.07
    "uitary"         -0.07
    "prec"           -0.07
    "iados"          -0.07
    Positive Logits
    Token            Logit
    " foolish"       0.13
    " bother"        0.10
    " ignor"         0.09
    " ignorance"     0.09
    " dared"         0.09
    " misguided"     0.09
    " omissions"     0.09
    " حیر"           0.09
    " clueless"      0.09
                     0.09
    Act Density 0.113%

    No Known Activations