INDEX
    Explanations

    personal pronouns and expressions of emotion or thought

    Negative Logits
    itia
    -0.15
    loo
    -0.15
     treff
    -0.14
    ời
    -0.14
    NetMessage
    -0.14
    aptor
    -0.14
    cích
    -0.14
    ụ
    -0.14
    าง
    -0.14
    ascript
    -0.13
    Positive Logits
     alone
    0.18
    oes
    0.15
    ister
    0.15
    alone
    0.15
    os
    0.15
     Alone
    0.15
     solo
    0.15
    res
    0.15
     co
    0.15
     we
    0.14
    Act Density 0.303%

    No Known Activations