    Negative Logits
    "lj"                -0.07
    "\Base"             -0.07
    " expenditure"      -0.07
    "cyj"               -0.07
    "ificates"          -0.07
    " practitioners"    -0.07
                        -0.07
    ".ly"               -0.07
    "ariot"             -0.07
    " enabling"         -0.06
    Positive Logits
    "回应"               0.09
    " declaraciones"    0.09
    " ಹೇಳ"              0.09
    " Chevron"          0.08
    " వ్యాఖ్య"           0.08
    " responses"        0.08
    "uggestions"        0.08
    " هایی"             0.08
    " משפט"             0.08
    " empath"           0.08
    Act Density 0.012%

    No Known Activations