    Negative Logits
    Token               Logit
    " gratification"    -0.08
    " hygiene"          -0.08
    "成绩"              -0.08
    "scp"               -0.07
    "zf"                -0.07
    " everybody"        -0.07
    "afety"             -0.07
    (token not shown)   -0.07
    "Insn"              -0.07
    " Dich"             -0.07
    Positive Logits
    Token               Logit
    "ને"                0.08
    (token not shown)   0.07
    " Pascal"           0.07
    "Fest"              0.07
    " brilh"            0.07
    " dispers"          0.07
    " Plymouth"         0.07
    " μή"               0.07
    " satisfactor"      0.07
    " slows"            0.07
    Activation Density: 0.013%

    No Known Activations